Discourse cohesion analysis plays a critical role in discourse understanding, and English and Chinese differ in their cohesive devices, including anaphora, ellipsis, and connectives. However, there are few studies on discourse cohesion alignment between Chinese and English, owing to the lack of publicly available parallel resources annotated with discourse cohesion on both language sides. Consequently, few studies have applied discourse cohesion to natural language processing tasks such as machine translation. To this end, this project aims to create a Chinese-English parallel resource with discourse cohesion annotation on both sides and their alignment. The work proceeds at three levels. First, we explore suitable strategies for annotating discourse cohesion, including anaphora, ellipsis, and connectives. Second, we propose targeted approaches to automatically recognize anaphora, ellipsis, and connectives in both Chinese and English, and from these learn the alignment of cohesion between the two languages. Third, we propose several approaches to effectively apply bilingual discourse cohesion in statistical machine translation (SMT) and SMT evaluation. In sum, the project is of considerable significance for promoting discourse semantic analysis between English and Chinese.
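Discourse cohesion analysis is the foundation of discourse understanding, and English and Chinese differ in their main cohesive devices, including anaphora, ellipsis, and connectives. Existing Chinese-English parallel corpora are aligned mainly at the sentence level and lack alignment of cohesion information. As a result, there has been little research, in China or abroad, on Chinese-English discourse cohesion alignment, which in turn hinders applications such as machine translation that incorporates cohesion information. This project aims to build a Chinese-English discourse cohesion alignment resource, study techniques for automatic cohesion alignment analysis, and apply them to machine translation that incorporates cohesion information. First, we study annotation strategies for Chinese-English discourse cohesion alignment and build a resource containing alignment information for anaphora, ellipsis, and connectives. Second, based on this resource and taking into account the characteristics of cohesion in the two languages, we adopt different analysis strategies and processing methods to implement a Chinese-English cohesion alignment analysis platform. Finally, we integrate cohesion information into a machine translation system and examine its role from two angles: improving translation performance and improving evaluation. This research is of considerable significance for advancing Chinese-English discourse semantic analysis.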
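Discourse cohesion analysis is the foundation of discourse understanding, and English and Chinese differ in their main cohesive devices, including anaphora, ellipsis, and connectives. Existing Chinese-English parallel corpora are aligned mainly at the sentence level and lack alignment of cohesion information. As a result, there has been little research, in China or abroad, on Chinese-English discourse cohesion alignment, which in turn hinders applications such as machine translation that incorporates cohesion information. The results of this project include:
1) We proposed an annotation strategy for a Chinese-English discourse cohesion alignment corpus, defined the annotation scheme, developed an annotation tool, and completed the annotation of a corpus of 200 parallel documents containing alignment information for clauses, anaphora, ellipsis, and connectives. The annotation quality is good.
2) Based on this resource and taking into account the characteristics of cohesive devices in the two languages, we adopted different analysis strategies and processing methods and carried out analysis studies on clauses, connectives, and anaphora. The results indicate that the corpus is computationally usable.
3) We integrated cohesion information into a machine translation system. Preliminary experiments indicate that cohesion information can improve machine translation performance.
This work is of considerable significance for advancing Chinese-English discourse semantic analysis.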
Data last updated: 2023-05-31
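Chinese Discourse Cohesion Analysis: Anaphora, Ellipsis, and Their Resolution
Multi-Level Joint Analysis of Chinese Discourse Structure Oriented Toward Discourse Informativeness
Resource Construction and Computational Models for Chinese Discourse Structure Analysis
Discourse Structure Analysis Driven by Micro- and Macro-Level Nuclearity Relations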