Against the backdrop of China's Belt and Road strategy, Chinese-Vietnamese machine translation plays an important role in promoting exchange between the two countries in politics, the economy, culture, and other fields. Because Chinese and Vietnamese differ substantially in grammar and bilingual corpora are scarce, this project will study two topics: tree-to-tree syntax-based statistical machine translation oriented toward Chinese-Vietnamese language differences, and syntax-based statistical machine translation that uses a pivot language (English). First, we plan to analyze the differences between Chinese and Vietnamese and to incorporate their linguistic features into the learning and decoding processes of the tree-to-tree translation model, proposing a tree-to-tree syntactic translation method suited to the characteristics of both languages. Second, to address the scarcity of Chinese-Vietnamese corpora, we plan to propose a Chinese-Vietnamese phrase-based translation method that uses English as the pivot language and extracts a large-scale probabilistic Chinese-Vietnamese phrase translation rule table through the pivot. Third, we will analyze the correspondence between Chinese-English and English-Vietnamese phrase-structure trees and propose a pivot-based (English) Chinese-Vietnamese tree-to-tree translation method, which obtains a sizeable set of Chinese-Vietnamese phrase-structure tree translation rules from large-scale Chinese-English and English-Vietnamese corpora. Finally, to exploit the respective strengths of these methods, we plan to explore how to fuse the Chinese-Vietnamese tree-to-tree method, the pivot-based phrase method, and the pivot-based tree-to-tree method. The fused approach is intended to address the main difficulties of Chinese-Vietnamese machine translation, namely grammatical differences and corpus scarcity, and has considerable theoretical and practical value for the field.
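The pivot-based extraction of a probabilistic phrase table can be illustrated with phrase-table triangulation, a common technique in pivot translation. The abstract does not state which formulation the project uses, so the following is only a minimal sketch under that assumption: it estimates p(vi | zh) by summing p(en | zh) * p(vi | en) over shared English pivot phrases. The function name `triangulate`, the `min_prob` threshold, and the toy tables are all hypothetical.

```python
from collections import defaultdict


def triangulate(zh_en, en_vi, min_prob=1e-4):
    """Estimate p(vi | zh) = sum over en of p(en | zh) * p(vi | en).

    zh_en maps a Chinese phrase to {English phrase: p(en | zh)}.
    en_vi maps an English phrase to {Vietnamese phrase: p(vi | en)}.
    Both tables are assumed inputs; min_prob is an illustrative cutoff.
    """
    zh_vi = defaultdict(lambda: defaultdict(float))
    for zh, en_probs in zh_en.items():
        for en, p_en_given_zh in en_probs.items():
            # Only English phrases present in both tables act as pivots.
            for vi, p_vi_given_en in en_vi.get(en, {}).items():
                zh_vi[zh][vi] += p_en_given_zh * p_vi_given_en
    # Drop very unlikely entries to keep the induced table small.
    return {
        zh: {vi: p for vi, p in vi_probs.items() if p >= min_prob}
        for zh, vi_probs in zh_vi.items()
    }


# Toy phrase tables (invented for illustration only).
zh_en = {
    "机器 翻译": {"machine translation": 0.9, "mechanical translation": 0.1},
}
en_vi = {
    "machine translation": {"dịch máy": 1.0},
    "mechanical translation": {"dịch máy": 0.5, "dịch cơ học": 0.5},
}

table = triangulate(zh_en, en_vi)
for vi, p in sorted(table["机器 翻译"].items(), key=lambda kv: -kv[1]):
    print(f"机器 翻译 -> {vi}: {p:.2f}")
```

In this sketch a Chinese-Vietnamese pair receives probability mass only when at least one English phrase links the two sides, which is why large Chinese-English and English-Vietnamese corpora can stand in for the scarce direct Chinese-Vietnamese data.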
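Pivot-language machine translation is one of the main approaches to low-resource machine translation. The project investigated key problems including Chinese-Vietnamese bilingual dictionary construction, Chinese-Vietnamese parallel corpus construction, Chinese-Vietnamese syntax-based statistical machine translation, and pivot-language machine translation, and made progress in the following six areas:
1. Bilingual dictionary construction: proposed a weakly supervised Chinese-Vietnamese bilingual dictionary construction method based on an English pivot, and used pivot corpora to extract a Chinese-Vietnamese bilingual dictionary of 170,000 entries.
2. Parallel corpus construction: proposed a pivot-based method for generating Chinese-Vietnamese pseudo-parallel data and a method for extracting Chinese-Vietnamese parallel sentence pairs that combines syntactic structure with Tree-LSTM; pivot back-translation and pivot extraction together produced a Chinese-Vietnamese parallel corpus of nearly 4 million entries.
3. Phrase-based Chinese-Vietnamese machine translation: proposed a method that incorporates linguistic position features, using a lexicalized reordering model to tune the weights of rules that match the languages' characteristics and thereby producing translations that conform better to the grammar; also proposed a memory-network-based method that integrates the lexical translation probabilities of statistical machine translation into a neural machine translation model, improving the performance of Chinese-Vietnamese neural machine translation.
4. Syntax-based Chinese-Vietnamese machine translation: proposed a tree-to-tree statistical machine translation method that incorporates language-difference features into syntax-based statistical machine translation, and a neural machine translation method that incorporates syntactic parse trees into the encoding process of the model; both effectively improved translation quality.
5. Pivot-based Chinese-Vietnamese machine translation: proposed a transfer-learning-based neural machine translation method that transfers knowledge from English-Chinese and English-Vietnamese translation models to the Chinese-Vietnamese translation model, and a pivot-based joint training method that uses English-Chinese and English-Vietnamese translation models to improve the Chinese-Vietnamese model.
6. System development: built a Chinese-Vietnamese machine translation system that translates in both directions between Chinese and Vietnamese; the system has been applied in several sectors, including cyberspace administration, national security, and the military.
The project published 21 papers, of which 3 are SCI-indexed and 4 are EI-indexed; 3 national invention patents were granted and 17 national invention patent applications were accepted. The team hosted domestic academic conferences in the field, including CCFAI 2017 and CCL 2019, and attended international and domestic academic conferences 61 person-times. One team member received the State Council Special Allowance, team members received provincial-level talent awards on 6 occasions, 19 master's students and 1 doctoral student were trained, and 3 theses were named provincial outstanding master's theses.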
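The pivot-based generation of pseudo-parallel data mentioned in item 2 can be sketched as follows. This is a hedged illustration, not the project's actual pipeline: it assumes a Chinese-English parallel corpus and some English-to-Vietnamese translator, translates the English side, and pairs the result with the original Chinese. The function `pivot_pseudo_parallel`, the `max_len_ratio` filter, and the word-by-word toy translator are hypothetical stand-ins; a real setup would plug in a trained English-Vietnamese model.

```python
from typing import Callable, Iterable, List, Tuple


def pivot_pseudo_parallel(
    zh_en_pairs: Iterable[Tuple[str, str]],
    translate_en_to_vi: Callable[[str], str],
    max_len_ratio: float = 3.0,
) -> List[Tuple[str, str]]:
    """Pair each Chinese sentence with a Vietnamese translation of its English side.

    Sentences are assumed to be whitespace-tokenized. Pairs whose token-length
    ratio exceeds max_len_ratio are discarded as a crude noise filter.
    """
    pseudo_pairs = []
    for zh, en in zh_en_pairs:
        vi = translate_en_to_vi(en).strip()
        if not zh.strip() or not vi:
            continue  # skip pairs where either side is empty
        zh_len, vi_len = len(zh.split()), len(vi.split())
        if max(zh_len, vi_len) / min(zh_len, vi_len) > max_len_ratio:
            continue  # lengths too different to be a plausible pair
        pseudo_pairs.append((zh, vi))
    return pseudo_pairs


# Toy stand-in for an English-to-Vietnamese system (word-by-word lookup).
toy_lexicon = {"i": "tôi", "love": "yêu", "translation": "dịch"}


def toy_en_to_vi(sentence: str) -> str:
    words = sentence.lower().split()
    return " ".join(toy_lexicon[w] for w in words if w in toy_lexicon)


zh_en_corpus = [
    ("我 爱 翻译", "I love translation"),
    ("你好", "unknown words only"),  # translator returns nothing, so it is dropped
]
pseudo = pivot_pseudo_parallel(zh_en_corpus, toy_en_to_vi)
print(pseudo)
```

The length-ratio filter is only one possible quality check; the summary above also mentions a Tree-LSTM-based sentence-pair extraction method, which this sketch does not attempt to reproduce.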
Data last updated: 2023-05-31
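Research on Named Entity Recognition Based on Fine-Grained Word Representations
Failure Probability Analysis of Shale Gas Wellhead Equipment Based on an FTA-BN Model
Abstract Meaning Representation and Analysis of Chinese Function Words Based on Relation Alignment
Research on Electromagnetic Imaging of Metal Defects Based on a Bayesian Statistical Model
A Method for Recognizing Building Spatial Distribution Patterns Considering Functional Semantic Features
Research on Pivot-Language Statistical Machine Translation Based on Topic Models
Research on Statistical Machine Translation Methods Based on Deep Syntax
Research on Chinese-Mongolian Statistical Machine Translation Incorporating Linguistic Knowledge
Research on Chinese-Vietnamese Neural Machine Translation Methods for Comparable Corpora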