After five years of work, the project "Research on Automatic Abstraction Based on Statistics and Semantic Analysis for Chinese and English Texts" has been completed. The project has two prominent characteristics in technique and method. First, the system introduces semantic hierarchy concepts on top of traditional word-frequency statistics: for Chinese it uses an extended Dictionary of Synonymy Words, and for English it uses WordNet together with the related theory of concept hierarchies. This yields a better Vector Space Model (VSM) and more precise statistical information. To analyze and identify multi-topic texts, the system examines the distribution of various kinds of title words and keywords, taking a first successful step toward solving the problem of abstract sentences being unevenly distributed across topics. Second, to make the abstracts more readable, several readability processes are applied to the raw abstract. These mainly include sentence-pattern analysis for Chinese, link grammar analysis for English, removal of redundant repetition among extracted sentences, rearrangement and transformation of sentence patterns, and handling of dangling conjunctions through template matching. Based on this work, we have built a more general and technically sound Chinese-English text abstracting system.
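The abstract does not say how the concept-level statistics, title-word weighting, redundancy removal and connective templates fit together, so the following is only a minimal Python sketch under stated assumptions, not the project's method. A hypothetical toy map `CONCEPT_MAP` stands in for the extended synonym dictionary or WordNet synsets, so synonyms such as "car" and "automobile" share one VSM dimension. Sentences are scored by cosine similarity to the document's concept vector plus a bonus for covering title concepts, a greedy filter skips sentences too similar to ones already chosen, and a simple rule (standing in for template matching) strips a leading connective from the hypothetical `DANGLING` list when the preceding sentence was not extracted. The function `summarize` and its parameters `k`, `title_weight` and `max_overlap` are illustrative assumptions.

```python
import math
import re
from collections import Counter

# HYPOTHETICAL toy concept map standing in for an extended synonym dictionary
# (Chinese) or WordNet synsets (English): synonyms share one concept ID.
CONCEPT_MAP = {
    "car": "VEHICLE", "automobile": "VEHICLE", "vehicle": "VEHICLE",
    "buy": "PURCHASE", "purchase": "PURCHASE",
    "price": "COST", "cost": "COST",
}

# HYPOTHETICAL sentence-initial connectives that dangle when the preceding
# sentence is not part of the abstract.
DANGLING = {"however", "therefore", "moreover", "but", "so"}


def tokenize(text):
    """Lowercase English word tokens (a real system would segment Chinese)."""
    return re.findall(r"[a-z]+", text.lower())


def concept_vector(tokens):
    """Count concepts instead of surface words; unmapped words count as themselves."""
    return Counter(CONCEPT_MAP.get(t, t) for t in tokens)


def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def summarize(sentences, title, k=2, title_weight=2.0, max_overlap=0.7):
    vecs = [concept_vector(tokenize(s)) for s in sentences]
    doc = sum(vecs, Counter())  # concept vector of the whole document
    title_concepts = set(concept_vector(tokenize(title)))
    # Score = similarity to the document + bonus for the share of title concepts covered.
    scores = [
        cosine(v, doc)
        + title_weight * len(title_concepts & set(v)) / max(len(title_concepts), 1)
        for v in vecs
    ]
    chosen = []
    for i in sorted(range(len(sentences)), key=lambda idx: -scores[idx]):
        # Redundancy removal: skip a sentence too similar to one already chosen.
        if all(cosine(vecs[i], vecs[j]) <= max_overlap for j in chosen):
            chosen.append(i)
        if len(chosen) == k:
            break
    result = []
    for i in sorted(chosen):  # restore original sentence order for readability
        s = sentences[i]
        m = re.match(r"\s*([A-Za-z]+),?\s+", s)
        # Drop a leading connective whose antecedent sentence was not extracted.
        if m and m.group(1).lower() in DANGLING and (i - 1) not in chosen:
            s = s[m.end():]
            s = s[:1].upper() + s[1:]
        result.append(s)
    return result


sentences = [
    "The car price rose this year.",
    "The automobile cost rose this year.",
    "However, many people still buy a vehicle.",
    "The weather was mild.",
]
print(summarize(sentences, title="Car price"))
# → ['The car price rose this year.', 'Many people still buy a vehicle.']
```

In the example, the first two sentences say the same thing in different words. Their concept vectors are identical, so the redundancy filter drops the second one; compared as surface words, their cosine would be about 0.67, below the `max_overlap` threshold of 0.7, so the filter would not catch them.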
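With the rapid advance of science and technology, people now live in a vast ocean of information, and obtaining the most useful information quickly and effectively is crucial to today's economic and technological development. Building on the results and experience of the research group's large-scale corpus system and its system for automatic analysis and distributional statistics of Chinese sentence patterns, this project focuses mainly on Chinese and combines statistical information with semantic analysis to build a high-quality, broad-coverage Chinese-English automatic abstracting system. Such a system is expected to have wide applications and substantial social and economic benefits.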
Data updated: 2023-05-31
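Research on Query Expansion Based on Statistical Machine Translation and Automatic Summarization
Research on Automatic Subject Indexing Based on Semantic Analysis and Statistics
Multi-Document Automatic Summarization Based on Information Reorganization
Multi-Document Automatic Summarization Based on a Logical Framework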