This project builds on co-occurrence probability statistics for Chinese character and word N-grams of various orders, especially trigrams and higher, gathered from large-scale corpora. It studies key techniques for automatic Chinese word clustering: the statistical regularities of Chinese word formation, methods for computing context-based word similarity, and automatic word clustering algorithms for large vocabularies. The goal is to construct a class-based statistical language model. The project has theoretical significance and practical application prospects for artificial intelligence, natural language processing, and related fields. This report covers the project outline, its execution, main results, personnel training, and use of funds, and discusses future work.
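As a rough illustration of the ideas named in the abstract, the sketch below computes a context-based similarity between words from co-occurrence counts and then evaluates a class-based bigram probability, P(w | prev) ≈ P(class(w) | class(prev)) × P(w | class(w)). It is not the project's method: the toy pre-segmented sentences, the window size, the cosine measure, the hand-assigned classes in `word2class`, and all function names are assumptions made for this example only.

```python
from collections import Counter, defaultdict
import math


def context_vectors(sentences, window=1):
    """For every word, count the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sent in sentences:
        for i, word in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][sent[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * \
        math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def class_bigram_prob(prev, word, sentences, word2class):
    """Maximum-likelihood estimate of P(class(word)|class(prev)) * P(word|class(word))."""
    class_bigrams, history = Counter(), Counter()
    word_counts, class_counts = Counter(), Counter()
    for sent in sentences:
        classes = [word2class[w] for w in sent]
        word_counts.update(sent)
        class_counts.update(classes)
        for a, b in zip(classes, classes[1:]):
            class_bigrams[(a, b)] += 1
            history[a] += 1
    c_prev, c_word = word2class[prev], word2class[word]
    if history[c_prev] == 0 or class_counts[c_word] == 0:
        return 0.0
    p_class = class_bigrams[(c_prev, c_word)] / history[c_prev]
    p_word = word_counts[word] / class_counts[c_word]
    return p_class * p_word


# Toy, already segmented corpus (hypothetical data).
sentences = [
    ["我", "喜欢", "苹果"],
    ["我", "喜欢", "香蕉"],
    ["他", "吃", "苹果"],
    ["他", "吃", "香蕉"],
]
# Hand-assigned classes standing in for the output of a clustering step.
word2class = {"我": "PRON", "他": "PRON", "喜欢": "VERB",
              "吃": "VERB", "苹果": "FRUIT", "香蕉": "FRUIT"}

vecs = context_vectors(sentences)
print(cosine(vecs["苹果"], vecs["香蕉"]))   # words sharing identical contexts
print(cosine(vecs["我"], vecs["喜欢"]))     # words with no shared context
print(class_bigram_prob("我", "吃", sentences, word2class))
```

In this toy corpus the word bigram 我 吃 never occurs, yet the class-based estimate is nonzero because the class pair PRON → VERB is observed. That generalization over unseen word pairs is the usual motivation for class-based models; a real system would add smoothing and would learn the classes from data instead of assigning them by hand.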
Data last updated: 2023-05-31
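Main factors affecting the performance of the EBPR process and the current state of research
A method for adjusting passenger train operation plans based on railway passenger flow assignment
An approximate high-dimensional optimization method based on a multi-level design space reduction strategy
Simultaneous fault detection and control for two-dimensional FM systems
The effect of poverty-alleviation resource inputs on distributional equity in poor areas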
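Automatic construction of a large-scale word-sense-annotated corpus based on distinctive features of words
Corpus-based methods for automatic segmentation of Chinese phrases
Construction of a large-scale diachronic Chinese corpus and research on lexical semantic change
Web-based mining of large-scale bilingual corpora and automatic acquisition of translation knowledge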