The project is based on the co-occurrence probabilities of Chinese character and word N-grams in large-scale corpora. The research covers key techniques for automatic Chinese word classification, including statistical regularities of Chinese words, word sense similarity, and automatic word classification algorithms for large vocabularies. The objective is to construct a class-based statistical language model. The research has both theoretical and practical significance for natural language processing. This report describes the project outline, its execution, main results, personnel training, and use of funds, and outlines future work.
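The project's objective is a class-based statistical language model. The abstract does not state the model order, smoothing, or estimation method. As a rough illustration only, the sketch below assumes a bigram model with the common factorization P(w_i | w_{i-1}) ≈ P(c_i | c_{i-1}) · P(w_i | c_i), where c_i is the class of word w_i. It also assumes unsmoothed maximum-likelihood counts. The function name train_class_bigram, the pre-segmented input, and the given word-to-class mapping are all hypothetical.

```python
# Hypothetical sketch of a class-based bigram model (not the project's actual method).
from collections import Counter

def train_class_bigram(corpus, word2class):
    """Estimate a class-based bigram model with unsmoothed maximum likelihood.

    corpus: list of sentences, each a list of already-segmented word tokens.
    word2class: dict mapping every word to a class label (assumed given,
        e.g. produced by a separate clustering step).
    Returns prob(prev_word, word) = P(c(word) | c(prev_word)) * P(word | c(word)).
    """
    word_counts = Counter()     # occurrences of each word
    class_counts = Counter()    # occurrences of each class
    class_bigrams = Counter()   # counts of adjacent (class, class) pairs
    class_history = Counter()   # how often each class is followed by another token
    for sent in corpus:
        classes = [word2class[w] for w in sent]
        word_counts.update(sent)
        class_counts.update(classes)
        for c1, c2 in zip(classes, classes[1:]):
            class_bigrams[(c1, c2)] += 1
            class_history[c1] += 1

    def prob(prev_word, word):
        c1, c2 = word2class[prev_word], word2class[word]
        if class_history[c1] == 0:
            return 0.0  # this class never appears as a left context
        p_class = class_bigrams[(c1, c2)] / class_history[c1]
        p_word = word_counts[word] / class_counts[c2]
        return p_class * p_word

    return prob


# Toy example: the word bigram (吃, 香蕉) never occurs in the corpus.
corpus = [["我", "吃", "苹果"], ["你", "买", "香蕉"]]
word2class = {"我": "PRON", "你": "PRON", "吃": "VERB", "买": "VERB",
              "苹果": "FRUIT", "香蕉": "FRUIT"}
prob = train_class_bigram(corpus, word2class)
print(prob("吃", "香蕉"))  # 0.5
```

The pair (吃, 香蕉) is unseen as a word bigram, yet it still receives a nonzero probability because the classes VERB and FRUIT do co-occur. Sharing statistics across the words of a class in this way is the usual motivation for class-based models on sparse data.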
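This project builds on co-occurrence probability statistics for Chinese character and word N-grams of various orders, especially trigrams and higher, computed on large-scale corpora. It studies key techniques for automatic Chinese word clustering. These include statistical regularities of Chinese word formation, context-based methods for computing word similarity, and automatic word clustering algorithms for large vocabularies. On that basis, it constructs a class-based statistical language model. The project has significant scientific value and application prospects for fields such as artificial intelligence and natural language processing.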
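The abstract does not specify how context-based word similarity is computed or which clustering algorithm is used. One possible sketch rests on three assumptions, none confirmed as the project's method:

- each word is represented by counts of its immediate left and right neighbours;
- words are compared by cosine similarity;
- words are grouped by naive single-link agglomerative clustering.

The function names context_vectors, cosine, and cluster_words and the threshold parameter are hypothetical.

```python
# Hypothetical sketch: context-based word similarity plus naive clustering.
import math
from collections import Counter, defaultdict

def context_vectors(corpus):
    """Represent each word by counts of its immediate left and right neighbours.

    Features are tagged with their side, e.g. ("L", "吃") means "吃" occurred
    immediately to the left of the word.
    """
    vecs = defaultdict(Counter)
    for sent in corpus:
        for i, w in enumerate(sent):
            if i > 0:
                vecs[w][("L", sent[i - 1])] += 1
            if i + 1 < len(sent):
                vecs[w][("R", sent[i + 1])] += 1
    return vecs

def cosine(u, v):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0

def cluster_words(words, vecs, threshold):
    """Naive single-link agglomerative clustering.

    Repeatedly merges the two clusters whose most similar member pair has the
    highest cosine similarity, stopping once that similarity falls below
    threshold. Cost grows roughly cubically with vocabulary size.
    """
    clusters = [[w] for w in words]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sim = max(cosine(vecs[x], vecs[y])
                          for x in clusters[a] for y in clusters[b])
                if sim > best:
                    best, pair = sim, (a, b)
        if best < threshold:
            break
        a, b = pair
        clusters[a].extend(clusters.pop(b))  # b > a, so index a stays valid
    return clusters


corpus = [["我", "吃", "苹果"], ["你", "吃", "香蕉"],
          ["我", "买", "苹果"], ["你", "买", "香蕉"]]
vecs = context_vectors(corpus)
print(cluster_words(["我", "你", "吃", "买", "苹果", "香蕉"], vecs, threshold=0.5))
# [['我', '你'], ['吃', '买'], ['苹果', '香蕉']]
```

Tagging each neighbour with its side keeps sentence-initial pronouns apart from sentence-final nouns, which a plain symmetric window would conflate in this toy corpus. The naive merge loop is only suitable for small examples. Clustering a large vocabulary, as the project targets, would need a more scalable algorithm.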
Data last updated: 2023-05-31
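Nonlinear Analysis and Calculation Method for the At-Rest Earth Pressure Coefficient of Coarse-Grained Soils
Research on the Impact of Environmental NIMBY Facilities on Residential Prices in Beijing: A Case Study of Large-Scale Waste Treatment Facilities
Analysis of the Environmental Effects of China's Participation in Global Value Chains
Research on Theme Park Evaluation Based on Public Sentiment Orientation: A Case Study of Volga Manor in Harbin
Research on Named Entity Recognition Based on Fine-Grained Word Representations
Research on the Automatic Construction of a Large-Scale Word-Sense-Annotated Corpus Based on Word Distinctiveness Features
Research on Corpus-Based Methods for Automatic Chinese Phrase Segmentation
Construction of a Large-Scale Diachronic Chinese Corpus and Research on Lexical Semantic Change
Web-Based Mining of Large-Scale Bilingual Corpora and Automatic Acquisition of Translation Knowledge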