China has a very large population of English learners. Every year teachers must mark large numbers of English essays, and educational institutions must assess essays from large-scale English tests. The sheer volume and difficulty of writing assessment has become a bottleneck in English teaching and testing, so effective algorithms for automated essay scoring and diagnostic feedback are urgently needed in China. Existing feature extraction methods and automated essay scoring techniques primarily target native English-speaking writers, and few research results address non-native writers. This project aims to develop effective automated essay scoring algorithms and diagnostic feedback methods for Chinese learners of English. It covers five research topics:
1) Drawing on second language acquisition theory, we will study feature extraction methods that both reveal the characteristics of Chinese students' writing and improve scoring results for their essays.
2) Mid-score essays are generally far more numerous than high-score and low-score essays, so essay data are imbalanced. We will study a strategy for improving classification results on such imbalanced essay data.
3) Building on our earlier research on clustering algorithms for large-scale text, we will study the application of clustering to automated essay scoring and propose a new scoring algorithm based on a two-stage incremental clustering strategy.
4) For off-topic essay detection, we will combine multi-document topic-word extraction with semantic computation as the detection method.
5) For diagnostic feedback, we will focus, based on our preliminary test results, on grammatical error detection and correction suggestion algorithms that do not depend on a large-scale error-annotated essay corpus.
The research has important theoretical value and broad application prospects, and could help improve performance in scenarios such as automated Chinese essay scoring, automated essay scoring for less commonly taught languages, and the assessment of subjective questions in English tests.
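China has a very large number of English learners, and there is an urgent need for fully automated scoring algorithms that are tailored to the characteristics of Chinese students and work effectively on large-scale English essay data, in order to relieve the bottleneck caused by the volume and difficulty of essay marking in China's English teaching and large-scale English examinations. Existing essay feature extraction and automated scoring techniques mainly target essays written by native English-speaking students, and research results on fully automated essay scoring and diagnostic feedback for Chinese learners of English remain scarce. This project studies the following: (1) feature extraction algorithms that capture the characteristics of Chinese students' English writing; (2) automated essay scoring algorithms based on effective classification of imbalanced data, motivated by the imbalanced distribution of essay scores; (3) automated essay scoring algorithms based on incremental clustering; (4) off-topic essay detection algorithms based on multi-document topic-word extraction; (5) grammatical error detection and correction recommendation algorithms that do not depend on a large-scale corpus of essay errors. The research will also advance applications in related scenarios such as automated scoring of Chinese essays, automated scoring of essays in less commonly taught languages, and automated scoring of subjective English questions.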
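Item (2) above concerns the imbalanced distribution of essay scores. One common, generic way to counter such imbalance is to weight each score band inversely to its frequency, so that errors on rare bands cost more during training. The sketch below illustrates only that heuristic and is not the project's algorithm; the function name `inverse_frequency_weights` and the sample score bands are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return a weight per class: n_samples / (n_classes * class_count).

    Rare classes receive weights above 1, frequent classes below 1.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}


# Hypothetical score bands for ten essays: mostly mid-range, few extremes.
bands = ["mid"] * 7 + ["high"] * 2 + ["low"] * 1
weights = inverse_frequency_weights(bands)

# Print the bands from most to least heavily weighted.
for band, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{band}: {weight:.3f}")
```

Weights of this kind can be passed to any classifier that accepts per-class or per-sample weights. Resampling the minority bands is an alternative with different trade-offs.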
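China has a very large number of English learners, and there is an urgent need for automated essay scoring and off-topic essay detection algorithms tailored to the characteristics of Chinese students, in order to relieve the bottleneck caused by the volume and difficulty of essay marking in China's English teaching and large-scale English examinations. Existing essay feature extraction and automated scoring techniques mainly target essays written by native English-speaking students, and research on essay feature analysis and automated scoring for Chinese learners of English remains scarce. The project pursued research along two lines, text feature selection and text content analysis, with a focus on applications such as automated essay scoring and off-topic essay detection. The work includes: 1) In text feature selection, we proposed an automated scoring algorithm for English essays by Chinese learners based on Coh-Metrix feature selection; exploiting feature separability, we proposed feature selection algorithms that use feature separability and clustering. 2) In off-topic essay detection, to address the noise introduced by essay subtopics and the problem of automatic threshold selection for essays, we proposed an unsupervised off-topic detection algorithm based on local density selection; drawing on the differences in topic semantics between the target topic and reference topics relative to the essay under test, we proposed an unsupervised off-topic detection method based on target and reference topics. 3) In text content analysis, considering the particular chunk structure of Chinese sentences, we combined latent semantic analysis with a compositional word-vector model and proposed a Chinese question classification algorithm based on fused semantic features; to address the difficulty of correctly identifying and describing trending information caused by out-of-vocabulary words and new-word recognition, we proposed a method for identifying and describing trending online topics based on compound-word generation; because existing classification-based approaches cannot detect false information circulating on Weibo early enough, we proposed an early detection method for false information on Weibo based on gatekeeper behaviour. The project's outputs include 16 high-quality papers and an annotated corpus of English essays from the Guangdong Province college entrance examination (Gaokao).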
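The idea of comparing an essay against a target topic and several reference topics can be illustrated with a deliberately simplified sketch. It assumes plain bag-of-words cosine similarity in place of the semantic representations the project used, and it flags an essay as off-topic when the essay is no closer to the target prompt than to the best-matching reference prompt. The names `term_vector`, `cosine`, and `is_off_topic`, as well as the sample prompts and essays, are hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lower-case the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def is_off_topic(essay, target_prompt, reference_prompts):
    """Flag the essay when it is no closer to the target than to any reference.

    An essay sharing no words with the target prompt is therefore always flagged.
    """
    essay_vec = term_vector(essay)
    target_sim = cosine(essay_vec, term_vector(target_prompt))
    best_ref_sim = max(
        (cosine(essay_vec, term_vector(p)) for p in reference_prompts),
        default=0.0,
    )
    return target_sim <= best_ref_sim


# Hypothetical prompts and essays, for illustration only.
target = "Should university students take part-time jobs"
references = [
    "Describe your favourite season and the weather",
    "The advantages of travelling by train",
]
on_topic = "Many university students take part-time jobs to earn money"
off_topic = "I like travelling by train because the weather outside is beautiful"

print(is_off_topic(on_topic, target, references))
print(is_off_topic(off_topic, target, references))
```

Because the comparison is relative, no fixed similarity threshold has to be tuned by hand. Raw word counts, however, let shared function words such as "the" inflate similarity, which is one reason a semantic representation is preferable in practice.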
Data last updated: 2023-05-31