As globalization accelerates, economic exchange between China and the Central Asian countries is becoming more frequent, and the multilingual text images involved in these exchanges need to be identified and processed. Because script identification technology for multilingual text images is still immature, retrieving and screening the relevant messages from a large collection of text images is still done manually, which is labor-intensive and inefficient. Developing a script identification system for Central Asian multilingual text images is therefore a pressing research problem with significant social value and practical importance. Building on existing script identification techniques and on the distinctive structural and shape features of Central Asian scripts, this project studies the key technologies of script identification for Central Asian multilingual text images. First, a sample database of Central Asian multi-script text images is established and the images are preprocessed. Next, multiple features are extracted from the text images to form a feature database. Finally, the script type of an image to be identified is determined by a classification strategy that fuses multiple features with multiple decision techniques and is tailored to the characteristics of Central Asian scripts. As one exemplar of China's "Facing Central Asia" strategy, the results are expected to be used in important sectors in Xinjiang, such as government agencies, foreign affairs, the military, finance, and enterprises and public institutions, and to have promotion value in related fields in the Central Asian countries.
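The "multiple decision" step in the abstract can be pictured as decision-level fusion: several classifiers each propose a script label for the same text image, and their votes are combined. The following is only a minimal sketch under stated assumptions; the function name `fuse_decisions`, the classifier names, the labels `script_A` and `script_B`, and the weighted-voting rule itself are hypothetical illustrations, not the project's documented method.

```python
from collections import defaultdict


def fuse_decisions(predictions, weights=None):
    """Combine per-classifier script labels by (optionally weighted) voting.

    predictions: dict mapping classifier name -> predicted script label.
    weights: optional dict mapping classifier name -> vote weight (default 1.0).
    """
    weights = weights or {}
    scores = defaultdict(float)
    for clf_name, label in predictions.items():
        scores[label] += weights.get(clf_name, 1.0)
    # Highest total weight wins; ties are broken alphabetically for determinism.
    return min(scores, key=lambda label: (-scores[label], label))


# Hypothetical outputs of three classifiers for one text image.
votes = {"knn": "script_A", "svm": "script_A", "bayes": "script_B"}
print(fuse_decisions(votes))                  # unweighted majority vote
print(fuse_decisions(votes, {"bayes": 3.0}))  # one classifier trusted more
```

Weights could, for instance, reflect each classifier's accuracy on a validation set, but that choice is an assumption of this sketch.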
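The project first established a sample database of Central Asian multi-script text images and preprocessed the images in it. It then extracted multiple features from the text images, including frequency-domain texture features, HSV features, nonsubsampled contourlet transform (NSCT) features, Tamura features, curvelet transform features, and SURF features, to form the corresponding feature databases. Finally, the features of a test text image were matched against the sample features in the database using several classifiers (for example, various distance classifiers, K-NN, Bayes, BP neural networks, and SVM) to classify, identify, and retrieve multi-script text images, and fairly satisfactory identification efficiency was obtained. Fusing the extracted features with one another, including weighted fusion, further improved the script identification rate for Central Asian multi-script text images. Research outputs: 9 academic papers, 1 academic monograph, 1 invention patent, and 3 registered computer software copyrights.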
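The weighted feature fusion and K-NN matching described above can be illustrated with a small sketch. It assumes the feature vectors (for example texture and Tamura descriptors) have already been extracted; the helper names, the weights, the toy vectors, and the labels `script_A` and `script_B` are hypothetical, and the project's actual fusion scheme may differ.

```python
import math
from collections import Counter


def l2_normalize(vec):
    """Scale a feature vector to unit length so feature types are comparable."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_features(features, weights):
    """Weighted feature-level fusion: normalize, scale by weight, concatenate."""
    fused = []
    for name in sorted(weights):  # fixed order keeps all fused vectors aligned
        fused.extend(weights[name] * x for x in l2_normalize(features[name]))
    return fused


def knn_predict(query, samples, k=3):
    """Label the query by majority vote among its k nearest stored samples."""
    nearest = sorted(samples, key=lambda s: math.dist(query, s[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical weights and toy feature database (values are made up).
WEIGHTS = {"texture": 0.6, "tamura": 0.4}
train = [
    (fuse_features({"texture": [0.9, 0.1], "tamura": [0.8, 0.2]}, WEIGHTS), "script_A"),
    (fuse_features({"texture": [0.8, 0.2], "tamura": [0.9, 0.1]}, WEIGHTS), "script_A"),
    (fuse_features({"texture": [0.1, 0.9], "tamura": [0.2, 0.8]}, WEIGHTS), "script_B"),
    (fuse_features({"texture": [0.2, 0.8], "tamura": [0.1, 0.9]}, WEIGHTS), "script_B"),
]

query = fuse_features({"texture": [0.85, 0.15], "tamura": [0.7, 0.3]}, WEIGHTS)
print(knn_predict(query, train, k=3))
```

Normalizing each feature type before weighting keeps a descriptor with large raw values from dominating the distance; the weights themselves would have to be tuned on labeled data.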
Data updated: 2023-05-31