Research on Extracting Events and Their Related Temporal Information from Chinese Financial News on the Internet

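Basic Information
Grant number: 69975008
Project type: General Program
Funding amount: 12.00
Principal investigator: 苑春法
Discipline classification:
Host institution: Tsinghua University
Approval year: 1999
Completion year: 2002
Project period: 2000-01-01 - 2002-12-31
Project status: Completed
Project participants: Kam-Fai Wong, 黎利, 陈刚, 朱晓丹, 赵强
Keywords: information extraction, event extraction, temporal relations
Final Report Abstract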

Extracting events and their relevant temporal information from Chinese financial news on the Internet can be regarded as the task of structuring the data that appear in a text. Two key problems must be solved to accomplish it: the analysis of Chinese in the financial domain, and the analysis of the Chinese temporal system. In this study we paid particular attention to the special characteristics of Chinese and of the financial domain and made full use of them. Statistical language learning is used throughout our research, together with methods based on linguistic theory.
As a foundation, we first constructed a financial corpus of 1,100,000 tokens. After statistical analysis of this corpus, seven knowledge bases were built for the company name identification system. Using these seven knowledge bases and some hand-crafted rules, the system identifies company names with a two-pass scanning method. Experiments show F1-measure (β = 1) rates of 94.00% in the closed test and 82.50% in the open test.
In statistical language learning, the relative shortage of training data (data sparseness) is always a crucial problem. The maximum entropy model is known to adopt a better estimation strategy when prior knowledge is lacking. After study and careful experimental comparison, we selected the maximum entropy model for the main language analysis tasks. Structural risk theory was also introduced to address feature selection for the maximum entropy model, which we regard as a novel contribution. A maximum entropy model was trained to recognize maximal noun phrases (NPs) in sentences; it achieved F1-measure rates of 93.79% in the closed test and 91.84% in the open test. For part-of-speech (POS) tagging, the maximum entropy model achieved an accuracy of 97.77% in the closed test and 96.29% in the open test.
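The F-measure with β = 1 reported above is the harmonic mean of precision and recall. The following is a minimal sketch of the standard F-β formula; the function name `f_measure` is an illustrative choice and is not taken from the project.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score for precision and recall given as fractions in [0, 1]."""
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        # Both precision and recall are zero, so the score is defined as zero.
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# With beta = 1 the score is the harmonic mean of precision and recall.
score = f_measure(0.5, 1.0)
```

When precision and recall are equal, the F1 value equals both of them, so a system whose precision and recall are both 91.1% also has an F1 of 91.1%.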
Building on the above work, a financial event extraction system was integrated. Preliminary experiments show F1-measure rates of 88.99% in the closed test and 74.06% in the open test.
For the analysis of Chinese temporal information, we tagged 3.25 M of financial news text, amounting to 2,000 files. Through statistical analysis and summarization, fifteen patterns for expressing temporal information were identified. Because Chinese verbs have no morphological inflection, we proposed a special method for analyzing the Chinese temporal information system. The method starts from the situation type of the main verb in a sentence and then combines it with Chinese temporal noun phrases, temporal auxiliary words, and temporal adverbs to recognize the situation of the sentence. Experiments show that the method is feasible: the precision and the recall of the test system are both 91.1%. This research can serve as a reference for financial activities and lay a foundation for predicting financial events, and it is significant for both theory and application.
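The temporal analysis method described above combines the situation type of the main verb with temporal noun phrases, auxiliary words, and adverbs. The toy sketch below illustrates only the general idea. The lexicons, category labels, the precedence rule, and the function name `analyze_sentence` are all hypothetical and are not the project's knowledge bases or its fifteen patterns.

```python
import re
from typing import Dict, List

# Hypothetical mini-lexicon: situation type of a few main verbs.
VERB_SITUATION: Dict[str, str] = {
    "宣布": "instantaneous event",  # announce
    "收购": "durative activity",    # acquire
    "持有": "state",                # hold
}

# Hypothetical temporal adverbs and auxiliary words with the phase they suggest.
TIME_ADVERBS: Dict[str, str] = {"正在": "ongoing", "已经": "completed", "将": "future"}
TIME_AUXILIARIES: Dict[str, str] = {"了": "completed", "着": "ongoing", "过": "experienced"}

# Very rough pattern for temporal noun phrases such as "1999年" or "昨天".
TIME_NP = re.compile(r"\d{4}年(?:\d{1,2}月)?(?:\d{1,2}日)?|昨天|今天|明天")


def analyze_sentence(tokens: List[str], main_verb: str) -> Dict[str, object]:
    """Combine the main verb's situation type with temporal cues in the sentence."""
    situation = VERB_SITUATION.get(main_verb, "unknown")
    # Assumption: an explicit temporal adverb takes precedence over an auxiliary word.
    phase = next((TIME_ADVERBS[t] for t in tokens if t in TIME_ADVERBS), None)
    if phase is None:
        phase = next(
            (TIME_AUXILIARIES[t] for t in tokens if t in TIME_AUXILIARIES),
            "unspecified",
        )
    time_phrases = [t for t in tokens if TIME_NP.fullmatch(t)]
    return {"situation": situation, "phase": phase, "time_phrases": time_phrases}


# A pre-segmented example sentence: "The company announced an acquisition plan yesterday."
result = analyze_sentence(["公司", "昨天", "宣布", "了", "收购", "计划"], "宣布")
```

The sketch assumes the sentence is already word-segmented and the main verb already identified; in the project these steps would rest on the language analysis components described earlier.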

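Project Abstract

This project takes Chinese financial news on the Internet as its object and studies how to extract events and their related temporal information from it on the basis of partial language analysis. The temporal information of an event mainly refers to the information expressed through the situation-type (aspectual) structure of a sentence, for example whether the sentence describes the occurrence of an instantaneous event or the beginning or end of a continuing activity. From the absolute temporal relations of isolated events, the relative temporal relations among multiple events can be further derived. The results of this kind of information extraction will help users grasp the history of an economic entity, understand the causes and effects linking financial events, and can serve as an important basis for economic decision-making.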
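The step from absolute times of isolated events to relative relations between events can be illustrated with a small interval comparison. This is a simplified sketch under stated assumptions: the `Event` class, the four relation labels, and the example events are hypothetical, and the project's actual relation inventory is not given in the source.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    name: str
    start: date
    end: date  # equal to start for an instantaneous event


def relation(a: Event, b: Event) -> str:
    """Derive a coarse relative temporal relation from two absolute time spans."""
    if a.end < b.start:
        return "before"
    if b.end < a.start:
        return "after"
    if a.start == b.start and a.end == b.end:
        return "simultaneous"
    return "overlaps"


# Hypothetical events: an instantaneous announcement and a durative acquisition.
announcement = Event("announcement", date(2000, 3, 1), date(2000, 3, 1))
acquisition = Event("acquisition", date(2000, 3, 15), date(2000, 6, 30))
order = relation(announcement, acquisition)
```

Representing an instantaneous event as a span whose start and end coincide lets one comparison function handle both instantaneous and durative events.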

Project Outcomes

No outcomes of this kind are currently listed.

Data last updated: 2023-05-31

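Other Grants of 苑春法

Grant number: 69375017
Approval year: 1993
Funding amount: 6.00
Project type: General Program

Grant number: 60573186
Approval year: 2005
Funding amount: 22.00
Project type: General Program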

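Similar NSFC Grants

1

Research on Key Technologies for Vietnamese News Event Information Extraction Based on Discourse Features

Grant number: 61562049
Approval year: 2015
Principal investigator: 周枫
Discipline classification: F0211
Funding amount: 38.00
Project type: Regional Science Fund Project
2

Research on Unsupervised Semantic Extraction of News Events Based on Narrative Pattern Analysis

Grant number: 61202233
Approval year: 2012
Principal investigator: 冯岩松
Discipline classification: F0211
Funding amount: 25.00
Project type: Young Scientists Fund Project
3

Research on Chinese Event Extraction and Prediction for Social Networks

Grant number: 61806137
Approval year: 2018
Principal investigator: 王中卿
Discipline classification: F0606
Funding amount: 26.00
Project type: Young Scientists Fund Project
4

Research on Chinese Event Extraction and Its Credibility Computation for Public Opinion

Grant number: 61472265
Approval year: 2014
Principal investigator: 李培峰
Discipline classification: F0211
Funding amount: 82.00
Project type: General Program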