Mining the causal relationships implied by non-linear, non-Gaussian continuous data is an emerging research hotspot in data mining, and its high computational complexity is currently a major obstacle. This project plans to build on local learning theory in order to reduce the complexity of learning. First, for linear non-Gaussian data, we will further explore the distribution of the partial correlation coefficient, investigate a correlation measure based on hypothesis testing, and combine it with a local learning strategy to construct fast and effective causal structure learning algorithms. Second, for non-linear non-Gaussian data, we plan to describe the data using simultaneous equations models and polynomial approximation theory, establish the relationship between the equation coefficients and correlation, and integrate the local learning strategy to build fast and efficient causal structure learning algorithms. To handle high-dimensional non-linear non-Gaussian big data, we will investigate an online causal structure learning framework based on streaming features, explore conditional independence test criteria for non-linear non-Gaussian data, and, drawing on the idea of local learning, propose an online Markov blanket updating method and an online causal structure adjustment method, finally constructing fast and effective streaming-feature-based online structure learning algorithms. The research is expected to lay a theoretical and methodological foundation for causal discovery from non-linear non-Gaussian data.
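The first research line rests on testing whether a partial correlation coefficient is zero. As a point of reference, here is a minimal sketch of the standard Fisher z test of a partial correlation. It assumes approximately Gaussian sampling behaviour of the coefficient, which is exactly the assumption the project proposes to revisit for non-Gaussian data, so it should be read as a baseline and not as the project's own measure. The function names, the significance level `alpha = 0.05` and the toy chain model are illustrative assumptions.

```python
import math
import random
from functools import lru_cache


def correlation_matrix(columns):
    """Pearson correlation matrix of equal-length data columns."""
    n = len(columns[0])
    means = [sum(col) / n for col in columns]
    centered = [[v - m for v in col] for col, m in zip(columns, means)]
    norms = [math.sqrt(sum(v * v for v in col)) for col in centered]
    k = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]


def partial_corr(corr, x, y, z):
    """Partial correlation of variables x and y given index set z (recursive formula)."""
    @lru_cache(maxsize=None)
    def rec(a, b, cond):
        if not cond:
            return corr[a][b]
        c = min(cond)                      # peel off one conditioning variable
        rest = cond - {c}
        r_ab, r_ac, r_bc = rec(a, b, rest), rec(a, c, rest), rec(b, c, rest)
        return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))
    return rec(x, y, frozenset(z))


def fisher_z_test(corr, n, x, y, z, alpha=0.05):
    """Fisher z test of H0: the partial correlation of x and y given z is zero.

    Returns (independent, p_value); 'independent' means H0 is not rejected.
    """
    r = max(min(partial_corr(corr, x, y, z), 0.9999999), -0.9999999)  # avoid log(0)
    stat = math.sqrt(n - len(z) - 3) * 0.5 * math.log((1 + r) / (1 - r))
    p_value = math.erfc(abs(stat) / math.sqrt(2))  # two-sided standard normal tail
    return p_value > alpha, p_value


# Hypothetical chain X -> Z -> Y with uniform (non-Gaussian) noise.
rng = random.Random(0)
n = 2000
xs = [rng.uniform(-1, 1) for _ in range(n)]
zs = [2 * x + rng.uniform(-1, 1) for x in xs]
ys = [-1.5 * z + rng.uniform(-1, 1) for z in zs]
C = correlation_matrix([xs, zs, ys])
print(fisher_z_test(C, n, 0, 2, []))   # marginal test of X and Y
print(fisher_z_test(C, n, 0, 2, [1]))  # X and Y given Z
```

A conditional independence test of this kind is the building block that local learning strategies typically call repeatedly when searching for the Markov blanket of one variable at a time instead of the whole graph.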
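Working with linear non-Gaussian continuous data, this project mined the causal relationships implied by the data and obtained results in the following four areas:
(1) A Bayesian network structure learning algorithm based on the simultaneous equations model: combining the simultaneous equations method with a local learning strategy, we proposed the BSEM (Based on Simultaneous Equations Model) algorithm, which can handle linear non-Gaussian continuous data.
(2) An improved partial-correlation-based Bayesian network structure learning algorithm: we further explored the mathematical distribution of the partial correlation coefficient and, incorporating local learning theory, built the IPCB (Improved Partial Correlation Based) structure learning algorithm, which can handle linear non-Gaussian continuous data.
(3) A causal structure learning method based on streaming features: incorporating an online Markov blanket updating method, we proposed a streaming-feature-based causal structure learning method that can discover the causal structure implied in linear data with arbitrary distributions arriving as streaming features, meeting the timeliness requirements of online learning.
(4) A causal discovery experimental platform: the platform includes data generation methods and the L1MB, SC, PCB, BSEM, IPCB, TC, and Two-Phase algorithms, among others.
No substantive breakthrough was achieved for non-linear data; research in this direction will continue.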
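Result (3) updates a Markov blanket as features stream in. The sketch below shows one simple way such an online update can be organised, assuming a grow-then-prune scheme in the spirit of online streaming feature selection: an arriving feature is kept only if it is marginally dependent on the target and no small subset of the current set separates it from the target, after which existing members are re-checked now that the newcomer is available. It is not the project's algorithm; `update_pc_set`, `make_fisher_z`, `max_cond = 2`, `alpha = 0.01` and the toy model are hypothetical. Because the relevance step uses marginal dependence, the set it maintains approximates the target's parents and children; spouses, which also belong to a Markov blanket, would need an extra step omitted here.

```python
import math
import random
from functools import lru_cache
from itertools import combinations


def make_fisher_z(data, alpha=0.01):
    """Return ci_test(x, y, z) -> True if x is judged independent of y given z."""
    n, k = len(data[0]), len(data)
    means = [sum(c) / n for c in data]
    cen = [[v - m for v in c] for c, m in zip(data, means)]
    nrm = [math.sqrt(sum(v * v for v in c)) for c in cen]
    corr = [[sum(a * b for a, b in zip(cen[i], cen[j])) / (nrm[i] * nrm[j])
             for j in range(k)] for i in range(k)]

    @lru_cache(maxsize=None)
    def pcorr(a, b, cond):
        # Recursive partial correlation, cached across calls.
        if not cond:
            return corr[a][b]
        c = min(cond)
        rest = cond - {c}
        r_ab, r_ac, r_bc = pcorr(a, b, rest), pcorr(a, c, rest), pcorr(b, c, rest)
        return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

    def ci_test(x, y, z):
        r = max(min(pcorr(x, y, frozenset(z)), 0.9999999), -0.9999999)
        stat = math.sqrt(n - len(z) - 3) * 0.5 * math.log((1 + r) / (1 - r))
        return math.erfc(abs(stat) / math.sqrt(2)) > alpha

    return ci_test


def update_pc_set(pc, target, new_feature, ci_test, max_cond=2):
    """Process one arriving feature and return the updated candidate set."""
    # Relevance: drop a feature that is marginally independent of the target.
    if ci_test(new_feature, target, ()):
        return pc
    # Redundancy of the newcomer: drop it if a small subset of pc separates it.
    for size in range(1, min(max_cond, len(pc)) + 1):
        for s in combinations(sorted(pc), size):
            if ci_test(new_feature, target, s):
                return pc
    pc = pc | {new_feature}
    # Redundancy of existing members, using only subsets that include the newcomer.
    for member in sorted(pc - {new_feature}):
        others = sorted(pc - {member})
        if any(ci_test(member, target, s)
               for size in range(1, min(max_cond, len(others)) + 1)
               for s in combinations(others, size)
               if new_feature in s):
            pc = pc - {member}
    return pc


# Hypothetical linear non-Gaussian model: A -> T <- B, T -> C, N unrelated.
rng = random.Random(1)
n = 2000
A = [rng.uniform(-1, 1) for _ in range(n)]
B = [rng.uniform(-1, 1) for _ in range(n)]
T = [a + b + rng.uniform(-0.5, 0.5) for a, b in zip(A, B)]
C = [t + rng.uniform(-0.5, 0.5) for t in T]
N = [rng.uniform(-1, 1) for _ in range(n)]
data = [T, N, A, C, B]              # column 0 is the target
ci_test = make_fisher_z(data)
pc = frozenset()
for f in [1, 2, 3, 4]:              # features arrive one at a time
    pc = update_pc_set(pc, 0, f, ci_test)
print(sorted(pc))
```

In the toy model the true parents and children of the target (column 0) are columns 2, 3 and 4, while column 1 is pure noise. Capping the conditioning-set size with `max_cond` keeps the number of tests per arriving feature bounded, at the cost of possibly missing separations that need larger sets.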
Data updated: 2023-05-31
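A review of research on industrial structure succession and branching from the perspective of evolutionary economic geography
Non-linear analysis and calculation method for the at-rest earth pressure coefficient of coarse-grained soil
Channel allocation strategy for low-Earth-orbit satellite communications
Small UAV remote sensing image registration with inlier maximization and redundant point control
Theme park evaluation based on public sentiment: a case study of Volga Manor in Harbin
Research on manifold-learning-based non-linear dimensionality reduction algorithms for complex data
Causal structure learning and causal inference
Research on machine learning algorithms for large-scale data
Research on deep-learning-based algorithms for unstructured big data analysis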