Complex diseases often arise from the combined effects of multiple genes and the environment. Identifying genes with well-defined biological functions from omics data therefore supports disease diagnosis and the identification of new drug targets. The association between a single gene and a phenotype ranges from simple differential expression to complicated non-linear patterns, so the explicit functional form of the association is unknown and the possible forms cannot be enumerated. Synergy between pairs of genes associated with a phenotype is also widespread and takes diverse patterns. However, existing gene co-expression network methods are built only from differentially expressed genes, and their intrinsic defect is information loss. The maximal information coefficient (MIC) can readily characterize complex associations between two variables. Based on MIC, this project plans to: 1) improve MIC(Y,X) into β-Chi-MIC(X;Y), which has higher statistical power, by using the chi-square test to reduce the number of segments in both directions, and then use β-Chi-MIC(X;Y) to select individual-effect genes regardless of association type; 2) extend MIC(Y,X) to MIC(X1;X2;Y) and select synergistic genes in the same general way; 3) construct gene co-expression networks from these generally associated genes, and then analyze the functions of gene modules and hub genes with the help of the GO and KEGG databases. With these methods, we aim to identify pathogenic genes with well-defined biological functions.
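Complex diseases are shaped jointly by multiple genes and the environment, so mining pathogenic genes with clear biological significance from omics data such as gene expression profiles is of great value for clinical diagnosis and drug target screening. The association between a phenotype Y and a single gene X includes both simple differential expression and complex non-linear patterns, whose explicit expressions are unknown and cannot be enumerated; pairwise synergy of two genes on a phenotype is also widespread and takes diverse forms. Existing gene co-expression networks are built only from differentially expressed genes, which is an inherent weakness. Building on the maximal information coefficient MIC(Y,X), a general measure of association between two variables, this project plans to: 1) improve MIC(Y,X) with a bidirectional segment-control strategy to obtain β-Chi-MIC(X;Y) with high statistical power, and use it to select single-effect genes in a general way; 2) extend MIC(Y,X) to three-variable association to obtain a normalized measure of pairwise interaction, MIC(X1;X2;Y), and use it to select pairwise synergistic genes in a general way; 3) construct co-expression networks for complex diseases from these generally associated genes, and interpret the functions of gene modules and hub genes with GO, KEGG, and related resources. The expected outcome is a set of pathogenic genes with clear biological significance.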
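To make the idea of a general association measure concrete, the following is a minimal sketch of a MIC-style score. It is not the project's β-Chi-MIC or Chi-MIC: it assumes simple equal-frequency binning in place of the optimized grid search that MIC uses, and the function names (`mic_approx`, `_equal_freq_bins`, `_mutual_information`) and the grid budget exponent `alpha=0.6` are illustrative assumptions.

```python
import numpy as np

def _equal_freq_bins(v, k):
    """Assign each value to one of k roughly equal-frequency bins via its rank."""
    ranks = np.argsort(np.argsort(v, kind="stable"), kind="stable")
    return (ranks * k) // len(v)

def _mutual_information(bx, by, kx, ky):
    """Mutual information (in nats) of two discretized variables."""
    joint = np.zeros((kx, ky))
    np.add.at(joint, (bx, by), 1.0)      # count samples per grid cell
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0                       # skip empty cells to avoid log(0)
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz])))

def mic_approx(x, y, alpha=0.6):
    """MIC-style score: maximum normalized mutual information over small grids.

    Assumption: equal-frequency bins stand in for MIC's optimized partitions,
    so this is a rough approximation of MIC, not the published estimator.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    budget = len(x) ** alpha             # cap on the number of grid cells
    best = 0.0
    kx = 2
    while kx * 2 <= budget:
        ky = 2
        while kx * ky <= budget:
            bx = _equal_freq_bins(x, kx)
            by = _equal_freq_bins(y, ky)
            mi = _mutual_information(bx, by, kx, ky)
            best = max(best, mi / np.log(min(kx, ky)))
            ky += 1
        kx += 1
    return best

# A symmetric parabola has almost no linear correlation with x,
# yet a grid-based score still detects the dependence.
x = np.linspace(0.0, 1.0, 200)
y = (x - 0.5) ** 2
linear_corr = np.corrcoef(x, y)[0, 1]
nonlinear_score = mic_approx(x, y)
```

In this toy case the Pearson correlation is close to zero while the grid-based score is high, which is the kind of non-linear gene-phenotype pattern that a selection based only on differential expression would miss.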
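Identifying informative genes associated with disease phenotypes in a general way, and using them to explore the relationships among genes, is key to precision medicine. Building on Chi-MIC, an improved estimation algorithm for the maximal information coefficient previously developed by the applicant, we proposed the normalized differential correlation (NDC), an algorithm aimed specifically at detecting informative genes with non-linear expression. NDC can effectively uncover non-linearly expressed genes that traditional methods overlook and can serve as an effective complement to traditional biomarker discovery methods. To address the low efficiency of gene-gene interaction detection, we proposed a fast detection algorithm for pairwise gene interactions based on the ABS transformation; the transformed gene scores also significantly improve the prediction accuracy of diagnostic models.
Based on Chi-MIC, we developed an optimal feature subset selection algorithm, Chi-MIC-share, which accounts for both the association between features and the target variable and the redundancy among features. Validation on multiple classification and regression benchmark datasets showed that it improves the predictive performance of the selected feature subset while significantly improving selection efficiency.
Based on the optimal supercluster partitioning principle of Chi-MIC, we proposed the chi-square decision table algorithm Chi-DT for feature selection in splice site prediction models. It effectively addresses the extreme imbalance between positive and negative samples in splice site prediction and significantly improves donor site prediction accuracy.
In addition, COVID-19 broke out in Wuhan at the end of 2019, and the Chinese government decisively adopted a series of large-scale epidemic control measures, including the lockdown of Wuhan, which were regarded internationally as excessive. We therefore built a model on Baidu Migration big data to evaluate the effect of the Wuhan lockdown on the spread of the epidemic in mainland China, quantitatively demonstrated the necessity of the lockdown, and showed the value of big data analysis in practice.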
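The trade-off between relevance to the target and redundancy among features can be illustrated with a generic greedy selection loop. This is a sketch only, not the Chi-MIC-share algorithm: it assumes absolute Pearson correlation as a placeholder association measure (the project uses Chi-MIC instead), and the function name `greedy_select` and the scoring rule (relevance minus mean redundancy) are hypothetical choices made for illustration.

```python
import numpy as np

def greedy_select(X, y, k):
    """Greedily pick k features, trading relevance against redundancy.

    Assumption: absolute Pearson correlation is a stand-in association
    measure; a MIC-based score could be substituted in its place.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_features = X.shape[1]
    # Relevance: association between each feature and the target.
    relevance = np.array(
        [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_features)]
    )
    # Redundancy: association between every pair of features.
    feat_corr = np.abs(np.corrcoef(X, rowvar=False))
    selected = [int(np.argmax(relevance))]
    while len(selected) < min(k, n_features):
        remaining = [j for j in range(n_features) if j not in selected]
        # Score = relevance minus mean redundancy with the features chosen so far.
        scores = [relevance[j] - feat_corr[j, selected].mean() for j in remaining]
        selected.append(remaining[int(np.argmax(scores))])
    return selected

# Toy data: feature 1 nearly duplicates feature 0, feature 3 is pure noise.
rng = np.random.default_rng(0)
a = rng.normal(size=300)
b = rng.normal(size=300)
X = np.column_stack([a, a + 0.01 * rng.normal(size=300), b, rng.normal(size=300)])
y = a + b
chosen = greedy_select(X, y, k=2)
```

With this toy data the loop keeps only one of the two near-duplicate features and pairs it with the independent informative one, which is the behavior a redundancy-aware selector is meant to show.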
Data last updated: 2023-05-31
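The DeoR-family transcription factor PsrB regulates prodigiosin synthesis in Serratia marcescens
Regulatory asymmetry, choice of earnings management mode, and CSRC enforcement efficiency?
A survey of user alignment techniques across social networks
Non-linear analysis and calculation method for the coefficient of earth pressure at rest of coarse-grained soil
Evaluation of passenger evacuation capacity in urban rail transit stations under fire conditions
Research on the universal behavior of dynamically correlated systems
Integrating GO-based gene association networks with gene co-expression networks to elucidate the molecular genetic basis of complex plant traits
Research on the universality of fermion generations
A dynamic association model between blood stasis syndrome in acute cerebral infarction and its gene co-expression network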