LncRNAs (long non-coding RNAs) play critical roles in many biological processes, such as cell differentiation, chromatin modification, and transcriptional and post-transcriptional regulation. Most of these processes require interactions with other molecules (e.g., proteins and miRNAs). Dysregulation of lncRNAs is often associated with human diseases. Discovering lncRNA-protein interactions and lncRNA-disease associations can help elucidate the functions of lncRNAs, their regulatory mechanisms and their pathological mechanisms in complex diseases. It can also guide more effective therapeutic intervention in complex diseases. Numerous computational methods have been developed for predicting new lncRNAs, lncRNA-protein interactions and lncRNA-disease associations. However, most existing methods are limited by their reliance on hand-crafted features that depend on biological knowledge and experience. Building on the ability of deep learning to learn features automatically, the goal of this project is to develop effective models and algorithms that enable:
1) accurate recognition of lncRNAs;
2) accurate prediction of lncRNA-protein interactions;
3) accurate identification of protein-binding nucleotides on lncRNA sequences;
4) accurate prediction of lncRNA-disease associations.
Different deep learning architectures have different strengths:
- a convolutional neural network (CNN) is well suited to sequence data;
- a long short-term memory network (LSTM) has memory characteristics;
- a stacked autoencoder (SAE) can effectively extract important features from high-dimensional input feature vectors.

Taking these strengths into account, together with the biological characteristics of lncRNAs, lncRNA-protein interactions and lncRNA-disease associations, we will investigate:
- encoding schemes for RNA sequences and their secondary-structure unit sequences;
- encoding schemes for protein sequences and their secondary-structure and structural-domain unit sequences;
- feature extraction approaches for lncRNAs and diseases.

We will then develop a series of multimodal hybrid deep learning models combining CNN, LSTM and SAE. These models will:
- accurately distinguish lncRNAs from mRNAs;
- effectively predict lncRNA-protein interactions;
- precisely identify protein-binding nucleotides on lncRNA sequences;
- accurately predict lncRNA-disease associations.

In particular, we also plan to develop software and user-friendly tools to support research on functional lncRNAs by biologists and computational scientists.
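The summary does not specify the coding schemes for RNA sequences and their secondary-structure unit sequences. The sketch below is a minimal illustration only. It assumes one-hot encoding for both inputs and dot-bracket notation (`(`, `)`, `.`) for the structure units; neither is confirmed as the project's scheme. Under these assumptions, a sequence and its structure can be stacked position by position into one multimodal input matrix for a CNN or LSTM. The names `one_hot` and `encode_multimodal` are hypothetical.

```python
from typing import List

NUCLEOTIDES = "ACGU"
# Assumed structure alphabet (dot-bracket): paired-left, paired-right, unpaired.
STRUCTURE = "()."


def one_hot(seq: str, alphabet: str) -> List[List[int]]:
    """Return a len(seq) x len(alphabet) one-hot matrix; unknown symbols become all-zero rows."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    rows = []
    for ch in seq:
        row = [0] * len(alphabet)
        if ch in index:
            row[index[ch]] = 1
        rows.append(row)
    return rows


def encode_multimodal(rna: str, structure: str) -> List[List[int]]:
    """Concatenate per-position sequence and structure one-hot vectors (4 + 3 = 7 channels)."""
    if len(rna) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    # Normalize case and treat DNA-style T as U before encoding.
    seq_part = one_hot(rna.upper().replace("T", "U"), NUCLEOTIDES)
    struct_part = one_hot(structure, STRUCTURE)
    return [s + t for s, t in zip(seq_part, struct_part)]


x = encode_multimodal("GGAUCC", "((..))")
print(len(x), len(x[0]))  # 6 7
print(x[0])  # [0, 0, 1, 0, 1, 0, 0]
```

A protein sequence could be handled the same way with a 20-letter amino-acid alphabet and its own structure-unit alphabet. Each modality would then feed a separate branch of a hybrid model.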
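LncRNAs play important roles in many cellular physiological activities, and their aberrant expression is closely associated with major human diseases. Research on lncRNA-protein interactions and lncRNA-disease associations helps reveal the regulatory functions and mechanisms of lncRNAs. It can also effectively guide intervention and treatment for complex diseases.

This project will study prediction problems related to functional lncRNAs in depth. Building on the strong automatic feature extraction and representation capabilities of deep learning, it will develop effective lncRNA-related prediction algorithms. The work draws on the structural characteristics of different deep learning networks:
- deep convolutional neural networks can effectively process sequence data of varying lengths;
- long short-term memory networks have memory characteristics;
- stacked autoencoders effectively capture the important features in input feature vectors.

It also draws on the biological characteristics of lncRNA-related prediction problems. On this basis, we will study encoding schemes for RNA sequences and their secondary-structure unit sequences, and for protein sequences and their secondary-structure and structural-domain unit sequences. We will propose a series of multimodal hybrid deep learning models and algorithms to:
- identify lncRNAs with high accuracy;
- predict lncRNA-protein interactions;
- locate target-protein binding sites on lncRNA strands;
- predict lncRNA-disease associations.

We will also develop software tools for these predictions.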
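Research on lncRNA-protein interactions and lncRNA-disease associations helps reveal the regulatory functions and mechanisms of lncRNAs. It can also effectively guide intervention and treatment for complex diseases. This project was carried out strictly according to the project proposal.

We proposed algorithms for:
- lncRNA identification;
- lncRNA-protein interaction prediction;
- lncRNA-protein binding site prediction;
- transcription factor binding site prediction;
- lncRNA-disease association prediction.

We also carried out research on identifying functional m6A-methylated genes, identifying cancer driver genes, predicting drug-drug interactions, and reconstructing cell communication networks. The main results are as follows:
1. Existing lncRNA identification algorithms mostly rely on hand-crafted features, which adapt and generalize poorly. We proposed an lncRNA prediction algorithm based on multimodal deep learning.
2. Deep learning models have complex structures and huge numbers of parameters, and CNN models require fixed-length input sequences. To address these problems, we proposed two lncRNA-protein interaction prediction algorithms, one based on broad learning and one based on a CNN with a copy strategy.
3. Current lncRNA-protein binding site prediction algorithms ignore the relationships between bases in RNA sequences, and their prediction accuracy needs improvement. We proposed a CNN-based lncRNA-protein binding site prediction algorithm with multi-nucleotide encoding.
4. Current lncRNA-disease association prediction algorithms generally use only a few information sources on lncRNAs and diseases with known associations. They cannot predict the potentially associated diseases of new lncRNAs, and they miss the deep embedding features in the topology of molecular networks. We proposed two lncRNA-disease association prediction algorithms, one based on multi-source information fusion and one based on network embedding.
5. We proposed the COSE, ACNN and TFBS_MLCNN transcription factor binding site prediction algorithms, which achieve high-accuracy transcription factor binding site prediction in different scenarios.
6. We proposed the FunDMDeep-m6A, m6Acancer-Net, m6A-express, m6Aexpress-Reader, m6Aexpress-BHM and Hot-m6A-Dis algorithms. They identify functional m6A-methylated genes and their associations with diseases with high accuracy, and they uncover the regulatory patterns of genes whose expression is regulated by m6A methylation.
7. We proposed the DGMP, PDGPCS, IMCDriver and PNC driver gene identification algorithms, which effectively identify personalized or rarely mutated cancer driver genes.
8. We proposed the DPDDI, GNN-DDI, deepMDDI and CPGD drug interaction prediction methods. They predict drug-drug interactions and drug combinations with high accuracy and explain the mechanisms of drug interactions.
9. We proposed an algorithm for reconstructing the resilience function of biological systems, which effectively identifies critical points of cancer state transitions.
10. We proposed IRRG, a method for constructing cell communication networks, to mine cell communication patterns.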
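The summary does not describe in detail the copy strategy used to give a CNN fixed-length input, nor the multi-nucleotide encoding used for binding site prediction. The sketch below is a hypothetical reading of both ideas and is not the project's published method:
- `tile_to_length` repeats a sequence end to end and truncates it, so that sequences of any length fit one fixed input size.
- `kmer_one_hot` one-hot encodes overlapping k-mers, so that each input row carries information about neighboring bases rather than a single nucleotide.

All function names and the tiling interpretation are illustrative assumptions.

```python
from itertools import product
from typing import Dict, List


def kmer_vocabulary(k: int, alphabet: str = "ACGU") -> Dict[str, int]:
    """Map each of the len(alphabet)**k possible k-mers to an integer index."""
    return {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}


def tile_to_length(seq: str, length: int) -> str:
    """Hypothetical 'copy strategy': repeat the sequence end to end, then truncate to `length`."""
    if not seq:
        raise ValueError("empty sequence")
    copies = -(-length // len(seq))  # ceiling division
    return (seq * copies)[:length]


def kmer_one_hot(seq: str, k: int = 2) -> List[List[int]]:
    """One-hot encode overlapping k-mers; k-mers containing non-ACGU symbols become all-zero rows."""
    vocab = kmer_vocabulary(k)
    seq = seq.upper().replace("T", "U")
    rows = []
    for i in range(len(seq) - k + 1):
        row = [0] * len(vocab)
        idx = vocab.get(seq[i:i + k])
        if idx is not None:
            row[idx] = 1
        rows.append(row)
    return rows


fixed = tile_to_length("ACGU", 10)
print(fixed)  # ACGUACGUAC
encoded = kmer_one_hot(fixed, k=2)
print(len(encoded), len(encoded[0]))  # 9 16
```

With k = 2, each row represents one of 16 dinucleotides. A larger k captures longer-range base dependencies at the cost of 4^k columns per row.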
Data updated: 2023-05-31
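Prediction algorithms for potentially functional long non-coding RNAs: research and applications
Prediction methods for long non-coding RNAs based on multi-source biological data
Prediction of the subcellular localization of long non-coding RNAs
Network models and algorithms for long non-coding RNA function prediction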