The development of genomics and proteomics poses great challenges for bioinformatics research. With the development of high-throughput biological technologies, biological data are being produced at an explosive rate, and these data have many special characteristics: they are multi-faceted, interrelated, noisy, inaccurate, and incomplete. How to find efficient ways to solve biological computation problems through the analysis of biological data has become a major concern for national strategic needs and major projects. Over a long period, methods from many fields have been studied in search of new algorithms that handle massive, complicated biological data efficiently, including heuristic algorithms, approximation methods, exact algorithms, artificial intelligence methods, and parameterized computation. Much effort has gone into either improving these methods or finding more efficient ones. Solving a biological computation problem generally takes two steps: build a feasible model of the problem, then choose methods suited to the characteristics of that model. Current modeling and solving practice shows that analyzing the characteristics of biological data plays an important role in both steps and can directly affect the complexity of the model and the efficiency of the resulting algorithm. How to analyze biological data efficiently in order to model and solve biological computation problems has therefore become a hot research topic in bioinformatics.
This project will seek new computational methods for problems in bioinformatics. First, the multi-faceted properties of biological data will be analyzed, and the critical parameters that determine the computational complexity of the problems will be studied thoroughly. Based on this analysis, multi-parameter models of biological computation problems will be established. Building on these models, the project will combine parameterized computation, heuristics, data compression, and multi-source data merging to address hot research topics in bioinformatics, with the aim of presenting a systematic set of methods for solving biological computation problems based on data characteristics. Finally, the project will design a software platform with independent intellectual property rights based on the algorithms studied, and apply the software to the diagnosis, analysis, and therapy of complicated diseases. The research results will be closely tied to national strategic needs and will offer new ideas for analyzing data characteristics and finding efficient computation methods in many fields, especially in major engineering projects.
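The development of genomics, proteomics, and related fields has brought unprecedented opportunities and challenges to modern life science research. How to find new bioinformatics computing techniques that solve international frontier problems in the life sciences has become a key scientific question for major national strategic needs and major engineering needs. This project will step outside the traditional approach to algorithm design: it will first analyze the multi-faceted properties of biological data, identify the key parameters that affect problem complexity, and characterize multivariate models of biological computation problems. Then, based on these models, it will combine parameterized computation, heuristic methods, data compression, and multi-source information fusion to solve hot and difficult problems in biological computation, and establish a systematic set of methods, oriented to the characteristics of biological data, for solving intractable biological computation problems. Finally, building on these algorithmic results, the project will establish a software platform with independent intellectual property rights and apply it to the diagnosis, analysis, and treatment of complex diseases. The research will provide efficient computational methods for processing complex biological data, offer new ideas for data characteristic mining and efficient computation aimed at major engineering needs, and promote research on and application of efficient computational methods for practical engineering and major national needs in China.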
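With the support of this grant, the research group studied several key problems in complex biological data processing, including genome sequence analysis and assembly, protein structure and function prediction, biological network construction and analysis, prediction of disease-miRNA and disease-lncRNA associations, and reconstruction of biological microscopy images. The main results are as follows:
1. Analyzed in depth the noise distribution of different types of biological data, such as sequence data and protein interaction data, and studied denoising methods that exploit the strong correlations among biological data; mined the inherent characteristics of complex biological data, providing a basis for efficient computation methods oriented to data characteristics.
2. For next-generation sequencing and metagenomic data, focused on short-read assembly, structural variation detection, and high-order SNP detection; using features such as the paired-end read distribution and the insert size distribution, designed a De Bruijn graph-based sequence assembly method, a path-extension-based scaffolding method, a gap-filling method based on a read partitioning strategy, and methods for structural variation detection and high-order SNP detection.
3. Designed a protein identification method that integrates transcriptome information, peptide-identified protein information, and interactome information; proposed a new live-cell super-resolution microscopy technique that combines single-molecule localization with Bayesian techniques for precise protein localization.
4. To address several limitations of static protein network analysis, proposed new methods for constructing dynamic protein networks by integrating data such as time-series gene expression; designed a complex refinement method based on protein activity and a series of protein complex mining methods based on protein network topology and multi-source biological information.
5. By fusing multiple similarity networks and using kernelized Bayesian matrix factorization, logistic matrix factorization, and random walks, proposed a series of methods for predicting disease-miRNA and disease-lncRNA associations and for drug repositioning.
6. For the high-performance computing problems of biological microscopy image reconstruction, proposed a series of cryo-electron microscopy image processing methods for biological macromolecules, cryo-electron microscopy image reconstruction algorithms, and parallel processing methods for large-scale cryo-electron microscopy data.
7. Based on the data characteristics of several biological computation problems, identified key parameters that affect problem complexity, established multi-parameter models for these problems, and gave complexity analyses and parameterized algorithm designs for the models.
8. On the basis of the above methods, designed and developed a series of open-source biological computation software packages and online web service tools.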
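To make the De Bruijn graph idea mentioned in result 2 concrete, the following is a minimal Python sketch of the textbook construction. It is not the project's assembler, and it ignores paired-end and insert size information entirely. The function names `build_de_bruijn` and `extend_unambiguous`, the toy reads, and the choice of k are assumptions made for illustration.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the list of (k-1)-mers that follow it in some read."""
    graph = defaultdict(list)
    for read in reads:
        # Slide a window of length k over the read; each k-mer is one edge.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return graph


def extend_unambiguous(graph, start):
    """Walk from `start` while the current node has exactly one distinct successor."""
    contig = start
    node = start
    seen = {start}
    while True:
        successors = set(graph.get(node, ()))
        if len(successors) != 1:
            break  # dead end or branch: stop extending
        nxt = successors.pop()
        if nxt in seen:
            break  # cycle: stop rather than loop forever
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


# Hypothetical overlapping reads sampled from the toy sequence ATGGCGTGCA.
reads = ["ATGGCG", "GGCGTG", "CGTGCA"]
graph = build_de_bruijn(reads, k=4)
print(extend_unambiguous(graph, "ATG"))  # prints ATGGCGTGCA
```

Real assemblers additionally handle sequencing errors, repeats, and reverse complements, which this sketch deliberately leaves out.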
Data last updated: 2023-05-31