After more than a decade of development, metabolomics has shown broad application prospects in clinical medicine. At the same time, the complexity of high-throughput metabolomics data makes it difficult to extract and analyze useful information. As research has deepened, it has become clear that the clinical application of metabolomics depends on solving several basic problems in data analysis. This project addresses two key problems in the metabolomics-based clinical diagnosis of coronary heart disease (CHD): the acquisition of high-quality MS data and the classification of imbalanced data.

Ultra-performance liquid chromatography-high-resolution mass spectrometry (UPLC-HRMS) is the method of choice for metabolic analysis and generates massive amounts of data for answering biological questions in metabolomics. Improvements in MS technology have made metabolomics datasets progressively larger, with more intricate internal structure. Data from data-independent acquisition (DIA) are the most informative but also the most contaminated. To acquire high-quality DIA MS data, noise and interfering mass spectra must be filtered out. In this project, a series of chemometric algorithms and strategies will be proposed for filtering interfering mass spectra, based on multivariate resolution of overlapping peaks, the fragmentation patterns of metabolites, and machine learning. In addition, we will extract the useful information from different MS data and integrate the structural information on metabolites obtained from software, websites, and data libraries.

Effective pretreatment of MS data provides a guarantee for obtaining high-quality metabolomics data. For the application of metabolomics to disease diagnosis, however, the problem that remains is the correct classification of different clinical samples. Imbalanced data are common in disease metabolomics, yet common classification algorithms handle them poorly because of their particular data structure. The risks of misdiagnosing CHD patients, high-risk individuals, and healthy people are different. In this project, we will statistically evaluate the misdiagnosis risk for each group based on medical records and the experience of clinicians, then design a penalty factor and propose a cost-sensitive classification method for imbalanced data. In addition, new evaluation criteria will be proposed to assess the accuracy, predictive ability, and misclassification cost of the classification model.

The research results will provide practical methods and strategies for solving biological questions in life science and will help expand the research fields of MS technology and chemometrics.
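The cost-sensitive idea described above can be illustrated with a minimal sketch. It assumes a classifier that already outputs class probabilities, and it uses a hypothetical cost matrix: the class names and every cost value below are illustrative assumptions, not the penalty factors designed in the project. The sketch picks the class with the lowest expected misclassification cost instead of the most probable class, and also computes an average misclassification cost as one possible evaluation criterion.

```python
# Hypothetical cost matrix COST[true_class][predicted_class].
# All values are illustrative assumptions; the project derives its
# penalty factors from medical records and clinicians' experience.
CLASSES = ["healthy", "high_risk", "chd"]
COST = {
    "healthy":   {"healthy": 0.0,  "high_risk": 1.0, "chd": 2.0},
    "high_risk": {"healthy": 3.0,  "high_risk": 0.0, "chd": 1.0},
    "chd":       {"healthy": 10.0, "high_risk": 4.0, "chd": 0.0},
}


def expected_cost(probs, predicted):
    """Expected cost of predicting `predicted`, given class probabilities."""
    return sum(probs[true] * COST[true][predicted] for true in CLASSES)


def cost_sensitive_predict(probs):
    """Return the class that minimizes the expected misclassification cost."""
    return min(CLASSES, key=lambda c: expected_cost(probs, c))


def mean_misclassification_cost(y_true, y_pred):
    """Average cost over a set of predictions (lower is better)."""
    return sum(COST[t][p] for t, p in zip(y_true, y_pred)) / len(y_true)


# A sample whose most probable class is "healthy" ...
probs = {"healthy": 0.60, "high_risk": 0.25, "chd": 0.15}
most_probable = max(CLASSES, key=lambda c: probs[c])
# ... but missing a CHD patient is penalized heavily, so the
# cost-sensitive decision differs from the most probable class.
decision = cost_sensitive_predict(probs)
```

With these assumed costs, the expected cost of predicting "healthy" is 2.25, against 1.20 for "high_risk" and 1.45 for "chd", so the decision shifts toward the more cautious label even though "healthy" has the highest probability.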
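High-resolution mass spectrometry datasets in metabolomics are high-throughput, highly complex, and class-imbalanced. How to turn high-throughput data into high-quality data, and how to overcome the difficulty of classifying class-imbalanced data, have become two key problems limiting the application of metabolomics to clinical diagnosis. Taking the clinical diagnosis of coronary heart disease as its entry point, this project starts from clinical needs and uses concrete problems to address several fundamental issues in metabolomics research. DIA data cover the richest metabolite information but are heavily contaminated. To address this, the study proposes a new approach to mass spectrum filtering that combines multivariate resolution of overlapping peaks, the study of metabolite fragmentation patterns, and machine learning, and obtains high-quality metabolomics data with mass spectrum filtering at its core. In addition, the problem of classifying class-imbalanced data is addressed from a cost-sensitive perspective. Useful information is mined from large amounts of clinical data to statistically evaluate the misdiagnosis risk for people at high risk of coronary heart disease, patients, and healthy people. On this basis, penalty factors and cost evaluation indices are designed, and a new cost-sensitive classification method and evaluation criteria for class-imbalanced data are proposed. The study can provide practical methods and strategies for solving key problems in applying metabolomics to clinical diagnosis, and has important theoretical and practical value.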
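How to turn high-throughput metabolomics data into high-quality data, and how to overcome the difficulty of classifying class-imbalanced data, have become two key problems limiting the application of metabolomics to clinical diagnosis. Taking the clinical diagnosis of coronary heart disease as its entry point, this study carried out systematic, in-depth research on these two key scientific problems.

The study optimized the plasma sample extraction method and the chromatographic and mass spectrometric conditions, and established a plasma metabolomics analysis and detection method based on high-resolution mass spectrometry. Because the traditional data-dependent acquisition mode yields only limited MS/MS information, the parameters of the data-independent acquisition mode were optimized, giving metabolic profiles with a large number of information-rich MS/MS spectra. The noise characteristics of high-resolution mass spectrometry data were analyzed, and an integrated mass spectrum filtering strategy was proposed that combines nitrogen rule filtering, diagnostic fragment ion filtering, mass defect filtering, neutral loss filtering, and other filtering methods. The strategy filters out interfering mass spectra while effectively retaining the mass spectral information of target metabolites. In addition, the chromatographic elution patterns and mass spectral fragmentation patterns of multiple classes of metabolites were analyzed and summarized for accurate metabolite identification. These results can improve the quality of metabolomics mass spectrometry data, significantly reduce false positives in metabolite identification, and improve the coverage and accuracy of metabolite annotation.

For the class-imbalanced classification problem, the study examined how the structure of metabolomics data affects classification from three aspects: the degree of between-class imbalance, data dimensionality, and variable correlation. On this basis, using simulated data and real metabolomics data, class-imbalanced data were studied from the perspectives of variable selection and data rebalancing, and a robust LASSO-based variable selection algorithm and a feature selection and classification algorithm based on minimum overlap were proposed. For evaluating classification performance on metabolomics data, a precision-recall curve criterion and false positive rate criteria under balanced and imbalanced conditions were proposed.

Focusing on type 2 diabetes and hypertension, two important risk factors for coronary heart disease, the study carried out metabolomics research on coronary heart disease and its comorbidities, and established metabolomics-based classification models for healthy people and for patients with coronary heart disease, coronary heart disease with diabetes, and coronary heart disease with hypertension. The study found that disorders of amino acid metabolism and lipid metabolism are important metabolic features of coronary heart disease and these two comorbidities. The results can provide important biological targets for metabolomics-based auxiliary clinical diagnosis of coronary heart disease.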
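To make the integrated filtering strategy above more concrete, the following is a minimal sketch of three of the named filters (mass defect, diagnostic fragment ion, neutral loss) applied to MS/MS spectra. It is an illustration under stated assumptions, not the study's implementation: the `Spectrum` type, the function names, the fractional-mass simplification of the mass defect, the tolerance, and the rule for combining the filters are all hypothetical. The nitrogen rule filter is omitted because it needs elemental composition information that this simple representation does not carry.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Spectrum:
    """Hypothetical MS/MS record: precursor m/z plus fragment m/z values."""
    precursor_mz: float
    fragments: List[float] = field(default_factory=list)


def fractional_mass(mz: float) -> float:
    """Fractional part of m/z, used here as a simple stand-in for mass defect."""
    return mz - math.floor(mz)


def in_mass_defect_window(spec: Spectrum, low: float, high: float) -> bool:
    """Mass defect filter: precursor fractional mass must lie in [low, high]."""
    return low <= fractional_mass(spec.precursor_mz) <= high


def has_diagnostic_ion(spec: Spectrum, ion_mz: float, tol: float) -> bool:
    """Diagnostic fragment ion filter: some fragment matches ion_mz within tol."""
    return any(abs(f - ion_mz) <= tol for f in spec.fragments)


def has_neutral_loss(spec: Spectrum, loss: float, tol: float) -> bool:
    """Neutral loss filter: precursor minus some fragment matches loss within tol."""
    return any(abs((spec.precursor_mz - f) - loss) <= tol for f in spec.fragments)


def integrated_filter(spectra: List[Spectrum], window: Tuple[float, float],
                      ion_mz: float, loss: float, tol: float = 0.01) -> List[Spectrum]:
    """Assumed combination rule: keep a spectrum if it passes the mass defect
    window AND shows either the diagnostic ion or the neutral loss."""
    low, high = window
    return [
        s for s in spectra
        if in_mass_defect_window(s, low, high)
        and (has_diagnostic_ion(s, ion_mz, tol) or has_neutral_loss(s, loss, tol))
    ]


# Illustrative input: m/z 184.0733 is the phosphocholine head-group ion seen
# for phosphatidylcholines in positive mode; 18.0106 is the loss of water.
spectra = [
    Spectrum(760.5851, [184.0733, 86.0964]),  # diagnostic ion present
    Spectrum(500.5000, [482.4894]),           # water loss present
    Spectrum(300.1000, [150.0000]),           # outside the mass defect window
]
kept = integrated_filter(spectra, window=(0.40, 0.70), ion_mz=184.0733, loss=18.0106)
```

In this sketch the first two spectra are kept and the third is discarded. A real workflow would choose windows, ions, and losses per metabolite class, which is where the summarized fragmentation patterns would come in.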
Data last updated: 2023-05-31
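Study of the betavoltaic effect based on one-dimensional TiO2 nanotube array films
On the influence of the big data environment on the development of information science
The DeoR-family transcription factor PsrB regulates prodigiosin synthesis in Serratia marcescens
Combined transcriptome and metabolic analysis of the mechanism of anthocyanin variation in Acer rubrum leaves
Spatiotemporal structure and tectonic evolution of the Shiquanhe-Laguoco-Yongzhu-Jiali ophiolitic mélange belt, Qinghai-Tibet Plateau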
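Clinical diagnosis of aspirin resistance and traditional Chinese medicine intervention based on pharmacogenomics and metabolomics
Metabolomic signature profiles and biological regulatory mechanisms of coronary heart disease
A multimodal study assessing coronary heart disease progression based on imaging and metabolomics
Bioinformatics methods for inferring metabolic dysfunction from clinical metabolomics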