This project aims to develop new statistical methods for analyzing high-throughput next-generation sequencing (NGS) data. NGS techniques offer major advantages over traditional biological techniques, including much higher sequencing speed and throughput at much lower cost, and they have therefore been widely adopted in biomedical research and other areas of the life sciences. To answer the life-science questions of interest, it is essential to fully exploit the information contained in such large biological datasets. Most NGS studies, however, produce very high-dimensional data from a relatively small number of samples, which poses a major challenge for statisticians. This project will develop several novel statistical methods for detecting biomarkers associated with diseases or other phenotypes using NGS data: based on pooled DNA sequencing data, we will develop a novel statistical method to simultaneously identify multiple rare variants associated with disease; based on DNA methylation data, we will develop a parametric method and a nonparametric method for detecting disease-associated DNA methylation loci/regions; and based on RNA sequencing data from multiple conditions, we will develop new statistical methods that use dimension-reduction techniques to detect genes or isoforms that are differentially expressed across conditions. This work should help to clarify the molecular mechanisms of complex diseases, predict disease risk, and develop personalized therapeutic regimens.
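This project develops several new statistical methods based on next-generation high-throughput sequencing data. Compared with traditional biological techniques, next-generation sequencing (NGS) has substantial advantages in sequencing speed, throughput, and per-unit sequencing cost, and it has become the mainstream sequencing technology in biomedicine and other life sciences. A critically important task is to fully mine the information contained in such large-scale biological data in order to solve the life-science problems of interest; however, most NGS data are high-dimensional with relatively small sample sizes, which poses a considerable challenge for statisticians. Based on several common types of NGS data, this project will develop new statistical methods for detecting biomarkers associated with complex diseases (such as cancer) or other phenotypes: for pooled DNA sequencing data, a new method for association analysis between multiple rare variants and disease; for DNA methylation sequencing data, new parametric and nonparametric statistical methods for detecting disease-associated DNA methylation loci/regions; and for multi-condition RNA sequencing data, new dimension-reduction techniques for detecting differentially expressed genes/isoforms. This work will lay an important foundation for further revealing the pathogenesis of complex diseases at the molecular level, predicting disease risk, and developing individualized treatment plans.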
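Based on several common types of next-generation sequencing data, this project developed new statistical methods for detecting biomarkers associated with complex diseases (such as cancer) or other phenotypes. This work lays an important foundation for further revealing the pathogenesis of complex diseases at the molecular level, predicting disease risk, and developing individualized treatment plans. Five papers funded by this project were published in journals including Annals of Applied Statistics (a top journal in applied statistics), Bioinformatics (a top journal in bioinformatics), and Statistics in Medicine (a top journal in biostatistics). These papers address a series of important statistical problems in the analysis of high-throughput omics data, including the detection of cancer-specific genes (RNA sequencing data), the detection of disease-specific genes (case-control genotype data), the detection of genes with parent-of-origin effects (case-control mother-child pair genotype data), and the diagnosis of complex diseases (multiple biomarkers). Corresponding R packages were also developed.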
Data last updated: 2023-05-31