Transcription factors (TFs) modulate gene expression patterns and are therefore key components of cellular regulatory networks. TFs bind to DNA in a sequence-specific manner, and the relative preferences of a TF for different nucleotide sequences are often referred to as TF binding site (TFBS) motifs. These motifs are of considerable interest in biology because they are central to understanding the mechanisms of gene expression. In this project, we systematically study computational methods for TFBS motif discovery. First, we propose a discriminative motif finder for discovering high-quality initial motifs between two sequence datasets; it uses the area under the receiver operating characteristic curve (AUC) as a measure of the discriminating power of a motif and incorporates novel search strategies. Second, we propose a new framework for estimating generative probabilistic motif models via a contrasting process, which can provably learn the optimal motif parameters by discriminating the observed binding data from samples drawn from an adaptive noise distribution. Finally, we reformulate the discriminative motif finding problem in a multiple-instance learning framework, thereby modeling the underlying inference problem more properly and facilitating the incorporation of advanced machine learning and optimization tools. This project will advance the understanding of the underlying mechanisms of gene regulation. It will also help in understanding cells at the system level and in explaining disease pathogenesis.
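Transcription factors regulate gene expression patterns and are therefore a key component of cellular regulatory networks. Binding between transcription factors and DNA is sequence-specific, and the relative binding preferences of a transcription factor for different nucleotide sequences are usually called transcription factor binding site motifs. Because of their central role in understanding the mechanisms of gene expression, these motifs are of great importance to biological research. In this project, we will systematically study computational methods for discovering transcription factor binding site motifs. First, we propose a new discriminative method for finding high-quality initial motifs between two sets of sequences; it uses the area under the receiver operating characteristic curve to measure the discriminating power of a candidate solution and introduces novel search strategies. Next, we recast parameter learning for generative motif models as a contrastive training process, so that the model parameters can be learned optimally by contrasting observed data with artificial data. Finally, we recast discriminative motif model training as a multiple-instance learning problem, which models the underlying inference problem more appropriately and makes it easier to bring in new machine learning and optimization tools. Successful completion of this project will help reveal the underlying mechanisms of regulation and further the understanding of cellular activity at the system level.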
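Transcription factors can bind to regulatory sequences on genes and thereby activate or repress the expression of their target genes. Because of their central role in the mechanisms of gene expression, they are of great importance to biological research. In this project, we will systematically study computational methods for discovering transcription factor binding site motifs. First, we propose a new discriminative method based on the area under the receiver operating characteristic curve, combined with novel search strategies, for finding high-quality initial motifs between two sets of sequences. Second, we recast parameter learning for generative motif models as a contrastive training process, so that the model parameters can be learned optimally by contrasting observed data with artificial data. Finally, we recast the discriminative motif model as a multiple-instance learning problem, model the underlying inference problem in terms of spatial and higher-order relationships, and introduce new machine learning and optimization tools. Successful completion of this project will help reveal the underlying mechanisms of gene expression regulation, further the system-level understanding of cellular activity, and help explain the pathogenesis of disease.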
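To make the AUC criterion from the abstract concrete, the following is a minimal Python sketch of how a candidate motif could be scored by its ability to separate two sequence sets. It is an illustration under stated assumptions, not the project's actual implementation: the log-odds position weight matrix, the uniform background, the pseudocount, the rule of scoring each sequence by its best-matching window, and all function names (`make_pwm`, `best_site_score`, `auc`, `motif_auc`) are hypothetical choices made here. Sequences are assumed to contain only uppercase A, C, G, and T.

```python
import math

BASES = "ACGT"

def make_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts (ordered A, C, G, T) to log-odds scores.

    Assumption: uniform background frequency and a simple additive pseudocount.
    """
    pwm = []
    for col in counts:
        total = sum(col) + 4 * pseudocount
        pwm.append({b: math.log(((c + pseudocount) / total) / background)
                    for b, c in zip(BASES, col)})
    return pwm

def best_site_score(pwm, seq):
    """Score a sequence by its best-matching window (maximum over positions)."""
    w = len(pwm)
    if len(seq) < w:
        return float("-inf")  # too short to contain a site
    return max(sum(pwm[i][seq[p + i]] for i in range(w))
               for p in range(len(seq) - w + 1))

def auc(pos_scores, neg_scores):
    """AUC as the probability that a positive outscores a negative.

    Ties count as 0.5; this pairwise form is O(n * m) and meant for clarity.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def motif_auc(pwm, positives, negatives):
    """Discriminating power of a motif between two sequence sets."""
    return auc([best_site_score(pwm, s) for s in positives],
               [best_site_score(pwm, s) for s in negatives])
```

A search procedure could call `motif_auc` on each candidate motif and keep the highest-scoring ones as initial solutions. Taking the maximum over windows treats each sequence as a bag of candidate sites, which is also the intuition behind the multiple-instance view mentioned in the abstract.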
Data last updated: 2023-05-31
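The DeoR-family transcription factor PsrB regulates prodigiosin synthesis in Serratia marcescens
A survey of user alignment techniques across social networks
Small-UAV remote sensing image registration with inlier maximization and redundant-point control
Combined transcriptome and metabolome analysis of the mechanism of anthocyanin changes in Acer rubrum leaves
Research on a crime prediction algorithm based on multimodal information feature fusion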
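Research on transcription factor binding sites (TFBS)
Transcription factor binding site identification with constrained multinomial distributions
Algorithmic prediction of prokaryotic transcription factor binding sites and its applications
Binding site analysis of the transcription factor TDF1 and identification of its directly regulated downstream genes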