Accurate prediction of protein subcellular location is critical for understanding the specific functions of human proteins. Most current prediction models for protein subcellular localization are based on amino acid sequence. However, sequence-based analysis alone is not sensitive enough to detect protein translocation, because translocation can be strongly affected by mutations outside the target sequence. For example, mutations in nucleoporin complexes can have dramatic effects on the nuclear localization of many other proteins. Sensitive detection of translocated or mislocalized proteins in human cancer tissues is therefore of more direct relevance. Recent research has shown that some translocated or mislocalized proteins can no longer find their correct molecular interaction partners, which ultimately affects the entire molecular network. We refer to such proteins as potential cancer biomarkers. Focusing on the translocation or mislocalization of these potential cancer biomarkers can improve the accuracy of early cancer warning and provide a valuable scientific basis for molecularly targeted therapy and prognosis.

Motivated by these considerations, a growing number of researchers and institutions are shifting the data source for protein subcellular location prediction from amino acid sequences to more intuitive image data, and are devoting their efforts to developing bio-image-based classification systems. With the rapid development of high-resolution imaging technology in recent years, high-resolution protein images have become easier to acquire, so protein subcellular location patterns in normal and cancerous human tissues can be observed more directly. This progress provides high-quality data for constructing data-driven, automated models for predicting protein subcellular location.
Research on image-based prediction models can not only predict human protein subcellular localization accurately and efficiently in normal and cancer tissues, but also provide the sensitivity needed to capture translocated or mislocalized proteins in human cancer tissues and thereby screen potential cancer biomarkers. Such models are also of crucial importance for clinical diagnosis and pharmaceutical engineering.

Taking the Human Protein Atlas as its research object, this project focuses on key issues in the design of image-based prediction models for protein subcellular localization. The main content is as follows.

(1) To develop a novel preprocessing approach for IHC images based on a generalized total variation model.
(2) To develop a novel local feature descriptor for IHC images that combines the general robustness of local micro-pattern extraction with the high efficiency of statistical quantization.
(3) To develop a series of algorithms for image-based multi-label prediction of human protein subcellular localization by combining prior knowledge of label dependency with multi-kernel prediction algorithms.
(4) To propose a new semi-supervised protocol that exploits medium-stain-level and cancer IHC images during model construction through an iterative, incremental training strategy, and to apply it to large-scale, multi-label prediction of human protein subcellular localization. The proposed protocol is intended to fundamentally address the bottleneck in the sensitive detection of human protein translocation.
(5) To apply the proposed image-based multi-label prediction model for human protein subcellular localization, for example to the screening of cancer biomarkers.
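Item (4) describes an iterative, incremental semi-supervised training strategy. As a rough illustration of that general idea only, the following sketch runs a plain self-training loop: a model is fitted on labeled samples, confident predictions on unlabeled samples are added to the training set, and the process repeats. The nearest-centroid classifier, the softmax-style confidence score, the `threshold` value, and all function names are assumptions made for illustration. The project's actual protocol is multi-label and its details are not given here.

```python
import math


def centroids(samples, labels):
    """Fit a nearest-centroid model: the mean feature vector of each class."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict_with_confidence(model, x):
    """Return (label, confidence); confidence is a softmax over negative distances."""
    weights = {y: math.exp(-math.dist(x, c)) for y, c in model.items()}
    total = sum(weights.values())
    best = max(weights, key=weights.get)
    return best, weights[best] / total


def self_train(labeled_x, labeled_y, unlabeled_x, threshold=0.8, max_rounds=10):
    """Iteratively move confidently predicted samples into the training set."""
    train_x, train_y = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    for _ in range(max_rounds):
        model = centroids(train_x, train_y)
        remaining = []
        accepted_any = False
        for x in pool:
            label, confidence = predict_with_confidence(model, x)
            if confidence >= threshold:
                # Hypothetical acceptance rule: trust the pseudo-label.
                train_x.append(x)
                train_y.append(label)
                accepted_any = True
            else:
                remaining.append(x)
        pool = remaining
        if not accepted_any:
            break  # nothing new was learned in this round
    return centroids(train_x, train_y), pool


# Toy usage with two-dimensional stand-in features.
final_model, still_unlabeled = self_train(
    labeled_x=[[0.0, 0.0], [10.0, 10.0]],
    labeled_y=["nucleus", "cytoplasm"],
    unlabeled_x=[[1.0, 1.0], [9.0, 9.0], [5.0, 5.0]],
)
print(final_model, still_unlabeled)
```

In this toy run the two samples near a class centroid are absorbed in the first round, while the ambiguous midpoint stays unlabeled because its confidence never reaches the threshold.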
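Accurate prediction of protein subcellular location is an extremely valuable reference for revealing protein function. Most prediction models in the field operate at the protein sequence level, but sequence information is not sensitive to the subcellular translocation caused by carcinogenesis. Image-based subcellular localization prediction, which has attracted wide international attention in recent years, addresses this bottleneck well. Taking the Human Protein Atlas as its research object, this project focuses on key issues in the design of prediction models for image-based protein subcellular localization and aims to establish a set of advanced new theories and methods for high-accuracy prediction. The specific research content is as follows: (1) design a generalized total variation spatial model better suited to IHC images for preprocessing the protein-channel image signal; (2) develop a local feature operator that is both robust in describing the common local features of protein image signals and efficient in quantifying their specific characteristics; (3) make sound use of the effective prior knowledge in the benchmark dataset and of the new multi-kernel idea in prediction algorithms to design a family of prediction algorithms for multi-label protein IHC images; (4) overcome the bottleneck that static training makes model updating difficult, and design a dynamically updatable prediction model for large-scale, multi-label protein image subcellular localization; (5) apply the prediction model to screen cancer biomarkers.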
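Proteins must appear at the right subcellular location at the right time and bind the corresponding molecules in order to perform their functions correctly. Accurately determining protein subcellular location plays an irreplaceable role in understanding protein function, developing targeted cancer drugs, and screening cancer biomarkers. Although image data overcome the bottleneck that sequence information is insensitive to the subcellular translocation caused by carcinogenesis, research in the field is still limited by the following fundamental problems: ① spatial-domain descriptors of protein image signals express image characteristics insufficiently and are weakly interpretable; ② most approaches to analyzing and quantifying image signals concentrate on mining shallow features, while abstract features obtained under deep learning frameworks lack interpretability; ③ because of the complexity of IHC images, simply transplanting a CNN framework to classify protein image signals is not feasible.

Based on these motivations, the research content of this project falls into two main parts. (1) MIC_Locator, a prediction model for image-based protein subcellular localization built on monogenic signal analysis, was constructed to address the bottleneck of insufficient characterization and weak interpretability of protein image signals. First, the IHC image signal is transformed into the frequency domain by the Fourier transform, and the Riesz transform is then used to obtain the corresponding monogenic signal. Next, Log-Gabor filters at different frequency scales are used to capture the amplitude (A), phase (P), and orientation (O) components in the frequency domain. An encoding strategy suited to the APO components of IHC images is introduced, together with an image intensity encoding strategy for complementary quantization. Finally, under a classifier chain architecture, the prediction model is built by decision-level fusion of the models corresponding to the different scales and frequency components; the prediction accuracy reaches 60.56%. (2) AR_Locator, a prediction model based on a residual network and attention mechanisms, was proposed for the first time. First, the protein channel is separated from the original IHC image by linear spectral separation, and the effective regions of the original IHC image are then obtained with a multi-scale Canny operator. Second, AR_Network is built with ResNet50 as the backbone and several embedded attention modules, and the model parameters are optimized by end-to-end training. Deep features are extracted from attention module 2 and from the GAP layer. Finally, deep network features and traditional shallow features are combined: the prediction models trained on deep features and on shallow features are fused at the decision level to build the final prediction model, AR_Locator. Experimental results show that AR_Locator reaches a prediction accuracy of 72.09%, significantly better than other existing prediction models in the field.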
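The first steps of the MIC_Locator description (Fourier transform, Riesz transform, Log-Gabor filtering, then amplitude, phase, and orientation components) can be sketched for a single scale as follows. This is a minimal sketch that assumes NumPy is installed. The function name `monogenic_components`, the parameter names, the default values of `center_freq` and `sigma_ratio`, and the sign convention of the Riesz kernels are illustrative assumptions, not the project's implementation.

```python
import numpy as np


def monogenic_components(image, center_freq=0.125, sigma_ratio=0.55):
    """Return amplitude, phase and orientation maps at one Log-Gabor scale."""
    rows, cols = image.shape
    # Frequency grids in cycles per pixel, laid out like the output of fft2.
    U, V = np.meshgrid(np.fft.fftfreq(cols), np.fft.fftfreq(rows))
    radius = np.hypot(U, V)
    radius[0, 0] = 1.0  # placeholder to avoid log(0) and division by zero at DC

    # Isotropic Log-Gabor band-pass filter; the DC component is removed.
    log_gabor = np.exp(
        -(np.log(radius / center_freq) ** 2) / (2 * np.log(sigma_ratio) ** 2)
    )
    log_gabor[0, 0] = 0.0

    # Riesz transform kernels in the frequency domain (one sign convention).
    riesz_x = 1j * U / radius
    riesz_y = 1j * V / radius

    spectrum = np.fft.fft2(image) * log_gabor
    even = np.fft.ifft2(spectrum).real             # band-passed image
    odd_x = np.fft.ifft2(spectrum * riesz_x).real  # first Riesz component
    odd_y = np.fft.ifft2(spectrum * riesz_y).real  # second Riesz component

    odd_norm = np.hypot(odd_x, odd_y)
    amplitude = np.sqrt(even ** 2 + odd_norm ** 2)  # local amplitude A
    phase = np.arctan2(odd_norm, even)              # local phase P in [0, pi]
    orientation = np.arctan2(odd_y, odd_x)          # local orientation O
    return amplitude, phase, orientation


# Usage on a synthetic grating whose frequency matches the filter centre.
x = np.arange(64)
grating = np.tile(np.cos(2 * np.pi * 8 * x / 64), (64, 1))
A, P, O = monogenic_components(grating, center_freq=8 / 64)
print(A.shape, float(A.mean()))
```

Calling the function with several `center_freq` values would give the multi-scale A, P, and O maps that the text describes as inputs to the later encoding and fusion steps, which are not sketched here.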
Data last updated: 2023-05-31
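Eddy covariance technique and its application in terrestrial ecosystem flux research
Water-holding capacity and temporal dynamics of litter in different vegetation communities in the Tianlaochi catchment of the Qilian Mountains
Preparation and properties of a light- and electricity-driven biochar/stearic acid composite phase change material
Nonlinear analysis and calculation method for the coefficient of earth pressure at rest of coarse-grained soil
Prediction of urban domestic water demand based on the LASSO-SVMR model
Protein subcellular localization and detection of dynamic location translocation based on complex pattern understanding of molecular microscopy images
Multi-label learning methods for predicting multiple subcellular locations of proteins
Visual pattern mining of Web images and its applications
Several key problems in correlation analysis theory and its pattern recognition applications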