High-throughput screening is the most widely used tool in current drug screening, and image analysis based on feature extraction is a prerequisite for it. Most current feature extraction methods depend strongly on prior knowledge of the specific high-throughput biological experiment and therefore generalize poorly. Convolutional neural networks (CNNs), an active topic in deep learning, can extract image features adaptively from the original images and their corresponding labels, so they offer a more general solution to this problem. With real-world application in mind, and drawing on sparse representation theory, we propose a CNN with pretrained convolutional kernels, in order to provide a general, accurate, and robust feature extraction method. The details are as follows. First, to address inaccurate kernel estimation caused by insufficient labeled image samples, and the high computational cost of back-propagation, the proposed model introduces a pretraining stage into the CNN, which trains the convolutional kernels by solving a fast image reconstruction problem. Second, because image data from high-throughput experiments are large in scale and frequently updated, the model introduces double-sparsity dictionary learning into the pretraining procedure, which improves feature extraction performance while reducing computational cost and storage requirements. Third, a nonparametric Bayesian model is used to reinterpret the double-sparsity prior, so that the key parameters of the pretraining procedure are inferred adaptively; this reduces the manual intervention required in feature extraction and makes it more robust in real-world applications.
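High-throughput screening is the most mainstream approach in modern drug development, and microscopy image analysis based on image feature extraction is critical to it. Current image feature extraction methods rely mainly on experimental prior knowledge and generalize poorly. Convolutional neural networks (CNNs) can extract image features adaptively from large numbers of images, which gives them a clear advantage for this problem. Based on the characteristics of high-throughput screening, the project proposes a CNN model with pretrained convolutional kernels and studies the pretraining strategy using sparse representation theory, so that the image feature extraction scheme is general, accurate, and efficient. Specifically: a kernel pretraining stage is introduced into the CNN, in which the kernels are pre-learned through optimal image reconstruction, to address two problems in high-throughput screening applications, namely that there are too few labeled images to estimate the network parameters accurately and that kernel training algorithms are slow; kernel pretraining is formulated as solving an inverse problem, and dictionary learning with a double-sparsity prior is introduced, which improves feature extraction performance while lowering computation and storage costs, to suit the large volume and frequent updating of high-throughput screening data; and the double-sparsity prior is redefined within a nonparametric Bayesian sparse representation framework, so that the key parameters of the pretraining process are obtained adaptively, further reducing manual intervention in feature extraction and better meeting the needs of practical applications.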
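For reference, the double-sparsity prior mentioned above can be written as an optimization problem. The form below is a sketch of the standard double-sparsity dictionary model from the sparse representation literature, which is assumed (not confirmed by the abstract) to be what the project builds on. The symbols X, Φ, A, Γ, p, and t are introduced here only for illustration.

```latex
\min_{A,\Gamma}\ \|X - \Phi A \Gamma\|_F^2
\quad \text{s.t.} \quad
\|\gamma_i\|_0 \le t \ \ \forall i, \qquad
\|a_j\|_0 \le p, \ \ \|\Phi a_j\|_2 = 1 \ \ \forall j
```

Here the columns of X are image patches, Φ is a fixed base dictionary (for example a DCT basis), A is a sparse matrix whose columns a_j define the learned atoms through D = ΦA, and the columns γ_i of Γ are the sparse codes of the patches. Under these assumptions only the sparse matrix A has to be stored and updated, which is one way to read the abstract's claim of lower computation and storage cost.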
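High-throughput screening is the most mainstream approach in modern drug research and development. Its key problem is how to analyze, accurately and quickly, the large number of cell-level or subcellular-level microscopy images produced by experiments, in order to screen out compounds that are effective for drug making. Progress has been made in China and abroad on the classification and retrieval of high-throughput screening microscopy images, and image feature extraction is a key factor affecting the performance of both. Against this application background, this project analyzes the data characteristics and application requirements of high-throughput screening and carries out research on machine-learning-based analysis of high-throughput screening microscopy images, in service of drug development. The research covers: 1) microscopy image feature extraction based on convolutional neural networks, which extracts image features adaptively from large numbers of images, to address the strong dependence on features hand-defined by specialists in microscopy image classification for drug screening; 2) microscopy image feature extraction based on sparse representation and convolutional neural networks, which introduces a kernel pretraining stage into the CNN and pre-learns the kernels through optimal image reconstruction, to address two problems in high-throughput screening applications, namely that there are too few labeled images to estimate the network parameters accurately and that kernel training algorithms are slow; 3) microscopy image feature extraction based on domain adaptation theory from transfer learning, to address the small-sample problem of insufficient labels caused by the high cost of labeling in practice. Because medical pathology images are strongly similar to drug screening images, the project also extends the proposed methods to pathology image analysis, addressing the small-sample, few-label problem in its practical application.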
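The idea of pre-learning kernels from unlabeled patches by image reconstruction can be illustrated with a small example. The sketch below is not the project's method: it swaps in plain L1 sparse coding solved with ISTA and a simple gradient update of the atoms, in place of the double-sparsity and nonparametric Bayesian models described in the abstract. All function names and parameters (`ista`, `pretrain_kernels`, `lam`, `lr`, and so on) are hypothetical, and patches are assumed to be flattened into plain Python lists.

```python
import math
import random


def soft_threshold(v, t):
    """Proximal operator of t*|v|: shrink v toward zero by t."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def reconstruct(atoms, code):
    """Return sum_k code[k] * atoms[k] as a patch-sized list."""
    n = len(atoms[0])
    return [sum(c * a[i] for c, a in zip(code, atoms)) for i in range(n)]


def objective(patch, atoms, code, lam):
    """0.5 * squared reconstruction error plus lam * L1 norm of the code."""
    r = [p - q for p, q in zip(patch, reconstruct(atoms, code))]
    return 0.5 * sum(v * v for v in r) + lam * sum(abs(c) for c in code)


def ista(patch, atoms, lam=0.1, steps=50):
    """Sparse-code one patch with ISTA, starting from the zero code."""
    # The sum of squared atom norms upper-bounds the gradient's Lipschitz constant.
    lipschitz = sum(sum(v * v for v in a) for a in atoms)
    code = [0.0] * len(atoms)
    for _ in range(steps):
        residual = [p - q for p, q in zip(patch, reconstruct(atoms, code))]
        grads = [sum(ai * ri for ai, ri in zip(a, residual)) for a in atoms]
        code = [soft_threshold(c + g / lipschitz, lam / lipschitz)
                for c, g in zip(code, grads)]
    return code


def normalize(atom):
    """Scale an atom to unit L2 norm (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in atom)) or 1.0
    return [v / norm for v in atom]


def pretrain_kernels(patches, n_kernels=4, lam=0.1, epochs=5, lr=0.1, seed=0):
    """Learn unit-norm atoms (candidate kernels) from unlabeled patches."""
    rng = random.Random(seed)
    n = len(patches[0])
    atoms = [normalize([rng.gauss(0, 1) for _ in range(n)])
             for _ in range(n_kernels)]
    for _ in range(epochs):
        for patch in patches:
            code = ista(patch, atoms, lam)
            residual = [p - q for p, q in zip(patch, reconstruct(atoms, code))]
            # Gradient step on the reconstruction error for each atom, then renormalize.
            atoms = [normalize([v + lr * c * r for v, r in zip(a, residual)])
                     for a, c in zip(atoms, code)]
    return atoms
```

In a real pipeline, each learned atom would be reshaped to the kernel size and used to initialize a convolutional layer before supervised fine-tuning on the labeled images; that step is omitted here.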
Data last updated: 2023-05-31
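Small UAV remote sensing image registration with inlier maximization and redundant point control
Research on named entity recognition based on fine-grained word representations
Application of collaborative-representation-based graph embedding discriminant analysis to face recognition
A new inductive method for microblog rumor detection based on graph convolutional networks
Classified prediction of bus passenger flow with a multi-source data-driven CNN-GRU model
Research on neurite skeleton extraction from high-content screening microscopy images based on sparse representation classification
Medical image feature extraction method based on nonlinear kernel sparse representation
Feature extraction and classification of hyperspectral remote sensing images based on sparse representation theory
Research on image dehazing theory and methods based on a region convolutional neural network sparse regularization model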