To analyze and extract useful knowledge from widely available large-scale healthcare data using latent subspace models designed for pattern discovery, this project addresses key fundamental issues of latent subspace models in three areas: diversity regularization, learning and inference, and scalable algorithms, drawing on advanced machine learning and optimization techniques. First, we investigate how to apply diversity regularization to high-dimensional kernel latent subspace models and to structured latent subspace models with multi-view and grouping structures, in order to achieve long-tail coverage, low model complexity, and interpretability. Second, to quantify uncertainty, which frequentist-style regularized latent subspace models cannot provide, we propose diversity-inducing Bayesian priors for Bayesian latent subspace models, together with truncated variational inference and MCMC sampling for approximate inference in both parametric and nonparametric diversity-regularized Bayesian latent subspace models. Third, we propose scalable, parallel optimization algorithms and methods of moments that scale diversity-regularized latent subspace models to tens of millions of data samples and to models with tens of millions of parameters, significantly improving computational efficiency while satisfying statistical consistency. Finally, to handle massive, complex, unstructured healthcare data, we apply scalable diversity-regularized latent subspace models to computational phenotyping, patient similarity computation, health status forecasting, and other applications, with the ultimate goal of advancing new machine learning techniques and their application at all levels of healthcare analytics.
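Set against the background of healthcare big data analysis and knowledge mining, this project draws on the latest results in statistical machine learning and optimization to address diversity regularization and efficient, fast algorithms for latent subspace models. The innovations are: (1) We propose diversity regularization methods for latent subspace models based on kernel transformations and multi-view grouping structures, which uncover long-tail semantics and diverse latent components, significantly improving model interpretability and reducing model complexity. (2) To address the difficulty of quantifying uncertainty in frequentist regularized latent subspace models, we propose diversity regularization via mutual-angle-biased priors for Bayesian latent subspace models, along with approximate inference methods such as truncated variational inference and MCMC sampling, effectively resolving diversity regularization and posterior inference for parametric and nonparametric Bayesian models. (3) We propose scalable parallel optimization algorithms for diversity-promoting latent subspace models and methods of moments for parameter estimation, significantly improving runtime efficiency while satisfying statistical consistency. (4) For massive, complex, unstructured healthcare big data, we apply diversity-promoting latent subspace models to computational phenotyping, individual similarity, health status prediction, and other applications, effectively advancing new techniques in machine learning and their application at all levels of healthcare.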
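The abstract names mutual angles as the basis of its diversity regularization but does not define the measure. As a rough illustration only, the sketch below uses one formulation found in the literature on mutual angular regularization: score a set of latent components by the mean of their pairwise non-obtuse angles minus the variance of those angles, so that larger and more uniform angles count as more diverse. The function names, the mean-minus-variance form, and the toy vectors are assumptions for illustration, not the project's actual regularizer.

```python
import math
from itertools import combinations


def pairwise_angles(components):
    """Return the non-obtuse angle (radians) between every pair of vectors.

    Assumes every vector is nonzero and all have the same length.
    """
    angles = []
    for u, v in combinations(components, 2):
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        # abs() folds the angle into [0, pi/2]; min() guards against
        # floating-point rounding pushing the cosine slightly above 1.
        cosine = min(1.0, abs(dot) / (norm_u * norm_v))
        angles.append(math.acos(cosine))
    return angles


def mutual_angle_score(components):
    """Hypothetical diversity score: mean pairwise angle minus its variance."""
    angles = pairwise_angles(components)
    mean = sum(angles) / len(angles)
    variance = sum((t - mean) ** 2 for t in angles) / len(angles)
    return mean - variance


# Toy component sets (illustrative values only).
orthogonal = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
redundant = [[1.0, 0.0, 0.0], [1.0, 0.1, 0.0], [1.0, 0.0, 0.1]]

score_orthogonal = mutual_angle_score(orthogonal)
score_redundant = mutual_angle_score(redundant)
```

Under this assumed score, the mutually orthogonal set reaches the maximum of pi/2, while the nearly parallel set scores much lower. A regularized model would add such a term to its training objective to push components apart.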
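Targeting latent structure mining and clustering and prediction problems in complex medical and biological data, the project systematically and thoroughly studied issues such as model representation in latent subspace models. This included: (1) For clustering large-scale gene sequence data, we proposed an efficient clustering method based on locality-sensitive hashing (LSH) and a nonparametric Bayesian method (DP-means), which is among the most efficient and accurate methods currently available in bioinformatics for large-scale clustering. (2) We proposed a method for predicting microbial association networks based on a hierarchical Bayesian latent subspace model, which effectively handles the loss of accuracy in association inference caused by compositional bias and by the variance inherent in sequencing data, accounts for both microbial and environmental factors, and significantly improves accuracy and practicality in predicting microbe-microbe and microbe-environment associations. (3) To address problems such as poor transferability and low efficiency when deep neural networks process video data, the project used a generative mechanism to obtain adversarial images and videos, combining a high-level class loss with a low-level feature loss to jointly train the adversarial example generator. (4) The project proposed an attention-based deep learning framework for tasks such as data-driven, evidence-based classification of acute and critically ill patients. An invited talk was given at the 2017 National Heart Congress (全国心脏大会). The work received the 2017 China Computer Federation (CCF) Natural Science First Prize (ranked third).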
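Item (1) builds on DP-means, a k-means-like algorithm that chooses the number of clusters itself by minimizing the sum of squared distances to centroids plus a penalty for each cluster. The following is a minimal sketch of plain DP-means on small numeric vectors, assuming squared Euclidean distance. It omits the LSH acceleration and any sequence-specific distance, and the function names and toy data are illustrative rather than the project's implementation.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def dp_means(points, penalty, max_iters=100):
    """Cluster points; open a new cluster when the nearest centroid is
    farther than `penalty` (in squared distance)."""
    centroids = [mean_point(points)]  # start with one global cluster
    labels = [0] * len(points)
    for _ in range(max_iters):
        changed = False
        for i, p in enumerate(points):
            dists = [squared_distance(p, c) for c in centroids]
            best = min(range(len(centroids)), key=dists.__getitem__)
            if dists[best] > penalty:
                # Too far from every centroid: the point seeds a new cluster.
                centroids.append(list(p))
                best = len(centroids) - 1
            if labels[i] != best:
                labels[i] = best
                changed = True
        # Recompute centroids, dropping clusters that lost all members.
        members = {}
        for label, p in zip(labels, points):
            members.setdefault(label, []).append(p)
        kept = sorted(members)
        centroids = [mean_point(members[k]) for k in kept]
        relabel = {old: new for new, old in enumerate(kept)}
        labels = [relabel[label] for label in labels]
        if not changed:
            break
    return labels, centroids


# Toy data: two well-separated groups (illustrative values only).
points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
          [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
labels, centroids = dp_means(points, penalty=4.0)
single_labels, single_centroids = dp_means(points, penalty=1000.0)
```

With the smaller penalty the sketch separates the two groups; with a very large penalty opening a cluster is never worthwhile and all points stay together. In the large-scale setting described above, hashing would presumably restrict the nearest-centroid search to a few candidates instead of scanning every centroid.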
Data last updated: 2023-05-31
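Data analysis and decision-making for medical care and health
Data analysis and decision-making for medical care and health
Big-data-driven innovation in smart healthcare and health management
Research on key technologies for semi-structured data management for healthcare big data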