Supervised learning algorithms assume that training and test data follow the same distribution, so that a classifier built on the training set can classify the test set. In practice, however, a well-trained classifier may face test data distributed differently from its training data; in other words, knowledge from a known field (the source domain) has to be used to solve problems in an unknown one (the target domain), which is the main purpose of transfer learning. Homogeneous transfer learning and instance-based transfer learning algorithms have been studied intensively, but more and more real-world applications produce complex, heterogeneous multi-source datasets. Because data distributions differ widely across source domains, the features may be heterogeneous, which poses a major challenge for multi-source transfer learning. Starting from the idea of ensemble learning, this project proposes a multi-source domain transfer learning model based on a consensus solution to address the following major problems: (1) For the heterogeneous data distributions that may exist across multiple sources, we intend to use deep auto-encoders and mutual information to compute a common feature space between the source and target domains, so that the information in the source domains can be put to good use. (2) By using a co-training algorithm, we aim to improve the reliability of the sampling scheme during instance-transfer learning and thereby reduce the risk of negative transfer. (3) Combining deep-learning feature engineering, ensemble learning, co-training, and information entropy theory, we construct a new fusion mechanism over multiple weak learners to further improve the robustness and accuracy of multi-source ensemble transfer learning.
In summary, the project will advance theoretical research on multi-source transfer learning based on ensemble learning, and its results are expected to have high theoretical and practical value.
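Transfer learning uses knowledge from a known domain to solve problems in a different domain, so that a classifier built on an existing training set can handle test data whose distribution differs from that of the training set. In practice, more and more applications involve transferring data from multiple source domains, and the data distributions of those source domains often differ considerably; how to make reasonable and effective use of the transferable information in multiple source domains has become a focus of multi-source transfer learning research. This project proposes a multi-source domain transfer learning model based on an ensemble learning framework to address the following main problems: (1) For the heterogeneous data distributions found across multiple source domains, use deep auto-encoders combined with mutual information theory to derive a shared feature-space representation between domains. (2) Introduce a co-training scheme to improve the reliability of sample selection in instance transfer and avoid negative transfer. (3) Combine ensemble learning with information entropy theory to construct a new fusion function that improves the robustness and accuracy of multi-source transfer learning. In summary, building on deep-learning feature extraction, ensemble learning, co-training, and mutual information theory, this project will carry out frontier theoretical research on multi-source domain transfer learning, and its results are expected to have high theoretical and practical value.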
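The co-training idea in item (2) can be illustrated with a small sketch. This is not the project's algorithm, which the abstract does not specify. It is a simplified, agreement-based filter written under the assumption that two classifiers, each trained on a different feature view, output class probabilities for the same unlabeled instances. The function name `select_agreed_instances` and the `threshold` parameter are hypothetical.

```python
import numpy as np

def select_agreed_instances(proba_view_a, proba_view_b, threshold=0.9):
    """Pick instances that two view-specific classifiers label identically and confidently.

    proba_view_a, proba_view_b: arrays of shape (n_samples, n_classes) holding
    class probabilities from two classifiers trained on different feature views
    (an assumption of this sketch, not something stated in the abstract).
    Returns (indices, pseudo_labels) for the instances that pass the filter.
    """
    proba_view_a = np.asarray(proba_view_a, dtype=float)
    proba_view_b = np.asarray(proba_view_b, dtype=float)

    labels_a = proba_view_a.argmax(axis=1)
    labels_b = proba_view_b.argmax(axis=1)

    # Confidence of the less certain classifier on each instance.
    confidence = np.minimum(proba_view_a.max(axis=1), proba_view_b.max(axis=1))

    # Keep an instance only if both views agree and both are confident.
    keep = (labels_a == labels_b) & (confidence >= threshold)
    return np.flatnonzero(keep), labels_a[keep]
```

Requiring agreement between views is one plausible way to lower the chance that unreliable instances are transferred; a full co-training loop would also retrain each classifier on the instances selected by the other.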
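Transfer learning breaks the traditional machine-learning assumption that training and test data share the same feature space and data distribution: it transfers knowledge acquired in one domain to learning tasks in different but similar domains in order to improve a model's learning ability in the new domain. In practice, however, complex data distributions hold back further deployment of these techniques, and how to eliminate domain differences so that transferable information can be fully exploited to build highly robust transfer models has become a focus of transfer learning research. Taking full account of the differences in feature space between the source and target domains and of the requirements on classifier reliability and accuracy, this project built several ensemble transfer learning models that are multi-view, highly robust, and generalize well, with the following results: 1) using deep learning techniques, from model training to feature representation, it proposed effective feature-extraction methods for transfer learning and constructed high-quality shared feature spaces between the source and target domains; 2) it explored effective entry points for co-training in transfer learning, whereby source-domain data and target-domain data are each filtered to jointly aid the training of the transfer model; 3) it studied measures of similarity between source and target domains, such as mutual information and maximum mean discrepancy, used them to establish a reasonable weighting mechanism for ensemble learning, and built ensemble transfer learning models for multi-feature and multi-source-domain settings; 4) it examined the scalability of ensemble transfer learning models in real applications and deployed them in areas such as smart healthcare and smart security, effectively promoting the practical deployment of intelligent computing methods, raising the level of artificial intelligence research and development in Yunnan Province, and carrying strategic significance for accelerating Yunnan's development into a province strong in science and technology and for promoting regional economic and social development. In summary, by combining transfer learning with ensemble learning, this project proposed a series of frontier basic research results on ensemble transfer learning and applied them to areas such as smart healthcare and smart security, giving them high theoretical and practical value.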
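Result 3) mentions weighting ensemble members by a source-target similarity measure such as maximum mean discrepancy (MMD). A minimal sketch of that idea follows. It makes several assumptions not found in the abstract: a Gaussian (RBF) kernel with a hand-picked `gamma`, the biased MMD estimator, a softmax over negative MMD values to turn distances into weights, and weighted averaging of class probabilities as the fusion rule. All function names are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq_dist = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))

def mmd2(source, target, gamma=1.0):
    """Biased estimate of the squared MMD between two samples."""
    return (
        rbf_kernel(source, source, gamma).mean()
        + rbf_kernel(target, target, gamma).mean()
        - 2.0 * rbf_kernel(source, target, gamma).mean()
    )

def source_weights(sources, target, gamma=1.0, temperature=1.0):
    """Give source domains closer to the target (smaller MMD) larger weights."""
    distances = np.array([mmd2(s, target, gamma) for s in sources])
    logits = -distances / temperature
    logits -= logits.max()  # numerical stability for the softmax
    weights = np.exp(logits)
    return weights / weights.sum()

def weighted_vote(probas, weights):
    """Fuse per-source class probabilities, each of shape (n_samples, n_classes)."""
    stacked = np.stack(probas, axis=0)                   # (n_sources, n_samples, n_classes)
    combined = np.tensordot(weights, stacked, axes=1)    # (n_samples, n_classes)
    return combined.argmax(axis=1)
```

In this sketch a smaller `temperature` concentrates the weight on the source domain closest to the target, while a larger one moves the ensemble toward a plain average. Mutual information or another similarity measure could replace `mmd2` without changing the fusion step.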
Data last updated: 2023-05-31
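Research on a Crime Prediction Algorithm Based on Multimodal Information Feature Fusion
Evaluating Theme Parks Based on Public Sentiment Orientation: The Case of Volga Manor in Harbin
An Efficient Parallel Radiation Diffusion Algorithm Based on Multi-Block Structured Grids for Inertial Confinement Fusion Implosions
Application of Collaborative-Representation-Based Graph Embedding Discriminant Analysis to Face Recognition
An Improved Multi-Objective Sine Cosine Optimization Algorithm
Research on a Selective Deep Transfer Learning Method Based on Multi-Source-Domain EEG Signal Feature Matrices
Research on Event Detection Based on Transfer Learning in Multi-Source Heterogeneous Data
Research on Cross-Domain Recommendation Based on Transfer Learning for Multi-Source Heterogeneous Users
Research on Transfer Learning Algorithms Integrating Active Learning and Crowdsourcing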