Learning from unlabeled data streams is an active research topic because labels are difficult to obtain in real-world applications. Existing work applies semi-supervised learning to partially labeled data streams, but these approaches assume that the labeled and unlabeled data are independent and identically distributed, an assumption that rarely holds in practice. This project therefore introduces transfer learning, which learns from unlabeled data with the help of some labeled data, into unlabeled data streams and studies the key issues involved. First, we study how well transfer learning theory and methods adapt to the data stream setting, and explore how to represent and design transfer subjects and transfer bridges that are fast and real-time enough for a streaming environment. Second, we study how to transfer labeled data effectively into the learning process for unlabeled data at the instance, feature, and model levels, with a focus on the mechanisms by which label information propagates and diffuses. In addition, to address concept drift in unlabeled data streams, we study effective drift detection methods and the corresponding classifier adaptation mechanisms. The overall aim is an effective framework and set of methods for knowledge transfer on unlabeled data streams that is not restricted by the independent and identically distributed condition. Building on this work, we apply the methods to text streams such as product reviews on the web, and design a prototype classification system for one or more unlabeled data streams.
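The project was carried out largely as planned, following the predetermined technical route to address transfer learning in data streams and related application problems. First, we studied effective processing methods for the general characteristics of online text data, such as multiple labels, short texts, and low quality. Second, for the different forms that distribution differences between data chunks can take during transfer, we designed transfer learning models and algorithms suited to the online data stream setting; these effectively avoid negative transfer, and their accuracy gains were validated on datasets from real application domains. Finally, we studied an online learning framework that combines concept drift detection with transfer learning on data streams, which improved the efficiency of online learning. On this basis, we attempted applied research using the designed models and algorithms in real data stream domains. After three years of research, the project has been executed well and has produced substantial interim results.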
Data last updated: 2023-05-31
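On the Impact of the Big Data Environment on the Development of Information Science
Determination of the Monosaccharide Composition of Lycium barbarum Polysaccharides by Pre-column Derivatization High-Performance Liquid Chromatography-Mass Spectrometry under Mild Conditions
Theme Park Evaluation Based on Public Sentiment Orientation: A Case Study of Volga Manor in Harbin
Application of Collaborative-Representation-Based Graph Embedding Discriminant Analysis to Face Recognition
An Improved Multi-objective Sine Cosine Optimization Algorithm
Semi-supervised Transfer Learning in Semi-supervised Classification of Data Streams
Algorithms for Learning from Positive and Unlabeled Examples and Their Applications
Key Issues in Transfer Learning Based on Multiple Latent Spaces
Heat and Mass Transfer in Unsaturated Moist Porous Media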