Learning to rank is a central problem in information retrieval, machine learning, and data mining, and it plays an important role in search engines. Current research focuses on relevance-oriented learning to rank and often ignores the credibility of web information. Moreover, learning-to-rank approaches that target effectiveness alone are inefficient on big data, so efficient parallel approaches suited to the big data environment are needed. To address these issues, this project will study credibility learning to rank for big data and its parallelization, with the aim of solving both the credibility problem and the efficiency problem. It will draw on theories and methods from big data processing, web spam detection, granular computing, multi-objective intelligent optimization, and multiple attribute decision making. The main contents are: 1) extracting and measuring ranking features for relevance, credibility, and non-credibility, constructing a big data set for credibility learning to rank, and studying a query clustering algorithm based on granular computing; 2) studying credibility learning-to-rank approaches for big data, based on a multi-objective optimization model of credibility learning to rank and on multi-objective intelligent optimization algorithms; 3) parallelizing the proposed approaches in the Spark framework. The results can provide a new model and new approaches for learning to rank, offer new ideas for research on the credibility of ranking results and the efficiency of learning to rank on big data, and be applied in search engines.
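Learning to rank is a central problem in information retrieval, machine learning, and data mining, and it occupies an important place in search engines. Existing work concentrates on relevance-based learning to rank and tends to overlook the credibility of web page information. Learning-to-rank methods centered purely on effectiveness are inefficient when processing big data, so efficient parallel learning-to-rank methods adapted to the big data environment must be sought. This project intends to combine theories and methods from big data processing, web spam detection, granular computing, multi-objective intelligent optimization, and multiple attribute decision making to study credibility learning-to-rank methods for big data and their parallelization, addressing the credibility and efficiency problems of credibility learning to rank on big data. The specific contents are: 1) extracting and measuring relevance, credibility, and non-credibility ranking features, constructing a big data set for credibility learning to rank, and studying a query clustering algorithm based on granular computing; 2) studying credibility learning-to-rank methods for big data on the basis of a multi-objective optimization model of credibility learning to rank and multi-objective intelligent optimization algorithms; 3) studying the parallelization of the methods in 2) under the Spark framework. The research results can provide new models and new methods for learning to rank, offer new ideas for research on the credibility of ranking results and the efficiency of learning to rank on big data, and be applied in search engines.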
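As an illustration of the multi-objective view in item 2), the following minimal Python sketch selects the non-dominated (Pareto-optimal) candidates among ranking models scored on two objectives. It is a sketch under stated assumptions, not the project's model: the two objectives (a relevance score and a credibility score, both higher-is-better), the names `dominates` and `pareto_front`, and the candidate scores are all hypothetical and do not come from the abstract.

```python
from typing import List, Tuple

# Hypothetical objective vector for one candidate ranking model:
# (relevance score, credibility score), both higher-is-better.
Score = Tuple[float, float]


def dominates(a: Score, b: Score) -> bool:
    """Return True if a is no worse than b on every objective and strictly better on at least one."""
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(scores: List[Score]) -> List[int]:
    """Return the indices of candidates that no other candidate dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Made-up scores for four candidate ranking models.
candidates = [(0.72, 0.40), (0.70, 0.65), (0.60, 0.60), (0.55, 0.80)]
front = pareto_front(candidates)
print(front)
```

Here the third candidate is dominated by the second, which is better on both objectives, while the remaining three represent different trade-offs between relevance and credibility. A multi-objective optimizer typically maintains such a non-dominated set rather than a single best model.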
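Learning to rank is a research hotspot at the intersection of information retrieval and machine learning, and it plays an important role in search engines and recommender systems. Building on techniques such as multi-objective intelligent optimization algorithms, this project investigated credibility learning-to-rank methods for big data and their parallelization, in order to improve both the credibility of ranking models and the efficiency of learning to rank. The project surveyed in detail the research progress on learning to rank in information retrieval and machine learning and on large-scale graph computing systems for big data; constructed a multi-objective optimization model of learning to rank; proposed, on the basis of bias-variance tradeoff theory, a robust learning-to-rank method based on multi-objective particle swarm optimization; improved the LambdaMART learning-to-rank method using the idea of the Matthew effect and a learning-rate variation strategy; improved a multi-objective particle swarm optimization algorithm with crowding distance and, based on the improved algorithm, designed a credibility learning-to-rank method for the big data environment together with its parallel version; designed a parallelization of the archived multi-objective simulated annealing algorithm and, on that basis, a parallel credibility learning-to-rank method for big data built on Spark and archived multi-objective simulated annealing; proposed a learning-to-rank method that combines a multi-head self-attention mechanism with conditional generative adversarial networks; and developed two learning-to-rank systems, one based on multi-objective particle swarm optimization and one based on Hooke & Jeeves pattern search. The research provides new models and new methods for learning to rank, and new ideas for research on the credibility and efficiency of learning to rank on big data.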
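For readers unfamiliar with the crowding distance used in multi-objective particle swarm optimization, the sketch below computes it in the commonly used NSGA-II style. This is an assumption for illustration only: the abstract does not say how the project's improved algorithm defines or modifies crowding distance, and the function name `crowding_distance` and the sample points are hypothetical.

```python
from typing import List, Sequence


def crowding_distance(points: Sequence[Sequence[float]]) -> List[float]:
    """NSGA-II-style crowding distance for a set of objective vectors.

    For each objective, points are sorted and every interior point receives
    the normalized gap between its two neighbours; boundary points receive
    infinity so that the extremes of the front are always preferred.
    """
    n = len(points)
    if n == 0:
        return []
    dist = [0.0] * n
    num_objectives = len(points[0])
    for k in range(num_objectives):
        # Indices of the points sorted by the k-th objective.
        order = sorted(range(n), key=lambda i: points[i][k])
        lowest = points[order[0]][k]
        highest = points[order[-1]][k]
        dist[order[0]] = dist[order[-1]] = float("inf")
        if highest == lowest:
            continue  # All values equal: this objective adds no spread.
        for r in range(1, n - 1):
            gap = points[order[r + 1]][k] - points[order[r - 1]][k]
            dist[order[r]] += gap / (highest - lowest)
    return dist


# Three made-up points on a two-objective front.
sample = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
distances = crowding_distance(sample)
print(distances)
```

A larger value means the point lies in a sparser region of the front. Archive-based optimizers commonly use this value to decide which non-dominated solutions to keep or to select as leaders, which helps preserve diversity among the retained solutions.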
Data last updated: 2023-05-31
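A Study of Theme Park Evaluation Based on Public Sentiment Orientation: The Case of Volga Manor in Harbin
An Efficient Parallel Algorithm for Radiation Diffusion on Multi-Block Structured Grids in Inertial Confinement Fusion Implosion
Applications and Challenges of Blockchain Technology in the Internet of Things
Application of Collaborative-Representation-Based Graph Embedding Discriminant Analysis to Face Recognition
An Improved Multi-Objective Sine Cosine Optimization Algorithm
Research and Application of Group Learning-to-Rank Methods
Data Mining of Massive High-Dimensional Celestial Spectral Data and Its Parallelization
Design and Optimization of GPU-Based Parallel Sorting Algorithms
Parallel Sorting and Selection Algorithms on Multi-Core Cluster Systems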