The project investigates the construction of an e-business domain ontology, multi-source heterogeneous data aggregation, and semantic annotation, with the aim of applying semantic reasoning in e-commerce. To enhance system robustness and noise resistance, the e-business domain ontology model is built by integrating rough sets, formal concept analysis, and fuzzy set theory, with the original ontology model of the UNSPSC (United Nations Standard Products and Services Code) serving as the core ontology. To reduce time complexity and improve the accuracy and efficiency of ontology merging, an ontology merging method based on the isomorphic generation of rough concept lattices is presented. A framework for multi-source heterogeneous data aggregation is designed. Tables of high-frequency domain vocabulary are obtained with a web crawler and statistical tools, and Deep Web databases are annotated by querying with these tables, which implements heterogeneous data aggregation and improves the speed and accuracy of data annotation. To improve mapping and matching, the composite similarity among ontology concepts is computed by introducing trapezoidal fuzzy numbers to represent fuzzy similarity, followed by weighted aggregation and defuzzification. The ontology-based page filtering algorithm for topic crawlers is improved to reduce noise and speed up page collection. A semantic annotation framework is devised. During crawling, URL de-duplication by a block hash function method is presented to balance high space efficiency against the false positive rate. Finally, to overcome the limitations of the classification data and improve classification efficiency, a classification algorithm based on multi-kernel, multi-class support vector machines is presented.
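The abstract names a block hash function method for URL de-duplication but does not define it. A blocked Bloom filter is a common structure that matches the description. It splits the bit array into fixed-size blocks, uses one hash to pick a block, and sets all remaining bits inside that block. The sketch below assumes that structure. The class name, block count, block width, and number of bit positions `k` are hypothetical choices. BLAKE2b from Python's `hashlib` stands in for whatever hash functions the project actually used.

```python
from __future__ import annotations

import hashlib


class BlockedBloomFilter:
    """Hypothetical blocked Bloom filter for crawler URL de-duplication.

    The bit array is split into fixed-size blocks. One hash value picks a
    block and k further values pick bit offsets inside that block only, so
    each insert or lookup touches a single block. Like any Bloom filter it
    can report false positives but never false negatives.
    """

    def __init__(self, num_blocks: int = 1024, block_bits: int = 512, k: int = 4):
        if not 1 <= k <= 7:
            raise ValueError("k must be between 1 and 7 for a 64-byte digest")
        self.num_blocks = num_blocks
        self.block_bits = block_bits
        self.k = k
        # Each block is a Python int used as a bitset of width block_bits.
        self.blocks = [0] * num_blocks

    def _locate(self, url: str) -> tuple[int, int]:
        """Return (block index, bit mask) for a URL."""
        # A 64-byte BLAKE2b digest yields eight 64-bit words: word 0 selects
        # the block, words 1..k select bit offsets within that block.
        digest = hashlib.blake2b(url.encode("utf-8"), digest_size=64).digest()
        words = [int.from_bytes(digest[i:i + 8], "big") for i in range(0, 64, 8)]
        block = words[0] % self.num_blocks
        mask = 0
        for w in words[1:self.k + 1]:
            mask |= 1 << (w % self.block_bits)
        return block, mask

    def __contains__(self, url: str) -> bool:
        block, mask = self._locate(url)
        return (self.blocks[block] & mask) == mask

    def add(self, url: str) -> bool:
        """Insert a URL; return True if it was (probably) seen before."""
        block, mask = self._locate(url)
        seen = (self.blocks[block] & mask) == mask
        self.blocks[block] |= mask
        return seen


seen = BlockedBloomFilter()
frontier = ["http://example.com/a", "http://example.com/b", "http://example.com/a"]
new_urls = [u for u in frontier if not seen.add(u)]
print(new_urls)
```

Raising `num_blocks` or `block_bits` lowers the false positive rate at the cost of memory. Confining all `k` bits to one block is faster than a standard Bloom filter of the same size, but it has a slightly higher false positive rate. This is the kind of space-efficiency versus false-positive trade-off the abstract mentions.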
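Starting from the application of semantic reasoning in e-commerce, the project studies key technologies such as e-commerce domain ontology construction, multi-source heterogeneous data aggregation, and semantic annotation. To strengthen system robustness and noise resistance, an e-commerce domain ontology is built with rough sets, formal concept analysis, and fuzzy set theory, using UNSPSC as the core ontology. To reduce time complexity and improve the accuracy of ontology merging, an ontology merging method based on the isomorphic generation of rough concept lattices is proposed. A multi-source heterogeneous data aggregation framework is designed. To speed up data annotation and improve its accuracy, a web crawler and statistical tools are used to obtain a high-frequency domain vocabulary table. Deep Web data are then annotated by querying with this table, achieving heterogeneous data aggregation. Trapezoidal fuzzy numbers are introduced to represent fuzzy similarity, and the composite similarity between concepts is computed by weighted aggregation and defuzzification, improving mapping and matching efficiency. To reduce noise and speed up page acquisition, an ontology-based page filtering algorithm for topic crawlers is proposed. A semantic annotation framework is designed. During crawling, a block hash function method is proposed for URL de-duplication to balance high space efficiency against the false positive rate. To overcome the limitations of the classification data and improve classification efficiency, a classification algorithm based on multi-kernel, multi-class support vector machines is proposed.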
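The abstract does not say which similarity components are combined, what weights are used, or how defuzzification is done. The sketch below is a minimal illustration under stated assumptions:

- three hypothetical components (name, property, structure similarity), each given as a trapezoidal fuzzy number;
- crisp weights normalized to sum to 1;
- the graded mean integration representation (a + 2b + 2c + d)/6 as the defuzzifier.

The centroid of the trapezoid would be an equally reasonable defuzzifier.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TrapezoidalFuzzyNumber:
    """Trapezoidal fuzzy number (a, b, c, d) with a <= b <= c <= d.

    Membership rises linearly on [a, b], equals 1 on [b, c], falls on [c, d].
    """

    a: float
    b: float
    c: float
    d: float

    def __post_init__(self) -> None:
        if not self.a <= self.b <= self.c <= self.d:
            raise ValueError("expected a <= b <= c <= d")

    def __add__(self, other: TrapezoidalFuzzyNumber) -> TrapezoidalFuzzyNumber:
        # Fuzzy addition of trapezoids is vertex-wise.
        return TrapezoidalFuzzyNumber(self.a + other.a, self.b + other.b,
                                      self.c + other.c, self.d + other.d)

    def scale(self, w: float) -> TrapezoidalFuzzyNumber:
        # Multiplying by a crisp weight w >= 0 scales every vertex.
        return TrapezoidalFuzzyNumber(w * self.a, w * self.b, w * self.c, w * self.d)

    def defuzzify(self) -> float:
        # Graded mean integration representation (an assumed choice).
        return (self.a + 2 * self.b + 2 * self.c + self.d) / 6


def composite_similarity(sims: list[TrapezoidalFuzzyNumber], weights: list[float]) -> float:
    """Weighted aggregation of fuzzy similarities, then defuzzification."""
    if not sims or len(sims) != len(weights):
        raise ValueError("need one weight per similarity component")
    if any(w < 0 for w in weights) or sum(weights) == 0:
        raise ValueError("weights must be non-negative and not all zero")
    total = sum(weights)
    acc = TrapezoidalFuzzyNumber(0.0, 0.0, 0.0, 0.0)
    for s, w in zip(sims, weights):
        acc = acc + s.scale(w / total)  # normalize weights to sum to 1
    return acc.defuzzify()


# Hypothetical name, property and structure similarities for one concept pair.
name_sim = TrapezoidalFuzzyNumber(0.6, 0.7, 0.8, 0.9)
prop_sim = TrapezoidalFuzzyNumber(0.4, 0.5, 0.6, 0.7)
struct_sim = TrapezoidalFuzzyNumber(0.2, 0.3, 0.4, 0.5)
score = composite_similarity([name_sim, prop_sim, struct_sim], [0.5, 0.3, 0.2])
print(round(score, 4))  # 0.61
```

Keeping the similarity fuzzy until the final step lets uncertainty from each component carry into the aggregate. Only the final weighted trapezoid (0.46, 0.56, 0.66, 0.76) is reduced to a crisp score.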
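Against the background of applying semantic reasoning to personalized recommendation systems in e-commerce, the project studies key technologies including e-commerce domain ontology construction, multi-source heterogeneous data aggregation, and semantic annotation methods. The work lays a theoretical foundation for three goals: improving search-engine query performance, raising the quality of personalized services in e-commerce recommendation systems, and increasing the robustness and intelligence of automated agents. The main research content and results are as follows:
1. Construction of the rough concept lattice model:
(1) A description of the rough concept lattice model is given, establishing the relationship between rough sets and concept lattices.
(2) Using the variable precision rough set model, the relationship between the approximate quality of classification γ and the parameter β is discussed, and an algorithm for selecting the threshold β is proposed.
(3) On this basis, an improved discernibility-matrix attribute reduction algorithm is applied to reduce the formal context, which reduces the amount of information and improves fault tolerance.
(4) A concept lattice construction method based on variable precision rough sets is proposed. Experiments show that it builds lattices quickly, greatly reduces the number of nodes, and strengthens the system's noise resistance.
2. Construction of the domain ontology:
(1) Using the rough concept lattice model, ontology elements are built with the UNSPSC (United Nations Standard Products and Services Code) as the core ontology, improving the system's robustness and noise resistance.
(2) To reduce time complexity and improve the accuracy of ontology merging, an ontology merging method based on the isomorphic generation of rough concept lattices is proposed.
(3) Trapezoidal fuzzy numbers are introduced to represent fuzzy similarity. The composite similarity between heterogeneous ontologies is computed by weighted aggregation and defuzzification, improving mapping and matching efficiency.
3. Design of the aggregation framework for multi-source heterogeneous data sources:
(1) Wrappers perform coarse-grained processing of structurally heterogeneous data distributed across the network. They convert the data into a unified structural schema, hiding the structural heterogeneity of the data sources.
(2) A focused web crawler collects domain vocabulary from web pages to obtain a high-frequency vocabulary table.
(3) Terms are taken one by one from the high-frequency domain vocabulary table and used as queries, and the returned Deep Web data are annotated using ontology mapping rules.
4. Semantic annotation based on the domain ontology:
(1) A distributed crawler framework is designed that uses ontology-based concept similarity to predict the topical relevance of links and to judge the topical relevance of page content, improving crawler precision.
(2) A semantic annotation framework is built, and an entity and concept matching method based on multi-ontology semantic similarity is proposed, which uses domain ontologies to eliminate semantic heterogeneity. Comparative experiments show that the method effectively eliminates semantic heterogeneity and improves semantic annotation accuracy, laying a theoretical foundation for better search-engine query performance.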
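The summary relates the quality of classification γ to the parameter β of the variable precision rough set model, but it does not give the β selection algorithm. The sketch below assumes Ziarko's formulation, where β in (0.5, 1] is the minimum inclusion degree an equivalence class needs to enter a β-lower approximation. It computes γ(β) on a toy decision table. It then applies a hypothetical selection rule: pick the largest β whose γ(β) still reaches a target value. The function names, the rule, and the data are illustrative only.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def partition(rows: list[dict], attrs: list[str]) -> list[list[int]]:
    """Equivalence classes of object indices under the given condition attributes."""
    classes: dict[tuple, list[int]] = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].append(i)
    return list(classes.values())


def inclusion_degrees(rows: list[dict], cond: list[str], dec: str) -> list[tuple[int, float]]:
    """For each class E: (|E|, max over decision classes X of |E ∩ X| / |E|)."""
    out = []
    for cls in partition(rows, cond):
        counts = Counter(rows[i][dec] for i in cls)
        out.append((len(cls), max(counts.values()) / len(cls)))
    return out


def gamma(rows: list[dict], cond: list[str], dec: str, beta: float) -> float:
    """Quality of classification: |beta-positive region| / |U|.

    A class E is in the beta-lower approximation of decision class X when
    |E ∩ X| / |E| >= beta. With beta in (0.5, 1] at most one X qualifies,
    so no class is counted twice.
    """
    if not 0.5 < beta <= 1.0:
        raise ValueError("beta must lie in (0.5, 1]")
    pos = sum(size for size, deg in inclusion_degrees(rows, cond, dec) if deg >= beta)
    return pos / len(rows)


def select_beta(rows: list[dict], cond: list[str], dec: str, target: float) -> float | None:
    """Hypothetical rule: the largest beta whose gamma(beta) reaches target.

    gamma is non-increasing in beta and, for a positive target, the largest
    qualifying beta is always one of the classes' own inclusion degrees.
    """
    if not 0.0 < target <= 1.0:
        raise ValueError("target must lie in (0, 1]")
    candidates = sorted({deg for _, deg in inclusion_degrees(rows, cond, dec) if deg > 0.5},
                        reverse=True)
    for beta in candidates:
        if gamma(rows, cond, dec, beta) >= target:
            return beta
    return None


# Toy decision table (hypothetical) with one noisy purchase record.
rows = [
    {"price": "low", "brand": "A", "buy": "yes"},
    {"price": "low", "brand": "A", "buy": "yes"},
    {"price": "low", "brand": "A", "buy": "yes"},
    {"price": "low", "brand": "A", "buy": "no"},  # noise
    {"price": "high", "brand": "B", "buy": "no"},
    {"price": "high", "brand": "B", "buy": "no"},
]
cond = ["price", "brand"]
print(gamma(rows, cond, "buy", 1.0))        # 0.3333333333333333
print(gamma(rows, cond, "buy", 0.75))       # 1.0
print(select_beta(rows, cond, "buy", 0.9))  # 0.75
```

On this toy table, classical rough sets (β = 1) exclude all four "low/A" records from the positive region because of a single noisy record, while β = 0.75 keeps them. This illustrates the kind of noise tolerance the summary attributes to the variable precision model.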
Data last updated: 2023-05-31
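On the Impact of the Big Data Environment on the Development of Information Science
Theme Park Evaluation Based on Public Sentiment Orientation: A Case Study of Volga Manor in Harbin
A Macro-Level Analysis of the Gap Between Chinese and Foreign Academic Papers and Journals, with Suggestions for Improvement
An Empirical Analysis of the Impact of Industrial Structure Adjustment on Water Use Efficiency in Resource-Based Regions: Evidence from 10 Resource-Based Provinces in China
Classification and Prediction of Bus Passenger Flow with a Multi-Source Data-Driven CNN-GRU Model
Research on Semantic Annotation Technology Based on Spatio-Temporal Domain Ontology
Research on Key Technologies for Integrating Multi-Source Heterogeneous Microbial Data Based on the Semantic Web
Multi-Source Heterogeneous Data Annotation for Waterbird Monitoring in Wetland Nature Reserves
Research on Key Technologies for Semantic Knowledge Acquisition and Fusion from Heterogeneous Web Information Sources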