Nowadays, the rise of social networking service as a new medium of information dissemination has endowed many social networks with enormous commercial value; in the meanwhile, the graph structure of social networks makes graph pattern matching a key technique for social network analysis, e.g., identifying social communities and social positions, finding experts according to social relationships and so on. However, social networks bring graph pattern matching new challenges. As real-life social networks are typically large, it is often prohibitively expensive to conduct graph pattern matching over such large graphs, e.g., NP-complete for subgraph isomorphism, and quadratic time for graph simulation. In response to these challenges, we study the graph pattern matching problem based on real-life applications, and develop techniques which are particularly applicable for large, frequently updated and distributively stored social networks, under a unified logical framework. More specifically, (1) we will revise graph pattern matching defined in terms of graph simulation, by supporting designated output nodes; with the revise, we require the query results including only matches of the output nodes and develop algorithms to find diversified top-k matches; in the meanwhile, we will study resource-bounded query answering via a dynamic scheme that reduces big G to a much smaller graph G' and achieves high query accuracy by using G' instead of G; (2) we also study the problem of compressing graphs into hierarchical structures and develop efficient maintenance techniques for compressed graphs; with these, we will explore solutions to speed up graph pattern matching computation by using materialized views and compressed graphs for subgraph isomorphism and graph simulation, respectively; (3) we next develop effective algorithms to cope with distributed social networks. The algorithms will possess low site visit times, response time and data shipment, and better still, can be evaluated more efficiently by parallel computation; and (4) we will also experimentally verify the efficiency, effectiveness and scalability of all our centralized and distributed algorithms; moreover, all the techniques will be integrated to one prototype system. In short, our study is to establish a complete and systematic theory for graph pattern matching, and will be of momentous theoretical significance and practical value to the analysis of big social data.
社交网络已成为新的信息传播载体,具有极高的商业应用价值;与此同时,图结构特点使得图匹配查询成为社交网络分析的关键技术,如:社交圈子发现、角色分析、专家推荐等。然而,庞大的社交网络和昂贵的图匹配计算,制约了图匹配查询的应用。面对上述挑战,本课题在统一的逻辑框架下,从应用需求出发,针对社交网络数据规模大,查询结果复杂,数据更新频繁,分布式存储等特点,系统性地开展图匹配查询研究。主要创新点包括:1)在top-k多样性计算和有限资源近似计算两个方向探索近似图匹配查询;2)探索层级图压缩及压缩图增量维护技术,实现通过视图、压缩图优化图匹配查询;3)克服分布式计算的复杂性并发挥并行计算的优势,探索有效的分布式图匹配查询技术;4)实验验证上述技术的有效性并进行整合,完善原型系统。本课题的研究有利于建立系统完整的面向“大数据”的图匹配查询技术,对“大数据”环境下社交网络分析具有重要的理论意义和应用价值。
图结构匹配是一个经典的研究问题,在多个领域内有着广泛的应用。近年来,随着社交网络的兴起,图结构匹配进一步地成为社交网络分析的关键技术之一,被应用于社交圈子发现、角色分析、目标推荐等。然而,由于社交网络数据量十分庞大,且图结构匹配问题的计算复杂度非常高,例如,子图同构是NP完全问题,不存在多项式时间复杂度的算法;图仿真算法的时间复杂度也是输入图G的平方,因此在大规模图数据上进行图结构匹配运算有着很高的开销,难以满足实际应用需求。为此,如何有效地提高图结构匹配运算的效率是当前社交网络分析研究的热点。本课题着眼于这一现实问题,针对社交网络数据“规模大”,“查询结果复杂”,“分布式存储”等特点,系统性地开展图匹配查询研究。课题组经过三年多的努力,在多样化图结构匹配,基于视图的图结构匹配,分布式图结构匹配,有限资源图结构匹配等问题上进行了深入探索,取得了一系列的研究成果,具体包括:探索了top-k多样化查询技术,设计了具有“提前终止”性能的算法,使得查询更有针对性,且计算更加高效;设计了基于视图的查询技术,大大提高了图结构匹配运算的效率;探索了分布式环境下图结构匹配的问题,开发了可并行运行的分布式算法,极大地提高了运算效率;提出了“有限资源查询计算”的框架,设计了有限资源查询算法,实现了在有限资源环境下高效率的查询计算。以上这些成果发表在了国际高水平的期刊和学术会议,如:TKDE (CCF A类)1篇,VLDB (CCF A类)1篇,EDBT(CCF B类) 1篇, DASFAA(CCF B类)1篇,DEXA(CCF C类) 1篇,产生了一定的学术影响力。不仅如此,上述成果还得到了企业,如华为,长虹等的应用,帮助他们取得了良好的经济效益;同时也申请国内外发明专利6项,其中美国发明专利1项(已获授权),国内发明专利5项。同时在本项目研究成果的基础上,课题组负责人作为主要参与人获批国家自然科学基金重大项目1项,作为主持人获批软件开发环境国家重点实验室开放课题1项及多项纵横向科研项目。
{{i.achievement_title}}
数据更新时间:2023-05-31
论大数据环境对情报学发展的影响
跨社交网络用户对齐技术综述
城市轨道交通车站火灾情况下客流疏散能力评价
基于ESO的DGVSCMG双框架伺服系统不匹配 扰动抑制
基于协同表示的图嵌入鉴别分析在人脸识别中的应用
基于Spark的大图数据最优子模式匹配查询方法研究
面向大规模图数据的高效结构查询技术研究
面向安全图数据外包服务的可查询加密机制研究
移动社交网络中面向多维隐私保护的安全查询方法研究