Distributed mass storage systems for big data applications commonly suffer from imbalanced parallel reads and writes and from semantic incompatibility between the I/O layer and the storage layer. Together with limits on system reliability and scalability, these problems reduce the efficiency of big data applications. To address them, this project will carry out the following research: (1) to optimize the read performance of the mass storage system, a max-flow-based matching algorithm between application processes and the data blocks they read will be designed, using a bipartite graph and the Ford-Fulkerson algorithm; (2) to reduce the write imbalance of the mass storage system, a heat map that reflects the load of each node in real time and an LRU-based write algorithm will be designed; (3) a new shifted-distribution group layout method for data will be studied, to reduce the cost of data reorganization when a node fails or a new node joins and so to preserve the reliability and scalability of the system; (4) a new semantic MapReduce framework will be studied, to integrate the I/O scheduling strategy with the storage strategy of the distributed mass storage system and so to improve the efficiency of big data applications on cloud computing platforms. The results of the project are expected to offer a new perspective and method for designing mass storage systems for big data applications, and a reference for planning and designing large data centers, giving the work both theoretical and practical value.
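Research item (1) can be illustrated with a minimal sketch. It is not the project's algorithm: it assumes a simplified model in which each process reads at most one block at a time, each block is served to at most one process, and an edge exists only where a process can read a block locally. Under those assumptions, the maximum flow on the bipartite graph reduces to a maximum matching, found here with Ford-Fulkerson style augmenting paths. The function name `max_flow_matching` and the input format are hypothetical.

```python
from typing import Dict, List, Set


def max_flow_matching(local_blocks: Dict[str, List[str]]) -> Dict[str, str]:
    """Match processes to blocks on a unit-capacity bipartite graph.

    local_blocks maps a process id to the block ids it can read locally
    (an assumed input format). Returns a dict of block id -> process id.
    """
    block_owner: Dict[str, str] = {}  # block -> process currently matched to it

    def augment(proc: str, visited: Set[str]) -> bool:
        # Depth-first search for an augmenting path that starts at proc.
        for block in local_blocks.get(proc, []):
            if block in visited:
                continue
            visited.add(block)
            # Take the block if it is free, or if its current owner
            # can be re-routed to some other block.
            if block not in block_owner or augment(block_owner[block], visited):
                block_owner[block] = proc
                return True
        return False

    # Each successful augmentation increases the flow (matching size) by one.
    for proc in local_blocks:
        augment(proc, set())
    return block_owner


if __name__ == "__main__":
    demo = {"p1": ["b1", "b2"], "p2": ["b1"], "p3": ["b2", "b3"]}
    print(max_flow_matching(demo))
```

In the demo input, `p2` can only read `b1`, so the search re-routes `p1` from `b1` to `b2` and every process ends up with a local block. A real scheduler would also need node capacities and replica placement, which this sketch leaves out.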
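With the support of this project, the team set out to improve the performance of distributed mass storage systems for big data applications and carried out in-depth research on several key technical problems that affect such systems, including parallel read/write imbalance and system reliability and scalability. The team also studied key security problems in storing massive Internet of Things data, and extended the work to deep-learning-based network data security theory and to visual question answering theory in artificial intelligence. The project produced 36 SCI journal papers and 3 international conference papers. Of these, 18 are IEEE journal papers, 1 appeared in a CCF class A English-language journal, 2 appeared in CCF class B English-language journals, 6 appeared in Chinese Academy of Sciences Zone 1 journals, and 3 are highly cited papers. The team filed 62 invention patent applications, including 33 granted national invention patents, 1 granted international invention patent, and 4 PCT international invention patent applications under substantive examination. The project trained 7 doctoral students, 17 master's students, and 2 young faculty members, and team members attended 9 international and domestic academic conferences. Related results received two Second Prizes of the Shanghai Science and Technology Progress Award, one First Prize of the Shanghai Pudong New Area Science and Technology Progress Award, and one First Prize for Science and Technology Progress each from the China Institute of Navigation (中国航海学会) and the China Ports Association (中国港口协会).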
Data last updated: 2023-05-31