Large-scale distributed storage systems introduce data redundancy to ensure fault tolerance. For the same level of redundancy, erasure coding can greatly improve data reliability compared with replication. However, traditional erasure codes such as Reed-Solomon codes usually consume large amounts of system resources (e.g., disk I/O and network bandwidth) when repairing a failed storage node. Regenerating codes were proposed to reduce the repair bandwidth, yet each helper node participating in the repair must perform a large number of linear operations on the data it stores. Motivated by this, a special family of regenerating codes called fractional repetition (FR) codes was introduced. FR codes recover a failed node by simple data transfer, so both the repair bandwidth and the repair computational complexity are optimal. Although some explicit constructions of FR codes have been presented in the literature, their performance needs to be explored further. This project focuses on FR codes and consists of the following research topics: 1) improve the performance bounds of FR codes, including the upper bound on the supported file size and the lower bound on the data reconstruction degree; 2) present optimal code constructions, based on combinatorial designs and graph theory, that attain these theoretical bounds; 3) deploy the constructed FR codes in the Ceph distributed file system and verify their storage performance. The proposed project will advance the development of FR codes for distributed storage systems.
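Fractional repetition (FR) codes offer simple encoding and decoding and fast node repair, and therefore have broad application prospects in practical distributed storage systems. This project studied the key performance properties of FR codes. It improved the performance bounds of FR codes, including the upper bound on the supported file size and the upper bound on the minimum distance; proposed optimal code constructions based on graph theory and combinatorial designs that attain these upper bounds; extended the application scenarios of FR codes to support dynamic adjustment of system parameters; proposed a code construction based on genetic algorithms that increases the supported file size of FR codes; and deployed the constructed FR codes to verify and analyze their overall storage performance. The project addressed several key scientific problems of FR codes from both theoretical and practical perspectives, and the results can be applied to storage platforms such as large-scale data centers and distributed file systems.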
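The repair-by-transfer property and the graph-based constructions mentioned in the abstract can be illustrated with a small sketch. It uses the complete-graph placement known from the FR code literature (one coded symbol per edge, stored on both endpoint nodes); it is not necessarily one of the constructions developed in this project. The function names `build_fr_code`, `repair_plan`, and `supported_file_size` are hypothetical, and the sketch assumes that the placement table survives a node failure as metadata.

```python
from itertools import combinations


def build_fr_code(n):
    """Return {node: set of symbol ids} for the complete-graph placement on n nodes.

    Each edge (u, v) of K_n is one coded symbol, stored on both u and v,
    so the repetition degree is 2 and each node stores n - 1 symbols.
    """
    placement = {v: set() for v in range(n)}
    for symbol, (u, v) in enumerate(combinations(range(n), 2)):
        placement[u].add(symbol)
        placement[v].add(symbol)
    return placement


def repair_plan(placement, failed):
    """Return {helper: symbols to copy} that rebuilds the failed node.

    Assumption: the placement table is still available as metadata.
    Helpers only read and send stored symbols; no arithmetic is involved.
    """
    lost = placement[failed]
    plan = {}
    for helper, stored in placement.items():
        if helper != failed and stored & lost:
            plan[helper] = stored & lost
    return plan


def supported_file_size(placement, k):
    """Smallest number of distinct symbols held by any k nodes (brute force)."""
    return min(
        len(set().union(*(placement[v] for v in group)))
        for group in combinations(placement, k)
    )


placement = build_fr_code(5)
plan = repair_plan(placement, failed=0)
recovered = set().union(*plan.values())
print(recovered == placement[0])          # True
print(supported_file_size(placement, 3))  # 9
```

With five nodes, each of the four surviving nodes sends exactly one stored symbol to rebuild the failed node. Any three nodes hold at least 9 distinct symbols, so with an outer MDS code a file of up to 9 symbols could be reconstructed from any three nodes. The brute-force search is only practical for small parameters.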
Data last updated: 2023-05-31
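Research on locally repairable codes in distributed storage systems
Distributed disaster-tolerant storage systems based on array codes
Research on regenerating codes with low computational complexity in distributed storage systems
Research on key problems of high-rate MDS codes in distributed storage systems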