The processing capacity of the traditional software networking stack on end hosts has become the main performance bottleneck of today's data center networks, so high-performance hardware networking stacks have drawn considerable research attention in recent years. However, current hardware stacks support only single-path transmission, which makes them poorly suited to handling path failures and balancing network load. To address this, the project raises the problem of multi-path transport for hardware networking stacks and studies a multi-path transport protocol designed for them. The key challenge is that the memory available in hardware is very limited, while the three main mechanisms of a multi-path transport protocol (multi-path congestion control, in-order delivery, and loss recovery) all require a large memory footprint. The project therefore proposes several techniques: ACK-clocking-based multi-path congestion control, bitmap compression, out-of-order-aware path selection, fast loss detection based on the degree of reordering, and opportunistic tail retransmission. Together these co-optimize the memory footprint of the three mechanisms, so that a multi-path transport protocol can be implemented in a hardware stack with little memory. The work is expected to significantly improve the ability of existing hardware stacks to tolerate network failures and to raise load-balancing efficiency, which matters for the performance of data center networks and of the large-scale distributed services they carry.
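To make the memory argument concrete, the following is a minimal sketch of how a fixed-size receive bitmap can support both in-order delivery and reordering-based loss detection. It is an illustration only and not the project's actual design: the class `ReorderBitmap`, its 64-bit default window, and the `threshold` parameter are all hypothetical names and values chosen for the example, and the project's bitmap compression scheme is not shown.

```python
class ReorderBitmap:
    """Hypothetical fixed-size receive bitmap (not the project's design).

    Bit i of `bits` marks packet (base + i) as received, so per-connection
    state stays at window_bits bits no matter how packets are reordered.
    """

    def __init__(self, window_bits=64):
        self.window_bits = window_bits
        self.base = 0   # next sequence number expected in order
        self.bits = 0   # integer used as a window_bits-wide bitmap

    def receive(self, seq):
        """Record seq; return False if it does not fit in the window."""
        offset = seq - self.base
        if offset < 0:
            return True          # duplicate of an already delivered packet
        if offset >= self.window_bits:
            return False         # too far ahead to track; caller drops it
        self.bits |= 1 << offset
        # Slide the window past the contiguous in-order prefix.
        while self.bits & 1:
            self.bits >>= 1
            self.base += 1
        return True

    def ooo_degree(self):
        """Distance from the first hole to the highest received packet."""
        return max(self.bits.bit_length() - 1, 0)

    def suspected_lost(self, threshold):
        """Holes lying at least `threshold` packets behind the highest one."""
        highest = self.bits.bit_length() - 1   # -1 when nothing is buffered
        return [self.base + i
                for i in range(highest - threshold + 1)
                if not (self.bits >> i) & 1]
```

In this sketch a hole is only reported as lost once packets far enough ahead of it have arrived, which is one simple way to distinguish loss from the reordering that multi-path transmission naturally causes; the right threshold would depend on how different the path delays are.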
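The performance demands on data center networks are growing rapidly, and the processing capacity of the traditional software networking stack can no longer keep up. Current hardware networking stacks, meanwhile, support only single-path transmission and cannot cope with path failures or balance network load, so making efficient use of the many paths available inside the network has become a key problem. By building a flow model, the project analyzed the conditions that multi-path transport must satisfy under limited hardware resources, and then proposed a hardware-based multi-path transport protocol, implemented the algorithms, and developed a prototype. System evaluation and large-scale simulation experiments show that the protocol makes effective use of the path diversity inside data centers while requiring only 66 bytes of on-chip memory per connection. During the project, 9 conference and journal papers were published, including 2 in CCF-A venues and 1 in a CCF-B venue. The results have been applied in Huawei's Kunpeng chips, where they greatly reduced the resource overhead of the network transport module on the chip and improved the product's suitability for large-scale deployment.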
Data last updated: 2023-05-31
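Multi-path transport mechanisms and joint optimization theory for software-defined data centers
Latency-sensitive transport control protocols in data center networks
Transport control mechanisms for virtualized environments in data center networks
Adaptive multi-granularity transmission in data center networks (DCN) for emerging applications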