Deep learning is a state-of-the-art machine learning method that places very high demands on computing performance. General-purpose processors are constrained by the pace of semiconductor process technology and are increasingly unable to meet the needs of deep learning applications. Heterogeneous computing systems built from custom hardware accelerators are an important direction for meeting future energy-efficiency demands. The systolic array is an architecture designed for parallel computing and is well suited to the computation involved in deep learning. At present, however, systolic-array-based accelerators for deep learning still suffer from incomplete support for deep learning algorithms and low execution efficiency. By studying the hardware structure and workload mapping algorithms of systolic array accelerators, this project aims to overcome the design limitations of existing array accelerators for deep learning and to propose mapping methods that run complete deep learning algorithms efficiently on the array accelerator. The project is expected to fill the gap in current hardware accelerators for deep learning training and to help AI technology deliver more efficient services to related industries.
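Deep learning is a state-of-the-art machine learning method that places very high demands on computing performance. General-purpose architectures are constrained by the pace of process technology and are increasingly unable to meet the needs of deep learning applications. Heterogeneous computing systems built from custom hardware accelerators are an important direction for meeting future computing energy-efficiency demands. The array accelerator is an architecture designed for parallel computing and is well suited to the computation involved in deep learning algorithms. This project carried out research from several angles: architecture design, algorithm mapping, application optimization, and prototype platform construction. It designed a templated accelerator architecture for deep convolutional neural networks that supports a variety of 2D and 3D convolutional network structures; achieved efficient mapping of the computational workloads of different network structures onto the accelerator array; optimized the on-chip memory design to reduce its area and power consumption; and performed acceleration optimizations for two representative applications, lung cancer detection and deep graph convolutional neural networks. The project also built a high-performance prototype platform composed of multiple FPGA compute nodes, with dedicated optimization of the communication among those nodes. The project made some contributions to the field of AI computing and can help deliver more efficient intelligent computing services to related industries.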
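To illustrate why a systolic array suits the matrix computations at the core of deep learning, the following is a minimal cycle-level sketch of a matrix multiplication mapped onto a two-dimensional array of processing elements. It is not the project's architecture or mapping method: the output-stationary dataflow, the skewed input schedule, and the function name `systolic_matmul` are all assumptions chosen for illustration.

```python
def systolic_matmul(A, B):
    """Simulate C = A x B on an M x N output-stationary systolic array.

    Hypothetical illustration: PE (i, j) keeps the partial sum for C[i][j].
    Rows of A enter from the left edge and columns of B from the top edge,
    each delayed by its row/column index so matching operands meet in time.
    """
    M, K, N = len(A), len(B), len(B[0])
    assert all(len(row) == K for row in A), "inner dimensions must match"

    acc = [[0] * N for _ in range(M)]     # stationary partial sums
    a_reg = [[0] * N for _ in range(M)]   # operands moving left to right
    b_reg = [[0] * N for _ in range(M)]   # operands moving top to bottom

    cycles = M + N + K - 2                # cycle in which the last PE finishes
    for t in range(cycles):
        # Shift A operands one PE to the right and feed the left edge.
        for i in range(M):
            for j in range(N - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
            k = t - i                     # row i is delayed by i cycles
            a_reg[i][0] = A[i][k] if 0 <= k < K else 0
        # Shift B operands one PE downward and feed the top edge.
        for j in range(N):
            for i in range(M - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]
            k = t - j                     # column j is delayed by j cycles
            b_reg[0][j] = B[k][j] if 0 <= k < K else 0
        # Every PE performs one multiply-accumulate per cycle.
        for i in range(M):
            for j in range(N):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc, cycles


if __name__ == "__main__":
    C, cycles = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    print(C, cycles)
```

In this sketch each operand is read from memory once and then reused as it passes from one processing element to the next, which is the property that makes array accelerators attractive for energy efficiency. A convolution layer would first have to be lowered to such a matrix product or mapped with a different dataflow, and that choice is outside what this example shows.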
Data last updated: 2023-05-31
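Research on the Betavoltaic Effect Based on One-Dimensional TiO2 Nanotube Array Films
Research on SSVEP-Based Direct Brain Control of Robot Direction and Speed
Theme Park Evaluation Based on Public Sentiment Orientation: A Case Study of Volga Manor in Harbin
Application of Collaborative-Representation-Based Graph Embedding Discriminant Analysis to Face Recognition
An Improved Multi-Objective Sine Cosine Optimization Algorithm
Research on Key Technologies of Reconfigurable Accelerators for Deep Learning Algorithms
Research on Energy-Efficient FPGA Computing Architectures and Mapping Methods for Deep Learning
Research on Deep Learning Algorithms for Pathological Image Processing
Research on Deep Learning Algorithms for Urban Remote Sensing Image Segmentation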