Embedded deep learning is expected to become widespread in power-sensitive terminal devices such as wearables, Internet of Things (IoT) devices, and mobile phones, where low power consumption is the main requirement. Because of their general-purpose design and the physical separation of computation and storage, conventional CPUs and GPUs consume too much power to meet the energy-efficiency requirements of embedded deep learning. Current mainstream SoCs (systems-on-chip) with specialized architectures improve energy efficiency by adding dedicated modules such as accelerators and by introducing on-chip storage, but they still need a large number of multiply-accumulate (MAC) units and must move weights between memory and compute, which still wastes considerable power. This project intends to address this problem by exploiting the low-power advantage of memristors in implementing the weighted-summation operation of deep learning. Existing work has three shortcomings: circuit power consumption is still high, circuit implementations of key layers are missing, and response speed slows as the number of layers grows. The corresponding solutions are: using current sampling to reduce the power of the weighted-summation circuit based on the memristor crossbar array; optimizing the convolution and classification layers according to the computational characteristics of deep learning; implementing the pooling-layer circuit with a time-domain method; and adopting a pipeline mechanism to improve system response speed. The project will establish a simulation and evaluation environment and, based on 28 nm process parameters, verify the feasibility and advantages of the proposed scheme using handwritten digit recognition as the example application.
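To make the weighted-summation idea concrete, the following is a minimal sketch of how an ideal memristor crossbar computes a weighted sum: each device contributes a current V·G (Ohm's law), and the currents on a shared column wire add up (Kirchhoff's current law). It is an illustration under stated assumptions, not the project's circuit. The function names, the differential conductance-pair mapping for signed weights, and the values of `g_min`, `g_max`, and `v_read` are all hypothetical, and non-idealities such as wire resistance and device nonlinearity are ignored.

```python
def crossbar_column_currents(voltages, conductances):
    """Current collected on each column of an ideal crossbar.

    voltages:     row input voltages in volts, one per row
    conductances: rows x cols device conductances in siemens
    """
    n_cols = len(conductances[0])
    currents = [0.0] * n_cols
    for v, row in zip(voltages, conductances):
        for j, g in enumerate(row):
            # Ohm's law per device; the column wire sums the currents.
            currents[j] += v * g
    return currents


def weights_to_conductance_pairs(weights, g_min, g_max):
    """Map signed weights in [-1, 1] to (G_plus, G_minus) matrices.

    Assumed convention: a positive weight raises G_plus above g_min,
    a negative weight raises G_minus above g_min.
    """
    span = g_max - g_min
    g_plus = [[g_min + span * max(w, 0.0) for w in row] for row in weights]
    g_minus = [[g_min + span * max(-w, 0.0) for w in row] for row in weights]
    return g_plus, g_minus


def weighted_sum(inputs, weights, v_read=0.1, g_min=1e-6, g_max=1e-4):
    """Compute sum_i inputs[i] * weights[i][j] for every output column j."""
    g_plus, g_minus = weights_to_conductance_pairs(weights, g_min, g_max)
    volts = [x * v_read for x in inputs]  # encode inputs as read voltages
    i_plus = crossbar_column_currents(volts, g_plus)
    i_minus = crossbar_column_currents(volts, g_minus)
    scale = v_read * (g_max - g_min)  # converts current back to weight units
    return [(p - m) / scale for p, m in zip(i_plus, i_minus)]


if __name__ == "__main__":
    # Two inputs, two outputs; weights[i][j] connects input i to output j.
    print(weighted_sum([1.0, 0.5], [[0.5, -1.0], [-0.25, 0.5]]))
```

In this model the weights stay in the array as conductances and the multiply-accumulate happens in the analog domain, which is why no separate MAC units or weight transfers appear in the sketch.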
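Deep learning has very important applications in fields such as speech and image recognition, autonomous driving, and medical diagnosis. Because of the memory wall, conventional hardware (CPUs and GPUs) cannot meet the low-power requirements of embedded deep learning. Specialized-architecture SoCs use mature digital integrated-circuit technology: following the characteristics of deep learning computation, they add dedicated compute modules such as ASICs (application-specific integrated circuits), accelerators, or custom processors, and introduce distributed on-chip memory (typically SRAM) to raise parallelism and reduce accesses to off-chip memory, thereby improving the energy efficiency of deep learning. However, weighted summation in such SoCs requires a large number of multiply-accumulate (MAC) units, whose power consumption is significant, and the weights still reside in off-chip DRAM or on-chip SRAM, so moving them during computation also costs considerable power. In addition, SRAM used as on-chip memory has low storage density, so it can hold only part of the data or meet the storage needs of only relatively small neural networks. Targeting the low-power demands of embedded deep learning, and exploiting the low-power advantage of memristors in weighted-summation circuits, this project completed memristor-based low-power circuit implementations of deep-learning CNNs, comprising the convolution, pooling, and classification layers, together with a pipeline mechanism suited to each layer, to improve the energy efficiency and response speed of deep learning. To verify the proposed scheme, the project established a simulation and evaluation environment, verified the scheme's feasibility and advantages based on 28 nm process parameters, and used handwritten digit recognition as the example application. The key results are: 1) energy efficiency is more than 60% higher than that of a digital specialized-architecture SoC, exceeding the expected target by more than 10%; 2) recognition accuracy reaches 95.02%, meeting the expected 95%. During the project, 7 patent applications were filed, of which 1 has been granted. The principal investigator published 10 papers as first or corresponding author, including 3 SCI-indexed papers (1 in the Journal of Solid-State Circuits (JSSC, top-1), 1 in Micromachines, and 1 in Nanoscale Research Letters) and 6 EI-indexed papers (including 2 at the Asian Solid-State Circuits Conference (A-SSCC) and 1 at the European Conference on Solid-State Device Research (ESSDERC)).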
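As a rough illustration of why a layer-wise pipeline helps when response speed degrades with network depth, the sketch below compares the total time to process a batch of inputs with and without pipelining. It is an idealized timing model, not the project's pipeline design: the function name, the stage delays, and the assumption that the slowest stage sets a common time slot are all hypothetical.

```python
def total_latency(num_inputs, stage_delays, pipelined):
    """Total time to push num_inputs samples through all stages.

    stage_delays: per-layer processing time (arbitrary time units)
    pipelined:    if True, layers work on different samples concurrently
    """
    if pipelined:
        # The slowest stage sets the slot length. The first result appears
        # after one slot per stage; each later result takes one more slot.
        slot = max(stage_delays)
        return (len(stage_delays) + num_inputs - 1) * slot
    # Without pipelining, each sample traverses every stage before the
    # next sample may start.
    return num_inputs * sum(stage_delays)


if __name__ == "__main__":
    delays = [1, 1, 1]  # hypothetical: three layers of equal delay
    print(total_latency(100, delays, pipelined=False))
    print(total_latency(100, delays, pipelined=True))
```

With three equal stages and 100 inputs, the model gives 300 time units without pipelining and 102 with it. Adding layers lengthens the non-pipelined total in proportion to depth, while the pipelined total grows by only one slot per added layer.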
Data last updated: 2023-05-31