A 12.1 TOPS/W Quantized Network Acceleration Processor With Effective-Weight-Based Convolution and Error-Compensation-Based Prediction

2021-12-30

Author(s): Mo, HY (Mo, Huiyu); Zhu, WP (Zhu, Wenping); Hu, WJ (Hu, Wenjing); Li, Q (Li, Qiang); Li, A (Li, Ang); Yin, SY (Yin, Shouyi); Wei, SJ (Wei, Shaojun); Liu, LB (Liu, Leibo)

Source: IEEE JOURNAL OF SOLID-STATE CIRCUITS

DOI: 10.1109/JSSC.2021.3113569

Early Access Date: OCT 2021

Abstract: In this article, a quantized network acceleration processor (QNAP) is proposed to efficiently accelerate CNN processing by eliminating most unessential operations through algorithm-hardware co-optimization. First, an effective-weight-based convolution (EWC) is proposed that selects a group of effective weights (EWs) to replace the other unique weights. The input activations corresponding to the same EW can therefore be accumulated first and then multiplied by that EW, reducing the number of multiplication operations; this is efficiently supported by the dedicated processing elements in QNAP. Experimental results show that energy efficiency is improved by 1.59×-3.20× compared with different UCNN implementations. Second, an error-compensation-based prediction (ECP) method uses trained compensation values to replace some unimportant partial sums, further reducing potentially redundant addition operations caused by the ReLU function. Compared with SnaPEA and Pred on AlexNet, ECP achieves 1.23× and 1.75× higher energy efficiency (TOPS/W), respectively, with marginal accuracy loss. Third, a residual pipeline mode is proposed to efficiently implement residual blocks, with a 1.5× lower memory footprint, 1.18× lower power consumption, and 13.15% higher hardware utilization on average than existing works. Finally, the QNAP processor is fabricated in the TSMC 28-nm CMOS process with a core area of 1.9 mm². Benchmarked with AlexNet, VGGNet, GoogLeNet, and ResNet on ImageNet at 470 MHz and 0.9 V, the processor achieves 117.4 frames per second with 131.6-mW power consumption on average, outperforming state-of-the-art processors by 1.77×-24.20× in energy efficiency.
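The EWC idea in the abstract (accumulate the activations that share an effective weight, then multiply once per EW) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the QNAP dataflow. The nearest-value rule for replacing unique weights with EWs, the small integer weights, and the function names `replace_with_effective`, `dot_direct`, and `dot_ewc` are all hypothetical choices made here.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def replace_with_effective(weights: Sequence[int], effective: Sequence[int]) -> List[int]:
    """Map every unique weight to its nearest effective weight.

    Assumption: nearest-value mapping is used only for illustration;
    ties go to the EW listed first in `effective`.
    """
    return [min(effective, key=lambda ew: abs(ew - w)) for w in weights]


def dot_direct(weights: Sequence[int], acts: Sequence[int]) -> Tuple[int, int]:
    """Baseline dot product: one multiplication per (weight, activation) pair.

    Returns (result, multiplications used).
    """
    return sum(w * a for w, a in zip(weights, acts)), len(weights)


def dot_ewc(weights: Sequence[int], acts: Sequence[int]) -> Tuple[int, int]:
    """EWC-style dot product: add activations per shared weight, then multiply once.

    Relies on w*a1 + w*a2 == w*(a1 + a2). Returns (result, multiplications used).
    """
    sums: Dict[int, int] = defaultdict(int)
    for w, a in zip(weights, acts):
        if w != 0:  # zero weights contribute nothing, so they are skipped
            sums[w] += a  # accumulation step: additions only
    result = sum(w * s for w, s in sums.items())  # one multiply per distinct EW
    return result, len(sums)


if __name__ == "__main__":
    effective = [-2, 0, 1, 3]          # hypothetical EW set
    raw = [4, 1, -3, 3, 1, 0, 3, -2]   # hypothetical trained weights
    acts = [1, 2, 3, 4, 5, 6, 7, 8]    # hypothetical input activations
    w = replace_with_effective(raw, effective)  # -> [3, 1, -2, 3, 1, 0, 3, -2]
    print(dot_direct(w, acts))  # (21, 8)
    print(dot_ewc(w, acts))     # (21, 3)
```

With three distinct nonzero effective weights, the grouped version needs three multiplications instead of eight and returns the same result. Any accuracy impact comes from the step that replaces unique weights with EWs, not from the regrouping itself.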
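The abstract states only that ECP replaces some unimportant partial sums with trained compensation values to avoid additions that ReLU would discard. One plausible reading is a generic early ReLU-zero predictor, sketched below. The "important" index set, the constant `compensation` (passed in here rather than trained), and the names `relu_dot_exact` and `relu_dot_predicted` are hypothetical. They are not the paper's algorithm.

```python
from typing import Sequence, Tuple


def relu_dot_exact(weights: Sequence[int], acts: Sequence[int]) -> Tuple[int, int]:
    """Baseline: full dot product, then ReLU. Returns (output, products computed)."""
    total = sum(w * a for w, a in zip(weights, acts))
    return max(total, 0), len(weights)


def relu_dot_predicted(
    weights: Sequence[int],
    acts: Sequence[int],
    important: Sequence[int],
    compensation: int,
) -> Tuple[int, int]:
    """Early ReLU-zero prediction with a compensation constant (illustrative only).

    Products at the hypothetical `important` indices are always computed; the
    remaining products are stood in for by `compensation` during prediction.
    Returns (output, products computed).
    """
    partial = sum(weights[i] * acts[i] for i in important)
    if partial + compensation <= 0:
        return 0, len(important)  # predicted ReLU zero: skip the other products
    done = set(important)
    rest = sum(w * a for i, (w, a) in enumerate(zip(weights, acts)) if i not in done)
    return max(partial + rest, 0), len(weights)


if __name__ == "__main__":
    w = [5, -4, 1, 1]  # hypothetical weights; indices 0 and 1 treated as "important"
    for acts in ([1, 3, 2, 2], [3, 1, 2, 2], [1, 2, 4, 4]):
        print(relu_dot_exact(w, acts), relu_dot_predicted(w, acts, [0, 1], 1))
```

For the three activation vectors, the predicted path returns 0 after two products, 15 after all four, and 0 after two products. The last case is a misprediction, because the exact output is 5. This is where a compensation value trades accuracy for skipped work. The abstract does not state how QNAP trains its compensation values or chooses which partial sums are unimportant.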

Accession Number: WOS:000732394400001

ISSN: 0018-9200

eISSN: 1558-173X

Contact
Mailing address

No. A35 Qinghua East Road (middle section of Linda North Road), Haidian District, Beijing; P.O. Box 912, Beijing (100083)

Telephone

010-82304210 / 010-82305052 (fax)

E-mail

semi@semi.ac.cn

Copyright © Institute of Semiconductors, Chinese Academy of Sciences. All rights reserved.

Beijing Public Security Network Record No. 110402500052