Ajou University repository

An effective design to improve the efficiency of DPUs on FPGA
  • Lei, Yutian ;
  • Deng, Qingyong ;
  • Long, Saiqin ;
  • Liu, Shaohui ;
  • Oh, Sangyoon
Citations

SCOPUS

0

Publication Year
2020-12-01
Journal
Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS
Publisher
IEEE Computer Society
Citation
Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS, Vol.2020-December, pp.206-213
Keyword
Convolutional neural network (CNN); Deep learning processor unit (DPU); Efficiency; Field programmable gate array (FPGA)
Mesh Keyword
Application systems; CNN models; Color space conversion; Different sizes; Entire system; Processing units; Semantic segmentation; Two-dimension
All Science Classification Codes (ASJC)
Hardware and Architecture
Abstract
Convolutional neural networks (CNNs) have been widely used for various complicated problems, such as image classification, object detection, and semantic segmentation. To accommodate diverse CNN structures, the deep learning processing unit (DPU) is designed as a general accelerator on field programmable gate arrays (FPGAs) that supports various CNN layers, such as convolution, pooling, and activation. However, DPU utilization and scheduling efficiency are low when DPUs are used for multitask applications built from CNN models. In this paper, an effective design comprising multi-core with different size (MCDS) and DPU Plus is proposed to improve the efficiency of DPU usage in the two dimensions of time and space. By increasing both the number of DPU cores on an FPGA and the utilization of each DPU core, MCDS effectively improves overall throughput under restricted on-chip resources. Furthermore, DPU Plus is proposed to improve the scheduling efficiency of DPUs by implementing the DPU together with other significant auxiliary modules of the application system on the same FPGA. Finally, a color space conversion module is implemented alongside the DPU cores to verify its performance. The experiment shows that, compared with running entirely on the CPU, it achieves a 16.2x acceleration and increases the throughput of the entire system by 3.0x.
ISSN
1521-9097
Language
eng
URI
https://aurora.ajou.ac.kr/handle/2018.oak/36580
https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85102387372&origin=inward
DOI
https://doi.org/10.1109/icpads51040.2020.00036
Type
Conference
Funding
This work is supported in part by the National Key Research and Development Program of China under Grant 2018YFB1003702, the Natural Science Foundation of China under Grants No. 62032020 and 62076214, the Hunan Science and Technology Planning Project under Grant No. 2019RS3019, and the Hunan Provincial Natural Science Foundation of China for Distinguished Young Scholars under Grant 2018JJ1025.
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Oh, Sangyoon (오상윤)
Department of Software and Computer Engineering