An effective design to improve the efficiency of DPUs on FPGA

Journal: Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS

Citation: Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS, Vol.2020-December, pp.206-213

Keyword: Convolutional neural network (CNN); Deep learning processor unit (DPU); Efficiency; Field programmable gate array (FPGA)

Mesh Keyword: Application systems; CNN models; Color space conversion; Different sizes; Entire system; Processing units; Semantic segmentation; Two-dimension

Abstract: Convolutional neural networks (CNNs) have been widely used for various complicated problems, such as image classification, object detection, and semantic segmentation. To accommodate diverse CNN structures, the deep learning processing unit (DPU) is designed as a general-purpose accelerator on field programmable gate arrays (FPGAs) that supports various CNN layers, such as convolution, pooling, and activation. However, DPU utilization and scheduling efficiency are low when DPUs are used for multitask applications built from CNN models. In this paper, an effective design comprising multi-core with different size (MCDS) and DPU Plus is proposed to improve the efficiency of DPU usage along the two dimensions of time and space. By increasing both the number of DPU cores on an FPGA and the utilization of each DPU core, MCDS effectively improves overall throughput under restricted on-chip resources. Furthermore, DPU Plus improves the scheduling efficiency of DPUs by implementing the DPU together with other significant auxiliary modules of the application system on the same FPGA. Finally, a color space conversion module is implemented alongside the DPU cores to evaluate the design's performance. The experiment shows that, compared with running entirely on the CPU, it achieves a 16.2x acceleration and increases the throughput of the entire system by 3.0x.
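
Illustrative note: the abstract does not say which color spaces the conversion module handles. As a minimal sketch of the kind of per-pixel pre-processing such a module might offload from the CPU, the following assumes 8-bit RGB input and full-range BT.601 YCbCr output (the JPEG/JFIF convention). The function names are hypothetical and are not taken from the paper.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to full-range BT.601 YCbCr.

    Assumption: the target color space is YCbCr; the abstract does not
    name the conversion actually implemented on the FPGA.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b

    def clamp(v):
        # Round to the nearest integer and keep the result in [0, 255].
        return max(0, min(255, int(round(v))))

    return clamp(y), clamp(cb), clamp(cr)


def convert_image(pixels):
    """Apply the conversion to an image given as rows of (r, g, b) tuples."""
    return [[rgb_to_ycbcr(*p) for p in row] for row in pixels]
```

Each output pixel depends only on the corresponding input pixel, which is why this kind of conversion parallelizes well in hardware.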

URI: https://aurora.ajou.ac.kr/handle/2018.oak/36580
https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85102387372&origin=inward

Funding: This work is supported in part by the National Key Research and Development Program of China under Grant 2018YFB1003702, the Natural Science Foundation of China under Grants No. 62032020 and 62076214, the Hunan Science and Technology Planning Project under Grant No. 2019RS3019, and the Hunan Provincial Natural Science Foundation of China for Distinguished Young Scholars under Grant 2018JJ1025.
