Optimization of the TensorFlow Lite Execution Engine for ARM-Based Microcontrollers

이찬규

Advisor: 김영진

Affiliation: Graduate School, Ajou University

Department: Graduate School, Department of Electronic Engineering

Publication Year: 2024-08

Publisher: The Graduate School, Ajou University

Keyword: AI model; CMSIS-NN; Convolution; TensorFlow Lite

Description: Thesis (Master's)--Department of Electronic Engineering, 2024. 8

Abstract: As AI research advances, various optimization techniques are being studied to enable the use of AI models on microcontrollers with very limited resources. Several frameworks exist for running AI models efficiently on microcontrollers; a representative open-source one is TensorFlow Lite for Microcontrollers (TFLM), which provides a micro library for executing models on such devices. Beyond these frameworks, there is offline optimization, such as model lightening and compression, which is applied uniformly on a server rather than on the device that runs the model, and online optimization, which is applied while the model is running. A representative example of online optimization is the CMSIS-NN library, which provides inference functions optimized for ARM processors; TFLM's execution engine integrates CMSIS-NN to run optimized kernel functions on ARM cores. This thesis proposes a technique called D2I to reduce the overhead incurred by Im2col in the convolution function, one of the inference functions provided by CMSIS-NN. Applying D2I omits most of the data copying performed by Im2col, which reduces the bottleneck that occurs before the computation begins. However, D2I increases the computational overhead instead, so a multi-pass of D2I and Im2col is applied, taking into account the data copy and computational overhead of each executed layer. With this approach, execution time on an MNIST model containing two convolution layers is reduced by about 10.1%.
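
The trade-off the abstract describes, between copying data for Im2col and doing extra work during the computation, can be illustrated with a generic sketch. This is not CMSIS-NN code and not the thesis's D2I implementation, whose details the abstract does not give. The function names `im2col`, `conv_im2col`, and `conv_direct` are hypothetical, and the example assumes a single-channel, row-major input with stride 1 and no padding.

```python
def im2col(image, height, width, k):
    """Copy every k x k patch of a row-major image into its own list."""
    out_h, out_w = height - k + 1, width - k + 1
    patches = []
    for oy in range(out_h):
        for ox in range(out_w):
            patch = [image[(oy + ky) * width + (ox + kx)]
                     for ky in range(k) for kx in range(k)]
            patches.append(patch)
    return patches


def conv_im2col(image, height, width, kernel, k):
    """Convolution in two steps: copy the patches, then one dot product each."""
    return [sum(p * w for p, w in zip(patch, kernel))
            for patch in im2col(image, height, width, k)]


def conv_direct(image, height, width, kernel, k):
    """Same result without a patch buffer (illustrative contrast only).

    No data is copied, but the input index is recomputed for every
    multiply, so the arithmetic loop itself does more work.
    """
    out_h, out_w = height - k + 1, width - k + 1
    out = []
    for oy in range(out_h):
        for ox in range(out_w):
            acc = 0
            for ky in range(k):
                for kx in range(k):
                    acc += image[(oy + ky) * width + (ox + kx)] * kernel[ky * k + kx]
            out.append(acc)
    return out


image = list(range(16))   # 4 x 4 input, row-major
kernel = [1, 0, 0, -1]    # 2 x 2 kernel, row-major

# Number of elements Im2col copies for this layer (the input has only 16).
copied = sum(len(p) for p in im2col(image, 4, 4, 2))
```

In this toy case the patch buffer holds more elements than the input itself, because overlapping patches are copied repeatedly. That copy cost is the kind of overhead the abstract says D2I largely avoids, at the price of more work in the computation.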

Language: kor

URI: https://aurora.ajou.ac.kr/handle/2018.oak/39289

Journal URL: https://dcoll.ajou.ac.kr/dcollection/common/orgView/000000034127
