Optimization of TensorFlow Lite execution engine for ARM-based microcontrollers

이찬규 (Chan-Kyu Lee)

DC Field	Value	Language
dc.contributor.advisor	김영진	-
dc.contributor.author	이찬규	-
dc.date.issued	2024-08	-
dc.identifier.other	34127	-
dc.identifier.uri	https://aurora.ajou.ac.kr/handle/2018.oak/39289	-
dc.description	Thesis (Master's)--Department of Electronic Engineering, 2024. 8	-
dc.description.abstract	As AI research advances, various optimization techniques are being studied to enable AI models on microcontrollers, whose resources are very limited. Several frameworks exist for running AI models efficiently on microcontrollers; a representative open-source one is TensorFlow Lite for Microcontrollers (TFLM), which provides a library for executing models on microcontrollers. Beyond such frameworks, there is offline optimization, such as model lightweighting and compression, which is performed uniformly on a server rather than on the device that runs the model, and online optimization, which is applied while the model is running. A representative online optimization is the CMSIS-NN library, which provides inference functions optimized for ARM processors; the TFLM execution engine integrates CMSIS-NN to run optimized kernel functions on ARM cores. This thesis proposes the D2I technique to reduce the overhead incurred by Im2col in the convolution function, one of the inference functions provided by CMSIS-NN. Applying this technique omits much of the data copying performed by Im2col, which reduces the bottleneck that occurs before computation begins. However, D2I increases the computational overhead, so a multi-pass scheme combining D2I and Im2col was applied, taking into account the data-copy and computational overhead of each layer; on an MNIST model with two convolution layers, execution time decreased by about 10.1%.	-
dc.description.tableofcontents	1. Introduction; 2. Related Work; 2.1. Deep Learning Frameworks for Microcontrollers; 2.2. Research on Model Lightweighting and Acceleration; 2.3. CMSIS-NN; 2.4. CSR; 3. Motivation; 4. TFLM Execution Engine Analysis and CMSIS-NN Tests; 4.1. TFLM Execution Engine Analysis; 4.2. Interpreter Analysis; 4.3. CMSIS-NN Tests; 5. Proposed Execution Engine Optimization Method; 5.1. Im2col Overhead and the Proposed Technique; 5.2. D2I Table; 5.3. D2I Weights; 5.4. Comparison of Im2col and D2I Computation Structures; 6. Experimental Results; 6.1. Observed Changes in Operations When D2I Is Applied; 6.2. D2I Gain and Accuracy by Number of Filters; 6.3. D2I Gain by Input Size; 6.4. Setting the Range of D2I Application; 7. Conclusion and Future Work; References; Abstract	-
dc.language.iso	kor	-
dc.publisher	The Graduate School, Ajou University	-
dc.rights	Ajou University theses are protected by copyright.	-
dc.title	ARM 기반 마이크로콘트롤러를 위한 TensorFlow Lite 수행 엔진 최적화	-
dc.title.alternative	Optimization of TensorFlow Lite execution engine for ARM-based microcontrollers	-
dc.type	Thesis	-
dc.contributor.affiliation	The Graduate School, Ajou University	-
dc.contributor.alternativeName	CHAN-KYU LEE	-
dc.contributor.department	Graduate School, Department of Electronic Engineering	-
dc.date.awarded	2024-08	-
dc.description.degree	Master	-
dc.identifier.url	https://dcoll.ajou.ac.kr/dcollection/common/orgView/000000034127	-
dc.subject.keyword	AI model	-
dc.subject.keyword	CMSIS-NN	-
dc.subject.keyword	Convolution	-
dc.subject.keyword	TensorFlow Lite	-
dc.description.alternativeAbstract	As AI research continues to advance, various optimization techniques for using AI models on microcontrollers with very limited resources are being studied. There are various frameworks for efficiently running AI models on microcontrollers, and a representative open-source one is TensorFlow Lite for Microcontrollers (TFLM). TFLM provides a micro library for executing models on microcontrollers. In addition to these frameworks, there is offline optimization, such as model lightweighting and compression, which is performed uniformly on a server rather than on the device executing the model, and online optimization, which is applied while the model is running. A representative example of online optimization is the CMSIS-NN library, which provides inference functions optimized for ARM processors. TFLM's execution engine integrates CMSIS-NN to run optimized kernel functions on ARM cores. This thesis proposes a D2I technique to reduce the overhead that occurs while performing Im2col in the convolution function among the inference functions provided by CMSIS-NN. By applying this optimization technique, the data copy process performed by Im2col can be largely omitted, thereby reducing the bottleneck that occurs before computation begins. However, D2I's computational overhead is larger, so a multi-pass of D2I and Im2col is applied, considering the data-copy and computational overhead of each layer; as a result, execution time decreased by about 10.1% on an MNIST model with two convolutions.	-
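
For readers unfamiliar with Im2col, the following minimal sketch illustrates the kind of data-copy overhead the abstract refers to. It is plain Python on a single-channel image with stride 1 and no padding. It is neither CMSIS-NN code nor the thesis's D2I technique, whose details are not given in this record. The function names `im2col`, `conv_im2col`, and `conv_direct` are illustrative assumptions, and `conv_direct` is only a generic in-place indexing baseline shown for contrast.

```python
def im2col(image, k):
    """Copy every k x k patch of a 2-D image (list of rows) into its own row."""
    h, w = len(image), len(image[0])
    out_h, out_w = h - k + 1, w - k + 1
    cols = []
    for y in range(out_h):
        for x in range(out_w):
            # Each output pixel gets a freshly copied patch of k*k elements.
            cols.append([image[y + dy][x + dx]
                         for dy in range(k) for dx in range(k)])
    return cols


def conv_im2col(image, kernel):
    """Convolution (no kernel flip, as in NN layers) over the im2col buffer."""
    k = len(kernel)
    flat_kernel = [v for row in kernel for v in row]
    out_w = len(image[0]) - k + 1
    flat = [sum(a * b for a, b in zip(patch, flat_kernel))
            for patch in im2col(image, k)]
    # Reshape the flat result back into rows of the output feature map.
    return [flat[i:i + out_w] for i in range(0, len(flat), out_w)]


def conv_direct(image, kernel):
    """Same result, reading the input in place via computed indices (no copies)."""
    k = len(kernel)
    out_h, out_w = len(image) - k + 1, len(image[0]) - k + 1
    return [[sum(image[y + dy][x + dx] * kernel[dy][dx]
                 for dy in range(k) for dx in range(k))
             for x in range(out_w)]
            for y in range(out_h)]


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
kernel = [[1, 0],
          [0, -1]]

# Number of elements im2col copies, versus 16 elements in the input itself.
copied = sum(len(patch) for patch in im2col(image, 2))
print(copied)
print(conv_im2col(image, kernel) == conv_direct(image, kernel))
```

In this toy case the Im2col buffer holds more elements than the 4 x 4 input, because overlapping patches are copied repeatedly. The in-place variant avoids those copies but performs more index arithmetic per multiply, which loosely mirrors the copy-versus-compute trade-off the abstract describes.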
