Owing to the development of next-generation network and data processing technologies, massive Internet of Things (IoT) devices are becoming hyperconnected. As a result, Linux malware is being created to attack such hyperconnected networks by exploiting security threats in IoT devices. To determine the potential threats of such Linux malware and respond effectively, malware classification through an analysis of the executed code is required; however, a limitation exists in that each heterogeneous architecture must be analyzed separately. However, the binary codes of a heterogeneous architecture can be translated to a high-level intermediate representation (IR) of the same format using binary lifting and malicious behavior information can be identified because the functions and parameters of the assembly code are stored in the IR. Consequently, this study suggests a Linux malware classification method applicable to various architectures by converting Linux assembly codes into an IR using binary lifting and then learning the IR Sequence which reflects malicious behavior pattern using deep learning model for sequence learning.
Funding Statement: This work was supported by a National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (No. 2021R1A2C2011391) and this work was supported by the BK21 FOUR program of the National Research Foundation of Korea funded by the Ministry of Education (NRF5199991514504).