OS-capable embedded systems with very low power consumption are available at an extremely low price point, which makes them highly compelling in a datacenter context. We show that sharing long-running, compute-intensive datacenter workloads between a server machine and one or a few connected embedded boards of negligible cost and power consumption can yield significant performance and energy benefits. Our approach, named Heterogeneous EXecution Offloading (HEXO), selectively offloads Virtual Machines (VMs) from server-class machines to embedded boards. Our design tackles several challenges. We address the Instruction Set Architecture (ISA) difference between typical servers (x86) and embedded systems (ARM) through hypervisor and guest OS-level support for heterogeneous-ISA runtime VM migration. We cope with the limited resources of embedded systems in two ways: by using lightweight VMs (unikernels), and by using the server's free RAM as remote memory for the embedded boards through Netswap, a transparent, lightweight memory disaggregation mechanism for heterogeneous server-embedded clusters. VMs are offloaded based on an estimate of the slowdown expected when running on a given board. We build a prototype of HEXO and demonstrate significant increases in throughput (up to 67%) and energy efficiency (up to 56%) using benchmarks representative of compute-intensive, long-running workloads.
This work was supported in part by the US Office of Naval Research (ONR) under Grant N00014-16-1-2104, Grant N00014-16-1-2711, Grant N00014-19-1-2493, and Grant N00014-22-1-2672; in part by the US National Science Foundation (NSF) under Grant CNS 2127491; in part by the German Federal Ministry of Education and Research (BMBF) under Grant 16ME0688 (Project ScalNEXT); in part by the Institute of Information and communications Technology Planning and Evaluation (IITP) under the Artificial Intelligence Convergence Innovation Human Resources Development under Grant IITP-2024-RS-2023-00255968 grant funded in part by the Korea Government (MSIT); and in part by the UK EPSRC under Grant EP/V012134/1, Grant EP/V000225/1, and Grant EP/X015610/1.