In this work we describe libMPNode, an OpenMP runtime designed for efficient multithreaded execution across systems composed of multiple non-cache-coherent domains. Rather than requiring extensive compiler-level transformations or building new programming model abstractions, libMPNode builds on recent work that allows developers to use a traditional shared-memory programming model to build applications that can migrate between incoherent domains. libMPNode handles migrating threads between domains, or nodes, and optimizes many OpenMP mechanisms to reduce cross-node communication. While applications may not scale as written, we describe early experiences with simple code refactoring techniques that improve scaling by changing only a handful of lines of code. We describe and evaluate the current implementation, report on experiences using the runtime, and describe future research directions for multi-domain OpenMP.
This work is supported in part by grants received by Virginia Tech, including ONR grant N00014-16-1-2711 and NAVSEA/NEEC grant N00174-16-C-0018. Dr. Kim's work at Virginia Tech (former affiliation) was supported in part by ONR under grant N00014-16-1-2711, and his work at Ajou University was supported in part by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2018R1C1B5085902).