Ajou University repository

When HPC Scheduling Meets Active Learning: Maximizing The Performance with Minimal Data
  • Oh, Sangyoon
  • Choi, Jiheon
  • Lee, Jaehyun
  • Choo, Minsol
  • Yoon, Taeyoung
  • Kwon, Oh Kyoung
Publication Year
2025-03-27
Journal
Proceedings of International Conference on High Performance Computing in Asia-Pacific Region, HPC Asia 2025
Publisher
Association for Computing Machinery, Inc
Citation
Proceedings of International Conference on High Performance Computing in Asia-Pacific Region, HPC Asia 2025, pp.99-109
Keyword
Active Learning; HPC Application Characteristic Prediction; Uncertainty Sampling
Mesh Keyword
Active Learning; High performance computing systems; High-performance computing application characteristic prediction; High-performance computing applications; Labeled data; Performance; Performance computing; Random forests; Resource management; Uncertainty samplings
All Science Classification Codes (ASJC)
Hardware and Architecture; Software; Theoretical Computer Science
Abstract
High-performance computing (HPC) systems face complex resource management challenges as they accommodate diverse workloads, from traditional scientific simulations to emerging AI workflows. Accurate workload characterization is essential for optimal resource allocation, yet existing approaches often require extensive labeled data or rely on rigid heuristics that struggle to adapt to new workload patterns. This paper presents RF-AUS (Random Forest Adaptive Uncertainty Sampling), a novel active learning framework designed for HPC workload classification. RF-AUS leverages ensemble-based uncertainty measurement, out-of-bag estimates, and feature importance weighting to achieve high classification performance with minimal labeled data. In addition, it employs a diversity-aware sampling strategy to balance exploration and exploitation, ensuring comprehensive coverage of the feature space while prioritizing the most informative instances. Using real-world workload data from NURION, a leading HPC system in South Korea, we demonstrate that RF-AUS achieves comparable classification accuracy with 47.53% fewer labeled samples than conventional methods. RF-AUS shows robust performance across diverse applications, making it particularly well suited for production HPC environments where efficient resource utilization is crucial.
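The abstract does not give the RF-AUS algorithm itself, so the following is only a minimal sketch of the general idea it names: ensemble-based uncertainty combined with diversity-aware query selection. It is not the authors' method. The function names (`vote_entropy`, `select_queries`), the `diversity_weight` parameter, and the additive scoring rule are all assumptions made for illustration, and the out-of-bag estimates and feature importance weighting mentioned in the abstract are left out. Uncertainty is approximated by the entropy of the class votes cast by the individual trees, and diversity by the distance to the nearest sample already chosen.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Shannon entropy (in bits) of the class labels voted by ensemble members."""
    counts = Counter(votes)
    total = len(votes)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def select_queries(tree_votes, features, k, diversity_weight=0.5):
    """Greedily pick up to k unlabeled samples to send for labeling.

    tree_votes[i] -- labels predicted for sample i by each tree (hypothetical input)
    features[i]   -- numeric feature vector of sample i
    The score is an assumed additive mix of uncertainty and distance to the
    nearest already-selected sample; it is not taken from the paper.
    """
    uncertainty = [vote_entropy(v) for v in tree_votes]
    selected = []
    candidates = set(range(len(tree_votes)))

    while candidates and len(selected) < k:

        def score(i):
            if not selected:
                return uncertainty[i]
            nearest = min(math.dist(features[i], features[j]) for j in selected)
            return uncertainty[i] + diversity_weight * nearest

        # Sorting makes tie-breaking deterministic (lowest index wins).
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)

    return selected


if __name__ == "__main__":
    # Toy data: four unlabeled jobs, four trees voting "cpu" or "io" for each.
    tree_votes = [
        ["cpu", "cpu", "cpu", "cpu"],
        ["cpu", "io", "cpu", "io"],
        ["cpu", "io", "io", "cpu"],
        ["io", "io", "io", "cpu"],
    ]
    features = [(0.0, 0.0), (1.0, 0.0), (1.1, 0.0), (5.0, 5.0)]
    print(select_queries(tree_votes, features, k=2))
```

With `diversity_weight=0` the selection reduces to plain uncertainty sampling, which tends to pick near-duplicate samples from the same uncertain region; a positive weight pushes later picks toward unexplored parts of the feature space.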
Language
eng
URI
https://aurora.ajou.ac.kr/handle/2018.oak/38559
https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=105002271701&origin=inward
DOI
https://doi.org/10.1145/3712031.3712334
Journal URL
http://dl.acm.org/citation.cfm?id=3712031
Type
Conference Paper
Funding
The authors wish to acknowledge the use of Google Translate, Grammarly, Writefull, and ChatGPT in the writing of this paper. Google Translate was used to assist with translation of certain sentences from its original language, and Grammarly, Writefull, and ChatGPT were used to enhance the grammar, clarity, and style of the manuscript. The paper remains an accurate representation of the authors' underlying work and novel intellectual contributions. This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (RS-2023-00283799). The corresponding author is Sangyoon Oh.
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Oh, Sangyoon (오상윤)
Department of Software and Computer Engineering