Go remains a popular strategy game, but its vast search space and task complexity make training a stable AI agent challenging. In particular, conventional Go AI training relies on a fixed, hand-tuned optimal learning rate and schedule, and finding such settings demands substantial TPU and GPU resources. To facilitate Go-AI learning, this research explores adaptive adjustment and optimization techniques for dynamically configured reinforcement learning networks. First, we introduce a dynamic batch-size technique that adjusts the volume of training data at each training phase and couples it with a dynamic network-structure search over the number of network layers and residual blocks. Second, we propose a dynamic learning-rate strategy that automatically adjusts the learning rate according to the training batch size at each phase. Our approach outperforms the baseline in both training stability and convergence speed. Over 100 evaluation games, the resulting Go-AI model achieved a 100% win rate against opponents below the 7th rank and a 98% win rate against opponents at the 9th rank and above.
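To make the coupling between batch size and learning rate concrete, the following Python sketch illustrates one way such a phase-wise schedule could be organized. The phase boundaries, batch sizes, base values, and the linear-scaling rule are illustrative assumptions for exposition, not the paper's reported settings.

```python
# Minimal sketch of a phase-wise dynamic batch size with a learning rate
# tied to the current batch size (a linear-scaling assumption). All phase
# boundaries and base values below are hypothetical, not the paper's settings.

BASE_BATCH = 256   # hypothetical reference batch size
BASE_LR = 0.01     # hypothetical learning rate at the reference batch size

# Hypothetical training phases: (start_step, batch_size)
PHASES = [(0, 256), (100_000, 512), (300_000, 1024)]


def batch_size_for(step: int) -> int:
    """Return the batch size of the training phase containing `step`."""
    size = PHASES[0][1]
    for start, bs in PHASES:
        if step >= start:
            size = bs
    return size


def lr_for(step: int) -> float:
    """Scale the learning rate linearly with the current batch size."""
    return BASE_LR * batch_size_for(step) / BASE_BATCH


if __name__ == "__main__":
    for step in (0, 150_000, 350_000):
        print(f"step={step}: batch={batch_size_for(step)}, lr={lr_for(step):.4f}")
```

Under these assumptions, when the schedule doubles the batch size at a phase boundary, the learning rate doubles with it, which is one common heuristic for keeping the effective gradient noise scale roughly constant across phases.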
This work was supported in part by the Brain Korea 21 (BK21) FOUR program of the National Research Foundation of Korea, funded by the Ministry of Education (NRF5199991514504). (Corresponding author: Byeong-hee Roh, e-mail: bhroh@ajou.ac.kr.) C. Zhang, J. Lim, and B. Roh are with the Department of AI Convergence Network, Ajou University, Suwon 16499, Korea (e-mail: {cjz, wjdguszoqt, bhroh}@ajou.ac.kr). G. Shan is with the Department of Software and Computer Engineering, Ajou University, Suwon 16499, Korea (e-mail: shanyang166@ajou.ac.kr). Manuscript received xxx; revised xxx.