The use of demonstrations for deep reinforcement learning (RL) agents usually accelerates training and guides the agents toward learning complicated policies. Most current deep RL approaches with demonstrations assume that a sufficient amount of high-quality demonstrations is available. However, in most real-world learning settings, the available demonstrations are limited in both quantity and quality. In this paper, we present an accelerated deep RL approach with dual replay buffer management and dynamic frame skipping on demonstrations. The dual replay buffer manager maintains a human replay buffer and an actor replay buffer with independent sampling policies. We also propose dynamic frame skipping on demonstrations, called DFS-ER (Dynamic Frame Skipping-Experience Replay), which learns the action repetition factor of the demonstrations. DFS-ER accelerates deep RL by improving the efficiency of demonstration utilization, thereby yielding faster exploration of the environment. We verified the training acceleration against the conventional approach in three dense reward environments and one sparse reward environment. In our evaluation using Atari game environments, the proposed approach achieved a 21.7%-39.1% reduction in training iterations in the sparse reward environment.
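As a minimal sketch of the dual replay buffer management described above, the snippet below keeps a demonstration (human) buffer and an actor buffer and samples each with its own policy. The class and parameter names (DualReplayBuffer, human_fraction) are illustrative assumptions, as are the fixed mixing ratio and uniform sampling; the paper's actual sampling policies may differ (e.g., prioritized sampling).

```python
import random
from collections import deque

class DualReplayBuffer:
    """Hypothetical sketch: two buffers with independent sampling policies."""

    def __init__(self, human_capacity=50_000, actor_capacity=500_000,
                 human_fraction=0.25):
        # Demonstration transitions are kept in their own buffer; actor
        # transitions are overwritten FIFO once the buffer is full.
        self.human = deque(maxlen=human_capacity)
        self.actor = deque(maxlen=actor_capacity)
        self.human_fraction = human_fraction  # share of each batch from demos

    def add_human(self, transition):
        self.human.append(transition)

    def add_actor(self, transition):
        self.actor.append(transition)

    def sample(self, batch_size):
        # Independent sampling policies: each buffer contributes its own
        # sub-batch (uniform here, as an assumption for illustration).
        n_human = min(int(batch_size * self.human_fraction), len(self.human))
        n_actor = min(batch_size - n_human, len(self.actor))
        return (random.sample(list(self.human), n_human) +
                random.sample(list(self.actor), n_actor))
```

In this sketch a transition could be stored as a (state, action, action_repetition, reward, next_state) tuple, so that the action repetition factor learned by DFS-ER travels with each demonstration step; this storage format is likewise an assumption, not the paper's specification.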
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2018R1D1A1B07043858, 2018R1D1A1B07049923), the supercomputing department at KISTI (Korea Institute of Science and Technology Information) (K-19-L02-C07-S01), and the Technology Innovation Program (P0006720) funded by MOTIE, Korea.