In this paper, we propose a Gaussian Random Trajectory guided Hierarchical Reinforcement Learning (GRT-HL) method for autonomous furniture assembly. The furniture assembly problem is formulated as a comprehensive, human-like, long-horizon manipulation task that requires long-term planning and sophisticated control. Our proposed model, GRT-HL, draws inspiration from semi-supervised adversarial autoencoders and learns latent representations of end-effector position trajectories. The high-level policy generates an optimal trajectory for furniture assembly, taking into account the structural limitations of the robotic agent. Given the trajectory drawn from the high-level policy, the low-level policy makes a plan and controls the end-effector. We first evaluate the performance of GRT-HL against state-of-the-art reinforcement learning methods on furniture assembly tasks. We demonstrate that GRT-HL successfully solves this long-horizon problem with extremely sparse rewards by generating trajectories for planning.
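To make the hierarchical structure described above concrete, the following minimal sketch (not the authors' implementation) shows a high-level policy that decodes a latent sampled from a Gaussian prior into an end-effector position trajectory, and a low-level policy that tracks the resulting waypoints. All class names, parameters, and the linear decoder are illustrative assumptions standing in for the learned components.

```python
import numpy as np

class HighLevelPolicy:
    """Decodes a latent z ~ N(0, I) into a sequence of 3-D waypoints.

    The linear decoder below is a stand-in; in the paper the trajectory
    representation is learned in the style of an adversarial autoencoder.
    """
    def __init__(self, latent_dim=8, horizon=20, seed=0):
        self.rng = np.random.default_rng(seed)
        self.latent_dim = latent_dim
        self.horizon = horizon
        self.W = self.rng.normal(scale=0.1, size=(horizon * 3, latent_dim))

    def sample_trajectory(self, goal_pos):
        z = self.rng.normal(size=self.latent_dim)          # Gaussian latent
        offsets = (self.W @ z).reshape(self.horizon, 3)    # decoded shape
        ramp = np.linspace(0.0, 1.0, self.horizon)[:, None]
        # Bias waypoints toward the goal so the trajectory ends near it.
        return ramp * goal_pos + offsets

class LowLevelPolicy:
    """Simple proportional controller that tracks the given waypoints."""
    def __init__(self, gain=0.5):
        self.gain = gain

    def act(self, ee_pos, waypoint):
        return self.gain * (waypoint - ee_pos)             # velocity command

# Toy rollout: move the end-effector along the generated trajectory.
high, low = HighLevelPolicy(), LowLevelPolicy()
trajectory = high.sample_trajectory(goal_pos=np.array([0.4, 0.2, 0.3]))
ee_pos = np.zeros(3)
for waypoint in trajectory:
    ee_pos = ee_pos + low.act(ee_pos, waypoint)            # integrate velocity
print("final end-effector position:", ee_pos)
```

In this sketch the high-level policy only proposes where the end-effector should go, while the low-level policy handles how to get there, mirroring the division of labor stated in the abstract.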
This work was supported by Samsung Electronics (IO201208-07855-01) and by MSIT, Korea, under the ITRC program (IITP-2022-2017-0-01637) supervised by IITP. The authors thank Mr. MyungJae Shin for his contribution to the initiation of this research during his master's study under the guidance of Prof. Joongheon Kim. Soyi Jung, Jong-Kook Kim, and Joongheon Kim are the corresponding authors.