Software Defined Networking (SDN) adopts an architecture that is vertically separated into a control plane, a data plane, and an application plane. Although reinforcement learning methods have been studied for path selection in SDN environments, they still suffer from limited and unstable performance under variable network conditions. In this paper, we propose a Soft Actor-Critic (SAC)-based learning method that can be applied to dynamic networking environments, addressing the problem that DDPG-based methods do not converge quickly when network conditions change continuously.