We consider a scenario in which the transportation management center (TMC) guides future autonomous vehicles (AVs) toward optimal routes, aiming to steer the network toward the system optimal (SO) principle. Achieving this, however, requires joint decision-making, and users may defect from the TMC's route guidance for personal gain. This paper models a future transportation network in a microscopic simulation and introduces a novel concept of mixed equilibrium: AVs follow the TMC's SO route guidance, while users can dynamically choose either to comply or to manually override this autonomy based on their own judgment. We first model a fully compliant scenario, in which a centralized Q-network, analogous to a TMC, is trained with reinforcement learning (RL) to minimize total system travel time (TSTT) and provide optimal routes to users. We then extend the problem to a multi-agent reinforcement learning (MARL) setting, in which users may comply with or deviate from the TMC's guidance based on their own decision-making. Through neural fictitious self-play (NFSP), we employ a modulating hyperparameter to investigate how varying degrees of non-compliance affect the overall system. Results indicate that our RL approach holds significant potential for the dynamic system optimal assignment problem. Remarkably, the TMC's route guidance preserves the essence of SO even when some level of non-compliance is present. However, we also show that dominant user-centric decision-making can cause system inefficiencies and create disparities among users. Our framework serves as an innovative tool for an AV-dominant future, offering a realistic perspective on network performance that aids in formulating effective traffic management strategies.
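To make the compliance mechanism concrete, the following is a minimal sketch of how an NFSP-style anticipatory parameter can modulate a user agent's choice between complying with the TMC's SO route and playing a self-interested best response. The names here (NFSPUser, eta, choose_route) are our own illustrative assumptions, not the paper's exact implementation.

```python
import random

import numpy as np


class NFSPUser:
    """Illustrative NFSP-style user agent (hypothetical names).

    With probability eta (the anticipatory parameter), the agent plays a
    best response from its own Q-value estimates, i.e., it overrides the
    TMC's guidance; otherwise it complies with the TMC's SO route.
    """

    def __init__(self, eta: float):
        self.eta = eta  # probability of self-interested (non-compliant) play

    def choose_route(self, tmc_route: int, q_values: np.ndarray) -> int:
        if random.random() < self.eta:
            # Non-compliance: pick the route with the highest
            # user-estimated return.
            return int(np.argmax(q_values))
        # Compliance: follow the TMC's system-optimal route guidance.
        return tmc_route


# Usage: eta = 0 recovers the fully compliant scenario; eta -> 1 yields
# dominant user-centric decision-making.
user = NFSPUser(eta=0.3)
route = user.choose_route(tmc_route=2, q_values=np.array([1.2, 0.7, 0.9]))
```

In this reading, the modulating hyperparameter sweeps between the fully compliant centralized regime and the fully user-centric regime, which is what lets the experiments probe intermediate degrees of non-compliance.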
This research was supported in part by the Korea Ministry of Land, Infrastructure and Transport (MOLIT) under the Innovative Talent Education Program for Smart City, and in part by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2022R1A2C2012835). This work was also supported by the Korea Institute of Police Technology (KIPoT) grant funded by the Korea government (KNPA) (No. 092021C28S02000).