In modular building construction, the entire process, from factory manufacturing to on-site assembly, runs sequentially. An unexpected delay in factory manufacturing therefore impedes the overall construction schedule, which makes appropriate progress monitoring essential. In this study, a vision-based progress monitoring method for a modular building factory was developed. Instead of actual images of modular unit manufacturing, videos created from 3D modeling were used to train a deep learning model. Videos recorded during modular manufacturing in a factory were then used to test the system. Although the deep learning model was trained with the virtual model, the test results demonstrated that all six processes were successfully detected. Out of 225 image frames on average, the number of unrecognized frames ranged from 28 to 53, resulting in an average recognition rate of 83.1%. The recognition accuracy of the developed progress monitoring system ranged from 62.5% to 100%, with an average of 84.4%.
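
As a minimal sketch of the arithmetic behind the reported recognition rate, the snippet below assumes the rate is the share of frames that were recognized, that is, one minus unrecognized frames over total frames. The function name `recognition_rate` is hypothetical, and the value of 38 unrecognized frames is an assumption back-calculated from the stated 83.1% average; the abstract gives only the range of 28 to 53 and the average of 225 frames.

```python
def recognition_rate(total_frames: int, unrecognized_frames: int) -> float:
    """Return the fraction of frames that were recognized.

    Assumed definition (not confirmed by the source):
    (total - unrecognized) / total.
    """
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    if not 0 <= unrecognized_frames <= total_frames:
        raise ValueError("unrecognized_frames must be between 0 and total_frames")
    return (total_frames - unrecognized_frames) / total_frames


# 225 is the average frame count stated in the abstract.
# 28 and 53 are the reported bounds on unrecognized frames.
# 38 is an assumed mean, back-calculated from the 83.1% average.
for unrecognized in (28, 38, 53):
    rate = recognition_rate(225, unrecognized)
    print(f"{unrecognized} unrecognized of 225 frames: {rate * 100:.1f}%")
```

Under this assumed definition, the reported bounds of 28 and 53 unrecognized frames correspond to rates of roughly 87.6% and 76.4%, which bracket the stated 83.1% average.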