This paper systematically reviews the application and limitations of computer vision (CV) technologies in the maintenance of urban infrastructure. CV, a subfield of AI, has been increasingly utilized for tasks such as structural defect detection in bridges, crack and flood detection in tunnels, real-time landslide monitoring on soft ground, and road surface condition assessment. The integration of deep learning algorithms and high-resolution imagery in these applications has significantly improved the efficiency and accuracy of infrastructure maintenance, contributing to enhanced safety and reduced operational costs. However, current research faces several challenges, including the scarcity and variability of high-quality data, the high computational demands of processing complex deep learning models, and legal, ethical, and operational constraints. To overcome these limitations and advance the application of CV technologies, future research should focus on standardizing methodologies, systematizing data collection and operational conditions, continuously monitoring current trends and limitations, and proposing robust algorithms that can handle complex urban environments. By addressing these challenges, CV technologies can play a critical role in the development of smart, resilient, and sustainable urban infrastructure systems.