Image Augmentation Approaches for Building Dimension Estimation in Street View Images Using Object Detection and Instance Segmentation Based on Deep Learning
Building dimension data support numerous applications, including building performance simulation and urban heat island investigations. Deep learning-based object detection and instance segmentation methods are often applied to Street View Images (SVIs) to estimate building dimensions. However, these methods typically depend on large and diverse datasets. Image augmentation can artificially increase dataset diversity, yet its role in building dimension estimation from SVIs remains under-studied. This research presents a methodology that applies eight augmentation approaches (brightness, contrast, perspective, rotation, scale, shearing, and translation augmentation, plus a combined “sum of all” approach) to train models for two tasks: object detection with Faster Region-Based Convolutional Neural Networks (Faster R-CNNs) and instance segmentation with You Only Look Once (YOLO)v10. Comparing performance with and without augmentation revealed that contrast augmentation consistently provided the greatest improvement in both bounding-box detection and instance segmentation, and shearing augmentation ranked as the second-best approach. Using all augmentations at once rarely outperformed the single most effective method and sometimes degraded accuracy. Notably, the validation and test findings were closely aligned. These results, considered alongside the method’s potential applications and current limitations, underscore the importance of carefully selecting augmentations for reliable building dimension estimation.
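To make the augmentation families named above more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a grayscale image stored as nested lists of 0 to 255 values and an axis-aligned bounding box given as (x_min, y_min, x_max, y_max). All function names (`adjust_brightness`, `adjust_contrast`, `affine_matrix`, `transform_box`) are hypothetical and introduced only for illustration. Perspective augmentation is omitted because it is not an affine transform, and rotation here is taken about the image origin for simplicity.

```python
import math


def _clip(value):
    """Round to the nearest integer and clip to the 8-bit range [0, 255]."""
    return min(255, max(0, round(value)))


def adjust_brightness(pixels, delta):
    """Brightness augmentation: add a constant offset to every pixel."""
    return [[_clip(p + delta) for p in row] for row in pixels]


def adjust_contrast(pixels, factor):
    """Contrast augmentation: scale each pixel's distance from the image mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [[_clip(mean + factor * (p - mean)) for p in row] for row in pixels]


def affine_matrix(kind, value):
    """Return a 2x3 affine matrix for one geometric augmentation (hypothetical helper)."""
    if kind == "rotation":  # value: angle in degrees, about the origin
        a = math.radians(value)
        return [[math.cos(a), -math.sin(a), 0.0],
                [math.sin(a), math.cos(a), 0.0]]
    if kind == "scale":  # value: uniform scale factor
        return [[value, 0, 0], [0, value, 0]]
    if kind == "shear":  # value: horizontal shear factor
        return [[1, value, 0], [0, 1, 0]]
    if kind == "translation":  # value: (dx, dy) shift in pixels
        dx, dy = value
        return [[1, 0, dx], [0, 1, dy]]
    raise ValueError(f"unknown augmentation kind: {kind}")


def transform_box(box, m):
    """Map a box's four corners through m and return the enclosing axis-aligned box."""
    x0, y0, x1, y1 = box
    corners = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
    xs = [m[0][0] * x + m[0][1] * y + m[0][2] for x, y in corners]
    ys = [m[1][0] * x + m[1][1] * y + m[1][2] for x, y in corners]
    return (min(xs), min(ys), max(xs), max(ys))


if __name__ == "__main__":
    image = [[0, 100], [200, 250]]
    print(adjust_brightness(image, 20))
    print(adjust_contrast([[50, 150]], 2.0))
    print(transform_box((10, 20, 30, 40), affine_matrix("shear", 0.5)))
```

The point of `transform_box` is that geometric augmentations must be applied to the annotations as well as the pixels: a sheared or rotated building no longer fits its original box, so the label is recomputed from the transformed corners. Photometric augmentations such as brightness and contrast leave the annotations unchanged.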