Recently, semantic segmentation methods that leverage image generation models have garnered significant attention. In particular, approaches based on denoising diffusion probabilistic models (DDPMs) have outperformed GAN-based approaches. These approaches extract mid-level activations from the diffusion network and combine the class distributions predicted by several lightweight multi-layer perceptrons (MLPs) through majority voting. However, simple majority voting is suboptimal, because it gives every MLP an equal vote regardless of how confident its prediction is. In this paper, we propose a novel voting method for DDPM-based semantic segmentation. Our method computes a weighted sum of the MLPs' class distributions, where each weight is determined by the entropy of the corresponding MLP's class prediction. We conduct experiments on LSUN-Bedroom, FFHQ-256, LSUN-Cat, and LSUN-Horse. The results demonstrate that our method achieves higher mean Intersection over Union (mIoU) scores than previous work.
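To make the difference between the two voting rules concrete, the following is a minimal single-pixel sketch. The abstract states only that the weights are determined by prediction entropy. The mapping used here, a softmax over negative entropies with a hypothetical `temperature` parameter, is an assumption, and so are all function names and the lowest-index tie-break in the majority-vote baseline. The paper's exact weighting rule may differ.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def entropy(p: Sequence[float], eps: float = 1e-12) -> float:
    """Shannon entropy (natural log) of one MLP's class distribution for a pixel."""
    return -sum(q * math.log(q + eps) for q in p)


def majority_vote(dists: List[Sequence[float]]) -> int:
    """Baseline: each MLP votes for its argmax class.

    Tie-break toward the lowest class index is an assumption of this sketch.
    """
    votes = Counter(max(range(len(p)), key=p.__getitem__) for p in dists)
    best = max(votes.values())
    return min(c for c, v in votes.items() if v == best)


def entropy_weights(dists: List[Sequence[float]], temperature: float = 1.0) -> List[float]:
    """ASSUMED weighting: softmax over negative entropies.

    Low-entropy (confident) MLPs receive larger weights; weights sum to 1.
    """
    scores = [-entropy(p) / temperature for p in dists]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def entropy_weighted_vote(
    dists: List[Sequence[float]], temperature: float = 1.0
) -> Tuple[int, List[float]]:
    """Fuse the MLP distributions by an entropy-weighted sum, then take the argmax."""
    w = entropy_weights(dists, temperature)
    n_classes = len(dists[0])
    fused = [sum(wi * p[c] for wi, p in zip(w, dists)) for c in range(n_classes)]
    label = max(range(n_classes), key=fused.__getitem__)
    return label, fused


if __name__ == "__main__":
    # Three hypothetical MLPs, three classes, one pixel:
    # two uncertain MLPs lean toward class 0, one confident MLP picks class 1.
    dists = [
        [0.40, 0.35, 0.25],
        [0.40, 0.30, 0.30],
        [0.02, 0.96, 0.02],
    ]
    print("majority vote:", majority_vote(dists))                  # 0
    print("entropy-weighted:", entropy_weighted_vote(dists)[0])    # 1
```

In this toy pixel, majority voting follows the two uncertain MLPs, while the entropy-weighted sum lets the single confident MLP dominate. This is the failure mode of equal voting that the abstract describes.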
This work has been supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2023-2018-0-01424) supervised by the IITP (Institute for Information & communications Technology Promotion).