Counting Guidance for High Fidelity Text-to-Image Synthesis

Kang, Wonjun; Galim, Kevin; Il Koo, Hyung; Cho, Nam Ik

DC Field	Value	Language
dc.contributor.author	Kang, Wonjun	-
dc.contributor.author	Galim, Kevin	-
dc.contributor.author	Il Koo, Hyung	-
dc.contributor.author	Cho, Nam Ik	-
dc.date.issued	2025-01-01	-
dc.identifier.uri	https://aurora.ajou.ac.kr/handle/2018.oak/38564	-
dc.identifier.uri	https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=105003642863&origin=inward	-
dc.description.abstract	Recently, there have been significant improvements in the quality and performance of text-to-image generation, largely due to the impressive results attained by diffusion models. However, text-to-image diffusion models sometimes struggle to create high-fidelity content for the given input prompt. One specific issue is their difficulty in generating the precise number of objects specified in the text prompt. For example, when provided with the prompt 'five apples and ten lemons on a table,' images generated by diffusion models often contain an incorrect number of objects. In this paper, we present a method to improve diffusion models so that they accurately produce the correct object count based on the input prompt. We adopt a counting network that performs reference-less class-agnostic counting for any given image. We calculate the gradients of the counting network and refine the predicted noise for each step. To address the presence of multiple types of objects in the prompt, we utilize novel attention map guidance to obtain high-quality masks for each object. Finally, we guide the denoising process using the calculated gradients for each object. Through extensive experiments and evaluation, we demonstrate that the proposed method significantly enhances the fidelity of diffusion models with respect to object count.	-
dc.description.sponsorship	This work was supported in part by Institute of Information & communications Technology Planning & Evaluation (IITP) under the Artificial Intelligence Convergence Innovation Human Resources Development (IITP-2024-RS-2023-00255968) grant funded by the Korea government (MSIT) ITRC and by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2024-2020-0-01461) supervised by the IITP (Institute for Information & communications Technology Planning & Evaluation).	-
dc.language.iso	eng	-
dc.publisher	Institute of Electrical and Electronics Engineers Inc.	-
dc.subject.mesh	Counting networks	-
dc.subject.mesh	Diffusion model	-
dc.subject.mesh	Generative model	-
dc.subject.mesh	High quality	-
dc.subject.mesh	High-fidelity	-
dc.subject.mesh	Image diffusion	-
dc.subject.mesh	Image generations	-
dc.subject.mesh	Images synthesis	-
dc.subject.mesh	Performance	-
dc.subject.mesh	Text-to-image generation	-
dc.title	Counting Guidance for High Fidelity Text-to-Image Synthesis	-
dc.type	Conference	-
dc.citation.conferenceDate	2025.02.28.~2025.03.04.	-
dc.citation.conferenceName	2025 IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2025	-
dc.citation.edition	Proceedings - 2025 IEEE Winter Conference on Applications of Computer Vision, WACV 2025	-
dc.citation.endPage	908	-
dc.citation.startPage	899	-
dc.citation.title	Proceedings - 2025 IEEE Winter Conference on Applications of Computer Vision, WACV 2025	-
dc.identifier.bibliographicCitation	Proceedings - 2025 IEEE Winter Conference on Applications of Computer Vision, WACV 2025, pp.899-908	-
dc.identifier.doi	10.1109/wacv61041.2025.00097	-
dc.identifier.scopusid	2-s2.0-105003642863	-
dc.identifier.url	http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=10943266	-
dc.subject.keyword	diffusion models	-
dc.subject.keyword	generative models	-
dc.subject.keyword	text-to-image generation	-
dc.type.other	Conference Paper	-
dc.description.isoa	false	-
dc.subject.subarea	Artificial Intelligence	-
dc.subject.subarea	Computer Science Applications	-
dc.subject.subarea	Computer Vision and Pattern Recognition	-
dc.subject.subarea	Human-Computer Interaction	-
dc.subject.subarea	Modeling and Simulation	-
dc.subject.subarea	Radiology, Nuclear Medicine and Imaging	-
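
Illustrative note on the abstract: the step "calculate the gradients of the counting network and refine the predicted noise for each step" can be sketched in the style of classifier guidance. This is a minimal toy sketch under stated assumptions, not the authors' implementation. The counter (`toy_count`, a sum of per-pixel sigmoids), the squared-error count loss, the guidance weight `scale`, and the update rule in `refine_noise` are all hypothetical choices made for illustration. The attention-map masks for multiple object types are not covered here.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def toy_count(pixels):
    # Hypothetical stand-in for a counting network:
    # the predicted count is the sum of a per-pixel "density".
    return sum(sigmoid(v) for v in pixels)

def predict_x0(x_t, eps, alpha_bar):
    # Standard DDPM estimate of the clean image from the noisy sample x_t.
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(x - b * e) / a for x, e in zip(x_t, eps)]

def count_loss_grad(x_t, eps, target, alpha_bar):
    # Gradient of 0.5 * (toy_count(x0) - target)**2 with respect to x_t.
    # Assumption: eps is treated as constant with respect to x_t.
    x0 = predict_x0(x_t, eps, alpha_bar)
    err = toy_count(x0) - target
    a = math.sqrt(alpha_bar)
    return [err * sigmoid(v) * (1.0 - sigmoid(v)) / a for v in x0]

def refine_noise(x_t, eps, target, alpha_bar, scale):
    # Assumed guidance rule: shift the predicted noise along the loss gradient,
    # which moves the implied clean image toward a lower count loss.
    b = math.sqrt(1.0 - alpha_bar)
    grad = count_loss_grad(x_t, eps, target, alpha_bar)
    return [e + scale * b * g for e, g in zip(eps, grad)]

# Toy usage: a flattened 8x8 "image" whose toy count is far above the target.
x_t = [-2.0 + 4.0 * i / 63 for i in range(64)]
eps = [0.1] * 64
eps_hat = refine_noise(x_t, eps, target=5.0, alpha_bar=0.5, scale=0.01)
before = toy_count(predict_x0(x_t, eps, 0.5))
after = toy_count(predict_x0(x_t, eps_hat, 0.5))
print(before, after)
```

In this toy setting the refined noise yields a clean-image estimate whose count is closer to the target than the unrefined one. A real pipeline would apply such a refinement at each denoising step, using a trained counting network and automatic differentiation in place of the hand-written gradient.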

Related Researcher

KOO, HYUNG IL (구형일): Department of Electrical and Computer Engineering
