Scaling up GANs for Text-to-Image Synthesis

Kang, Minguk; Zhu, Jun-Yan; Zhang, Richard; Park, Jaesik; Shechtman, Eli; Paris, Sylvain; Park, Taesung

doi:10.1109/CVPR52729.2023.00976

DC Field	Value	Language
dc.contributor.author	Kang, Minguk	-
dc.contributor.author	Zhu, Jun-Yan	-
dc.contributor.author	Zhang, Richard	-
dc.contributor.author	Park, Jaesik	-
dc.contributor.author	Shechtman, Eli	-
dc.contributor.author	Paris, Sylvain	-
dc.contributor.author	Park, Taesung	-
dc.date.accessioned	2024-05-08T07:28:38Z	-
dc.date.available	2024-05-08T07:28:38Z	-
dc.date.created	2024-05-08	-
dc.date.issued	2023	-
dc.identifier.citation	Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp.10124-10134	-
dc.identifier.issn	1063-6919	-
dc.identifier.uri	https://hdl.handle.net/10371/201221	-
dc.description.abstract	The recent success of text-to-image synthesis has taken the world by storm and captured the general public's imagination. From a technical standpoint, it also marked a drastic change in the favored architecture to design generative image models. GANs used to be the de facto choice, with techniques like StyleGAN. With DALL·E 2, autoregressive and diffusion models became the new standard for large-scale generative models overnight. This rapid shift raises a fundamental question: can we scale up GANs to benefit from large datasets like LAION? We find that naively increasing the capacity of the StyleGAN architecture quickly becomes unstable. We introduce GigaGAN, a new GAN architecture that far exceeds this limit, demonstrating GANs as a viable option for text-to-image synthesis. GigaGAN offers three major advantages. First, it is orders of magnitude faster at inference time, taking only 0.13 seconds to synthesize a 512px image. Second, it can synthesize high-resolution images, for example, 16-megapixel images in 3.66 seconds. Finally, GigaGAN supports various latent space editing applications such as latent interpolation, style mixing, and vector arithmetic operations.	-
dc.language	English	-
dc.publisher	Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition	-
dc.title	Scaling up GANs for Text-to-Image Synthesis	-
dc.type	Article	-
dc.identifier.doi	10.1109/CVPR52729.2023.00976	-
dc.citation.journaltitle	Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition	-
dc.identifier.wosid	001062522102042	-
dc.citation.endpage	10134	-
dc.citation.startpage	10124	-
dc.description.isOpenAccess	Y	-
dc.contributor.affiliatedAuthor	Park, Jaesik	-
dc.type.docType	Proceedings Paper	-
dc.description.journalClass	1	-