Publications

Detailed Information

Scaling up GANs for Text-to-Image Synthesis

Cited 29 time in Web of Science Cited 0 time in Scopus
Authors

Kang, Minguk; Zhu, Jun-Yan; Zhang, Richard; Park, Jaesik; Shechtman, Eli; Paris, Sylvain; Park, Taesung

Issue Date
2023
Publisher
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Citation
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp.10124-10134
Abstract
The recent success of text-to-image synthesis has taken the world by storm and captured the general public's imagination. From a technical standpoint, it also marked a drastic change in the favored architecture to design generative image models. GANs used to be the de facto choice, with techniques like StyleGAN. With DALL center dot E 2, autoregressive and diffusion models became the new standard for large-scale generative models overnight. This rapid shift raises a fundamental question: can we scale up GANs to benefit from large datasets like LAION? We find that naively increasing the capacity of the StyleGAN architecture quickly becomes unstable. We introduce GigaGAN, a new GAN architecture that far exceeds this limit, demonstrating GANs as a viable option for text-to-image synthesis. GigaGAN offers three major advantages. First, it is orders of magnitude faster at inference time, taking only 0.13 seconds to synthesize a 512px image. Second, it can synthesize high-resolution images, for example, 16-megapixel images in 3.66 seconds. Finally, GigaGAN supports various latent space editing applications such as latent interpolation, style mixing, and vector arithmetic operations.
ISSN
1063-6919
URI
https://hdl.handle.net/10371/201221
DOI
https://doi.org/10.1109/CVPR52729.2023.00976
Files in This Item:
There are no files associated with this item.
Appears in Collections:

Related Researcher

  • College of Engineering
  • Dept. of Computer Science and Engineering
Research Area Computer Graphics, Computer Vision, Machine Learning, Robotics

Altmetrics

Item View & Download Count

  • mendeley

Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.

Share