Scaling up GANs for Text-to-Image Synthesis

DC Field | Value | Language
dc.contributor.author | Kang, Minguk | -
dc.contributor.author | Zhu, Jun-Yan | -
dc.contributor.author | Zhang, Richard | -
dc.contributor.author | Park, Jaesik | -
dc.contributor.author | Shechtman, Eli | -
dc.contributor.author | Paris, Sylvain | -
dc.contributor.author | Park, Taesung | -
dc.date.accessioned | 2024-05-08T07:28:38Z | -
dc.date.available | 2024-05-08T07:28:38Z | -
dc.date.created | 2024-05-08 | -
dc.date.issued | 2023 | -
dc.identifier.citation | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp.10124-10134 | -
dc.identifier.issn | 1063-6919 | -
dc.identifier.uri | https://hdl.handle.net/10371/201221 | -
dc.description.abstract | The recent success of text-to-image synthesis has taken the world by storm and captured the general public's imagination. From a technical standpoint, it also marked a drastic change in the favored architecture to design generative image models. GANs used to be the de facto choice, with techniques like StyleGAN. With DALL·E 2, autoregressive and diffusion models became the new standard for large-scale generative models overnight. This rapid shift raises a fundamental question: can we scale up GANs to benefit from large datasets like LAION? We find that naively increasing the capacity of the StyleGAN architecture quickly becomes unstable. We introduce GigaGAN, a new GAN architecture that far exceeds this limit, demonstrating GANs as a viable option for text-to-image synthesis. GigaGAN offers three major advantages. First, it is orders of magnitude faster at inference time, taking only 0.13 seconds to synthesize a 512px image. Second, it can synthesize high-resolution images, for example, 16-megapixel images in 3.66 seconds. Finally, GigaGAN supports various latent space editing applications such as latent interpolation, style mixing, and vector arithmetic operations. | -
dc.language | English | -
dc.publisher | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition | -
dc.title | Scaling up GANs for Text-to-Image Synthesis | -
dc.type | Article | -
dc.identifier.doi | 10.1109/CVPR52729.2023.00976 | -
dc.citation.journaltitle | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition | -
dc.identifier.wosid | 001062522102042 | -
dc.citation.endpage | 10134 | -
dc.citation.startpage | 10124 | -
dc.description.isOpenAccess | Y | -
dc.contributor.affiliatedAuthor | Park, Jaesik | -
dc.type.docType | Proceedings Paper | -
dc.description.journalClass | 1 | -

Related Researcher

  • College of Engineering
  • Dept. of Computer Science and Engineering
Research Area Computer Graphics, Computer Vision, Machine Learning, Robotics

Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.
