Scaling up GANs for Text-to-Image Synthesis
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Kang, Minguk | - |
dc.contributor.author | Zhu, Jun-Yan | - |
dc.contributor.author | Zhang, Richard | - |
dc.contributor.author | Park, Jaesik | - |
dc.contributor.author | Shechtman, Eli | - |
dc.contributor.author | Paris, Sylvain | - |
dc.contributor.author | Park, Taesung | - |
dc.date.accessioned | 2024-05-08T07:28:38Z | - |
dc.date.available | 2024-05-08T07:28:38Z | - |
dc.date.created | 2024-05-08 | - |
dc.date.issued | 2023 | - |
dc.identifier.citation | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp.10124-10134 | - |
dc.identifier.issn | 1063-6919 | - |
dc.identifier.uri | https://hdl.handle.net/10371/201221 | - |
dc.description.abstract | The recent success of text-to-image synthesis has taken the world by storm and captured the general public's imagination. From a technical standpoint, it also marked a drastic change in the favored architecture to design generative image models. GANs used to be the de facto choice, with techniques like StyleGAN. With DALL·E 2, autoregressive and diffusion models became the new standard for large-scale generative models overnight. This rapid shift raises a fundamental question: can we scale up GANs to benefit from large datasets like LAION? We find that naively increasing the capacity of the StyleGAN architecture quickly becomes unstable. We introduce GigaGAN, a new GAN architecture that far exceeds this limit, demonstrating GANs as a viable option for text-to-image synthesis. GigaGAN offers three major advantages. First, it is orders of magnitude faster at inference time, taking only 0.13 seconds to synthesize a 512px image. Second, it can synthesize high-resolution images, for example, 16-megapixel images in 3.66 seconds. Finally, GigaGAN supports various latent space editing applications such as latent interpolation, style mixing, and vector arithmetic operations. | - |
dc.language | English | - |
dc.publisher | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition | - |
dc.title | Scaling up GANs for Text-to-Image Synthesis | - |
dc.type | Article | - |
dc.identifier.doi | 10.1109/CVPR52729.2023.00976 | - |
dc.citation.journaltitle | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition | - |
dc.identifier.wosid | 001062522102042 | - |
dc.citation.endpage | 10134 | - |
dc.citation.startpage | 10124 | - |
dc.description.isOpenAccess | Y | - |
dc.contributor.affiliatedAuthor | Park, Jaesik | - |
dc.type.docType | Proceedings Paper | - |
dc.description.journalClass | 1 | - |
Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.