Scalable GANs with Transformers
- URL: http://arxiv.org/abs/2509.24935v1
- Date: Mon, 29 Sep 2025 15:36:15 GMT
- Title: Scalable GANs with Transformers
- Authors: Sangeek Hyun, MinKyu Lee, Jae-Pil Heo
- Abstract summary: Scalability has driven recent advances in generative modeling, yet its principles remain underexplored for adversarial learning. We investigate the scalability of Generative Adversarial Networks (GANs) through two design choices. We find issues such as underutilization of early layers in the generator and optimization instability as the network scales.
- Score: 41.13613492946196
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Scalability has driven recent advances in generative modeling, yet its principles remain underexplored for adversarial learning. We investigate the scalability of Generative Adversarial Networks (GANs) through two design choices that have proven effective in other types of generative models: training in a compact Variational Autoencoder latent space and adopting purely transformer-based generators and discriminators. Training in latent space enables efficient computation while preserving perceptual fidelity, and this efficiency pairs naturally with plain transformers, whose performance scales with computational budget. Building on these choices, we analyze failure modes that emerge when naively scaling GANs. Specifically, we find issues such as underutilization of early layers in the generator and optimization instability as the network scales. Accordingly, we provide simple and scale-friendly solutions: lightweight intermediate supervision and width-aware learning-rate adjustment. Our experiments show that GAT, a purely transformer-based, latent-space GAN, can be trained reliably across a wide range of capacities (S through XL). Moreover, GAT-XL/2 achieves state-of-the-art single-step, class-conditional generation performance (FID of 2.96) on ImageNet-256 in just 40 epochs, 6x fewer than strong baselines.
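The abstract names two remedies without spelling them out. As an illustration of the second, here is a minimal PyTorch sketch of width-aware learning-rate adjustment, assuming a muP-style rule that scales each weight matrix's learning rate inversely with its fan-in (the paper's exact rule may differ):

```python
import torch
from torch import nn

def width_aware_param_groups(model: nn.Module, base_lr: float, base_width: int = 256):
    """Build optimizer parameter groups whose learning rate shrinks as layer
    width grows. Hypothetical muP-style 1/fan_in scaling; the paper's exact
    width-aware rule is not given in this abstract and may differ."""
    groups = []
    for p in model.parameters():
        if p.ndim >= 2:  # weight matrices: scale lr by base_width / fan_in
            lr = base_lr * base_width / p.shape[1]
        else:            # biases and norm parameters: keep the base lr
            lr = base_lr
        groups.append({"params": [p], "lr": lr})
    return groups

# Usage: wider layers automatically take proportionally smaller steps.
model = nn.Sequential(nn.Linear(256, 1024), nn.GELU(), nn.Linear(1024, 256))
optimizer = torch.optim.AdamW(width_aware_param_groups(model, base_lr=2e-4))
```

The intuition is that wider layers accumulate larger aggregate gradient updates, so shrinking their step size in proportion to width keeps optimization stable as the network scales.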
Related papers
- Large Language Models Inference Engines based on Spiking Neural Networks [5.529385616266398]
We explore spiking neural networks (SNNs) to design transformer models. A key challenge is that training large-scale SNNs is inefficient and time-consuming. We propose NeurTransformer, a methodology for designing transformer-based SNNs for inference.
arXiv Detail & Related papers (2025-09-30T18:11:13Z)
- Accelerating Transformers in Online RL [47.99822253865053]
We address the training of transformer-based models in Reinforcement Learning (RL). We propose a method that uses the Accelerator policy as a transformer's trainer. We show that applying our algorithm not only enables stable training of transformers but also reduces training time in image-based environments by up to a factor of two.
arXiv Detail & Related papers (2025-09-30T11:57:14Z)
- Chain-of-Thought Enhanced Shallow Transformers for Wireless Symbol Detection [14.363929799618283]
We propose CHain Of thOught Symbol dEtection (CHOOSE), a CoT-enhanced shallow Transformer framework for wireless symbol detection. By introducing autoregressive latent reasoning steps within the hidden space, CHOOSE significantly improves the reasoning capacity of shallow models. Experimental results demonstrate that our approach outperforms conventional shallow Transformers and achieves performance comparable to that of deep Transformers.
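The summary names the mechanism, autoregressive latent reasoning in the hidden space, without detailing it. One plausible reading, sketched below with hypothetical names and shapes, is a shallow encoder that appends its own pooled hidden state as an extra token and re-runs itself for a few steps before classifying each received symbol:

```python
import torch
from torch import nn

class LatentReasoningDetector(nn.Module):
    """Hypothetical sketch: a shallow Transformer that appends its pooled
    hidden state as an extra token for a few autoregressive "reasoning"
    steps before emitting per-symbol logits. Not the paper's exact model."""
    def __init__(self, d_model=64, n_symbols=4, n_steps=3):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)  # shallow
        self.in_proj = nn.Linear(2, d_model)  # (real, imag) -> d_model
        self.head = nn.Linear(d_model, n_symbols)
        self.n_steps = n_steps

    def forward(self, rx):                       # rx: (batch, seq, 2)
        tokens = self.in_proj(rx)
        for _ in range(self.n_steps):            # latent reasoning loop
            h = self.encoder(tokens)
            thought = h.mean(dim=1, keepdim=True)         # pooled latent step
            tokens = torch.cat([tokens, thought], dim=1)  # append as a token
        return self.head(h[:, :rx.size(1)])      # logits per received symbol
```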
arXiv Detail & Related papers (2025-06-26T08:41:45Z)
- Shrinking the Giant: Quasi-Weightless Transformers for Low Energy Inference [0.30104001512119216]
Building models with fast and energy-efficient inference is imperative to enable a variety of transformer-based applications.
We build on an approach for learning LUT networks directly via an Extended Finite Difference method.
This allows for a computational and energy-efficient inference solution for transformer-based models.
arXiv Detail & Related papers (2024-11-04T05:38:56Z)
- Kolmogorov-Arnold Transformer [72.88137795439407]
We introduce the Kolmogorov-Arnold Transformer (KAT), a novel architecture that replaces MLP layers with Kolmogorov-Arnold Network (KAN) layers.
We identify three key challenges: (C1) base function choice, (C2) parameter and computation inefficiency, and (C3) weight initialization.
With these designs, KAT outperforms traditional MLP-based Transformers.
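For readers unfamiliar with KAN layers: the core move is to replace the Transformer's MLP block, a fixed activation between two linear maps, with learnable activation functions. A minimal sketch of that idea using a learnable rational activation (KAT's group-rational variant and its initialization are more involved; degrees and scales here are illustrative assumptions):

```python
import torch
from torch import nn

class RationalAct(nn.Module):
    """Learnable rational activation y = P(x) / Q(x), the kind of learnable
    function used in place of a fixed MLP activation. Degrees and init
    scales are illustrative assumptions, not KAT's exact choices."""
    def __init__(self, p_degree=3, q_degree=2):
        super().__init__()
        self.p = nn.Parameter(torch.randn(p_degree + 1) * 0.1)
        self.q = nn.Parameter(torch.randn(q_degree + 1) * 0.1)

    def forward(self, x):
        num = sum(c * x**i for i, c in enumerate(self.p))
        den = 1 + sum((c * x**(i + 1)).abs() for i, c in enumerate(self.q))
        return num / den  # den >= 1 keeps the ratio numerically safe

class KANBlock(nn.Module):
    """Drop-in replacement for a Transformer MLP block: linear -> learnable
    rational activation -> linear. A sketch of the idea, not KAT's code."""
    def __init__(self, dim, hidden):
        super().__init__()
        self.fc1, self.fc2 = nn.Linear(dim, hidden), nn.Linear(hidden, dim)
        self.act = RationalAct()

    def forward(self, x):
        return self.fc2(self.act(self.fc1(x)))
```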
arXiv Detail & Related papers (2024-09-16T17:54:51Z)
- Efficient generative adversarial networks using linear additive-attention Transformers [0.8287206589886879]
We present a novel GAN architecture based on a linear attention Transformer block named Ladaformer. LadaGAN consistently outperforms existing convolutional and Transformer GANs on benchmark datasets at different resolutions. LadaGAN shows competitive performance compared to state-of-the-art multi-step generative models.
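"Linear additive attention" here refers to attention whose cost is linear in sequence length: per-position scores are softmaxed into weights for a single global query vector, which then modulates every position. A sketch of the general technique, not LadaGAN's exact block:

```python
import torch
from torch import nn
import torch.nn.functional as F

class AdditiveAttention(nn.Module):
    """O(N) additive attention sketch: a softmax over scalar per-position
    scores builds one global query, which is broadcast against the values.
    Illustrative of the general idea, not Ladaformer's exact design."""
    def __init__(self, dim):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        self.score = nn.Linear(dim, 1)  # scalar score per position
        self.out = nn.Linear(dim, dim)

    def forward(self, x):                                # x: (batch, seq, dim)
        q, v = self.to_q(x), self.to_v(x)
        alpha = F.softmax(self.score(q), dim=1)          # (batch, seq, 1)
        global_q = (alpha * q).sum(dim=1, keepdim=True)  # one global query
        return self.out(global_q * v)                    # broadcast over seq
```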
arXiv Detail & Related papers (2024-01-17T21:08:41Z)
- The Nuts and Bolts of Adopting Transformer in GANs [124.30856952272913]
We investigate the properties of Transformer in the generative adversarial network (GAN) framework for high-fidelity image synthesis.
Our study leads to a new alternative design of Transformers in GANs: a convolutional neural network (CNN)-free generator termed STrans-G.
arXiv Detail & Related papers (2021-10-25T17:01:29Z)
- Improved Transformer for High-Resolution GANs [69.42469272015481]
We introduce two key ingredients to the Transformer to address this challenge.
We show in the experiments that the proposed HiT achieves state-of-the-art FID scores of 31.87 and 2.95 on unconditional ImageNet $128 \times 128$ and FFHQ $256 \times 256$, respectively.
arXiv Detail & Related papers (2021-06-14T17:39:49Z)
- Efficient pre-training objectives for Transformers [84.64393460397471]
We study several efficient pre-training objectives for Transformers-based models.
We prove that eliminating the MASK token and considering the whole output during the loss are essential choices to improve performance.
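The two choices highlighted here are concrete enough to sketch: corrupt inputs with random real tokens rather than a special [MASK] symbol, and compute the loss over every output position instead of only the corrupted ones. A hedged PyTorch sketch (the model interface, corruption rate, and vocabulary size are assumptions):

```python
import torch
import torch.nn.functional as F

def full_output_denoising_loss(model, input_ids, corrupt_prob=0.15, vocab_size=30522):
    """Sketch of both choices: no special [MASK] token (corrupt with random
    real tokens) and a loss over every output position. The model interface,
    corruption rate, and vocabulary size are illustrative assumptions."""
    corrupt = torch.rand_like(input_ids, dtype=torch.float) < corrupt_prob
    random_tokens = torch.randint_like(input_ids, vocab_size)
    corrupted = torch.where(corrupt, random_tokens, input_ids)

    logits = model(corrupted)  # assumed to return (batch, seq, vocab) logits
    # Whole-output loss: every position must reconstruct the original token.
    return F.cross_entropy(logits.reshape(-1, vocab_size), input_ids.reshape(-1))
```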
arXiv Detail & Related papers (2021-04-20T00:09:37Z)