How far can we go with ImageNet for Text-to-Image generation?
- URL: http://arxiv.org/abs/2502.21318v1
- Date: Fri, 28 Feb 2025 18:59:42 GMT
- Title: How far can we go with ImageNet for Text-to-Image generation?
- Authors: L. Degeorge, A. Ghosh, N. Dufour, D. Picard, V. Kalogeiton
- Abstract summary: Recent text-to-image (T2I) generation models have achieved remarkable results by training on billion-scale datasets. We challenge this established paradigm by demonstrating that strategic data augmentation of small, well-curated datasets can match or outperform models trained on massive web-scraped collections.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent text-to-image (T2I) generation models have achieved remarkable results by training on billion-scale datasets, following a "bigger is better" paradigm that prioritizes data quantity over quality. We challenge this established paradigm by demonstrating that strategic data augmentation of small, well-curated datasets can match or outperform models trained on massive web-scraped collections. Using only ImageNet enhanced with well-designed text and image augmentations, we achieve a +2 overall score over SD-XL on GenEval and +5 on DPGBench while using just 1/10th the parameters and 1/1000th the training images. Our results suggest that strategic data augmentation, rather than massive datasets, could offer a more sustainable path forward for T2I generation.
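The abstract's core idea, pairing ImageNet images with diversified text and image augmentations, can be sketched roughly as follows. The caption templates and crop sampler below are illustrative assumptions, not the paper's actual augmentation pipeline, which the abstract does not detail:

```python
import random

# Hypothetical caption templates: the paper's concrete text augmentations are
# not specified in the abstract; these only illustrate the general idea of
# turning a bare ImageNet class label into varied T2I training captions.
TEMPLATES = [
    "a photo of a {label}",
    "a close-up photograph of a {label}",
    "a {label} in a natural setting",
]

def augment_caption(label, rng):
    """Pick a template at random to diversify captions for one class label."""
    return rng.choice(TEMPLATES).format(label=label)

def random_crop_box(width, height, scale, rng):
    """Sample a square crop box (left, top, right, bottom) whose side covers
    `scale` of the shorter image side -- a stand-in for image augmentation."""
    side = int(min(width, height) * scale)
    left = rng.randrange(0, width - side + 1)
    top = rng.randrange(0, height - side + 1)
    return (left, top, left + side, top + side)

rng = random.Random(0)
caption = augment_caption("golden retriever", rng)
box = random_crop_box(512, 384, 0.8, rng)
print(caption)
print(box)
```

In a real training loop each (image, label) pair would be re-sampled this way every epoch, multiplying the effective diversity of a small dataset without collecting new images.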
Related papers
- ImageGen-CoT: Enhancing Text-to-Image In-context Learning with Chain-of-Thought Reasoning [89.19449553099747]
We study the problem of Text-to-Image In-Context Learning (T2I-ICL).
We propose a framework that incorporates a thought process called ImageGen-CoT prior to image generation.
We fine-tune MLLMs using this dataset to enhance their contextual reasoning capabilities.
arXiv Detail & Related papers (2025-03-25T03:18:46Z) - SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training [77.681908636429]
Text-to-image (T2I) models face several limitations, including large model sizes, slow inference, and low-quality generation on mobile devices.
This paper aims to develop an extremely small and fast T2I model that generates high-resolution and high-quality images on mobile platforms.
arXiv Detail & Related papers (2024-12-12T18:59:53Z) - Data Extrapolation for Text-to-image Generation on Small Datasets [3.7356387436951146]
We propose a new data augmentation method for text-to-image generation using linear extrapolation.
We construct training samples dozens of times larger than the original dataset.
Our model achieves FID scores of 7.91, 9.52, and 5.00 on the CUB, Oxford, and COCO datasets, respectively.
arXiv Detail & Related papers (2024-10-02T15:08:47Z) - Toffee: Efficient Million-Scale Dataset Construction for Subject-Driven Text-to-Image Generation [58.09421301921607]
We construct the first large-scale dataset for subject-driven image editing and generation.
Our dataset is 5 times the size of the previous largest dataset, yet our cost is tens of thousands of GPU hours lower.
arXiv Detail & Related papers (2024-06-13T16:40:39Z) - xT: Nested Tokenization for Larger Context in Large Images [79.37673340393475]
xT is a framework for vision transformers which aggregates global context with local details.
We are able to increase accuracy by up to 8.6% on challenging classification tasks.
arXiv Detail & Related papers (2024-03-04T10:29:58Z) - Large-scale Dataset Pruning with Dynamic Uncertainty [28.60845105174658]
The state of the art of many learning tasks, e.g., image classification, is advanced by collecting larger datasets and then training larger models on them.
In this paper, we investigate how to prune the large-scale datasets, and thus produce an informative subset for training sophisticated deep models with negligible performance drop.
arXiv Detail & Related papers (2023-06-08T13:14:35Z) - Scaling Up Dataset Distillation to ImageNet-1K with Constant Memory [66.035487142452]
We show that trajectory-matching-based methods (MTT) can scale to large-scale datasets such as ImageNet-1K.
We propose a procedure to exactly compute the unrolled gradient with constant memory complexity, which allows us to scale MTT to ImageNet-1K seamlessly with 6x reduction in memory footprint.
The resulting algorithm sets a new SOTA on ImageNet-1K: we can scale up to 50 IPCs (Images Per Class) on ImageNet-1K on a single GPU.
arXiv Detail & Related papers (2022-11-19T04:46:03Z) - BigDatasetGAN: Synthesizing ImageNet with Pixel-wise Annotations [89.42397034542189]
We synthesize a large labeled dataset via a generative adversarial network (GAN).
We take image samples from the class-conditional generative model BigGAN trained on ImageNet, and manually annotate 5 images per class, for all 1k classes.
We create a new ImageNet benchmark by labeling an additional set of 8k real images and evaluate segmentation performance in a variety of settings.
arXiv Detail & Related papers (2022-01-12T20:28:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all listed papers) and is not responsible for any consequences of its use.