Is This Loss Informative? Faster Text-to-Image Customization by Tracking Objective Dynamics
- URL: http://arxiv.org/abs/2302.04841v3
- Date: Wed, 1 Nov 2023 17:57:50 GMT
- Title: Is This Loss Informative? Faster Text-to-Image Customization by Tracking Objective Dynamics
- Authors: Anton Voronov, Mikhail Khoroshikh, Artem Babenko, Max Ryabinin
- Abstract summary: We study the training dynamics of popular text-to-image personalization methods, aiming to speed them up.
We propose a simple drop-in early stopping criterion that only requires computing the regular training objective on a fixed set of inputs.
Our experiments on Stable Diffusion for 48 different concepts and three personalization methods demonstrate the competitive performance of our approach.
- Score: 31.15864240403093
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-to-image generation models represent the next step of evolution in image
synthesis, offering a natural way to achieve flexible yet fine-grained control
over the result. One emerging area of research is the fast adaptation of large
text-to-image models to smaller datasets or new visual concepts. However, many
efficient methods of adaptation have a long training time, which limits their
practical applications, slows down experiments, and consumes excessive GPU
resources. In this work, we study the training dynamics of popular
text-to-image personalization methods (such as Textual Inversion or
DreamBooth), aiming to speed them up. We observe that most concepts are learned
at early stages and do not improve in quality later, but standard training
convergence metrics fail to indicate that. Instead, we propose a simple drop-in
early stopping criterion that only requires computing the regular training
objective on a fixed set of inputs for all training iterations. Our experiments
on Stable Diffusion for 48 different concepts and three personalization methods
demonstrate the competitive performance of our approach, which makes adaptation
up to 8 times faster with no significant drops in quality.
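Concretely, the criterion amounts to tracking one scalar per training step: the regular diffusion objective evaluated on a batch whose images, noise, and timesteps are frozen at the start of adaptation, so the loss curve is deterministic and its flattening signals convergence. The sketch below shows one way such a drop-in stopper could look in Python; the class and parameter names are hypothetical, and the plateau test, window size, and threshold are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

class FixedInputEarlyStopper:
    """Sketch of a drop-in early stopping criterion: feed it the training
    objective computed on a *fixed* batch (same images, noise, and
    timesteps every step) and stop once that deterministic loss curve
    flattens. Window size and threshold are illustrative assumptions."""

    def __init__(self, window: int = 50, rel_var_threshold: float = 0.15):
        self.window = window
        self.rel_var_threshold = rel_var_threshold
        self.history: list[float] = []

    def should_stop(self, fixed_batch_loss: float) -> bool:
        self.history.append(fixed_batch_loss)
        if len(self.history) < 2 * self.window:
            return False  # too little signal to judge a plateau yet
        losses = np.asarray(self.history)
        recent_var = losses[-self.window:].var()
        total_var = losses.var()
        # Plateau: recent fluctuations are small relative to the
        # variance of the whole loss trajectory so far.
        return bool(recent_var / (total_var + 1e-12) < self.rel_var_threshold)
```

In a Textual Inversion or DreamBooth loop, one would evaluate the usual denoising loss on the fixed batch under `torch.no_grad()` after each optimizer step and break out of training once `should_stop` returns True.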
Related papers
- Enhancing pretraining efficiency for medical image segmentation via transferability metrics [0.0]
In medical image segmentation tasks, the scarcity of labeled training data poses a significant challenge.
We introduce a novel transferability metric, based on contrastive learning, that measures how robustly a pretrained model is able to represent the target data.
arXiv Detail & Related papers (2024-10-24T12:11:52Z)
- Efficient Training with Denoised Neural Weights [65.14892033932895]
This work takes a novel step towards building a weight generator to synthesize the neural weights for initialization.
We use the image-to-image translation task with generative adversarial networks (GANs) as an example due to the ease of collecting model weights.
By initializing the image translation model with the denoised weights predicted by our diffusion model, the training requires only 43.3 seconds.
arXiv Detail & Related papers (2024-07-16T17:59:42Z)
- EfficientTrain++: Generalized Curriculum Learning for Efficient Visual Backbone Training [79.96741042766524]
We reformulate the training curriculum as a soft-selection function.
We show that exposing the contents of natural images can be readily achieved by adjusting the intensity of data augmentation.
The resulting method, EfficientTrain++, is simple, general, yet surprisingly effective.
arXiv Detail & Related papers (2024-05-14T17:00:43Z)
- E$^{2}$GAN: Efficient Training of Efficient GANs for Image-to-Image Translation [69.72194342962615]
We introduce and address a novel research direction: can the process of distilling GANs from diffusion models be made significantly more efficient?
First, we construct a base GAN model with generalized features, adaptable to different concepts through fine-tuning, eliminating the need for training from scratch.
Second, we identify crucial layers within the base GAN model and employ Low-Rank Adaptation (LoRA) with a simple yet effective rank search process, rather than fine-tuning the entire base model (a generic LoRA sketch appears after this list).
Third, we investigate the minimal amount of data necessary for fine-tuning, further reducing the overall training time.
arXiv Detail & Related papers (2024-01-11T18:59:14Z)
- Class Incremental Learning with Pre-trained Vision-Language Models [59.15538370859431]
We propose an approach to exploiting pre-trained vision-language models (e.g., CLIP) that enables further adaptation.
Experiments on several conventional benchmarks consistently show a significant margin of improvement over the current state-of-the-art.
arXiv Detail & Related papers (2023-10-31T10:45:03Z)
- Efficient-3DiM: Learning a Generalizable Single-image Novel-view Synthesizer in One Day [63.96075838322437]
We propose a framework to learn a single-image novel-view synthesizer.
Our framework is able to reduce the total training time from 10 days to less than 1 day.
arXiv Detail & Related papers (2023-10-04T17:57:07Z)
- EfficientTrain: Exploring Generalized Curriculum Learning for Training Visual Backbones [80.662250618795]
This paper presents a new curriculum learning approach for the efficient training of visual backbones (e.g., vision Transformers).
As an off-the-shelf method, it reduces the wall-time training cost of a wide variety of popular models by >1.5x on ImageNet-1K/22K without sacrificing accuracy.
arXiv Detail & Related papers (2022-11-17T17:38:55Z)
- Improved Techniques for Training Single-Image GANs [44.251222212306764]
Generative models can be learned from a single image, as opposed to from a large dataset.
We propose some best practices to train a model capable of generating realistic images from only a single sample.
Our model is up to six times faster to train, has fewer parameters, and can better capture the global structure of images.
arXiv Detail & Related papers (2020-03-25T17:33:25Z)
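For reference, the LoRA technique mentioned in the E$^{2}$GAN entry above replaces full fine-tuning of a layer with a trainable low-rank update. The generic PyTorch wrapper below is a minimal sketch of the idea, not that paper's implementation; the rank and scaling values are illustrative, and in E$^{2}$GAN's setting the rank would be chosen per layer by their search procedure.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal, generic LoRA wrapper: freeze a pretrained linear layer W
    and learn a low-rank residual (alpha / r) * B @ A on top of it.
    Rank r and alpha here are illustrative, not tuned values."""

    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # only the low-rank factors are trained
        # A starts small and B at zero, so training begins at exactly W.
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * ((x @ self.lora_a.T) @ self.lora_b.T)
```

Wrapping only selected layers this way keeps the trainable parameter count at a small fraction of the full model, which is what makes per-concept fine-tuning cheap.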
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.