Zero-Shot Text-to-Image Generation
- URL: http://arxiv.org/abs/2102.12092v1
- Date: Wed, 24 Feb 2021 06:42:31 GMT
- Title: Zero-Shot Text-to-Image Generation
- Authors: Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss,
Alec Radford, Mark Chen, Ilya Sutskever
- Abstract summary: We describe a transformer that autoregressively models the text and image tokens as a single stream of data.
With sufficient data and scale, our approach is competitive with previous domain-specific models when evaluated in a zero-shot fashion.
- Score: 15.135825501365007
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text-to-image generation has traditionally focused on finding better modeling
assumptions for training on a fixed dataset. These assumptions might involve
complex architectures, auxiliary losses, or side information such as object
part labels or segmentation masks supplied during training. We describe a
simple approach for this task based on a transformer that autoregressively
models the text and image tokens as a single stream of data. With sufficient
data and scale, our approach is competitive with previous domain-specific
models when evaluated in a zero-shot fashion.
Related papers
- Reinforcing Pre-trained Models Using Counterfactual Images [54.26310919385808]
This paper proposes a novel framework to reinforce classification models using language-guided generated counterfactual images.
We identify model weaknesses by testing the model using the counterfactual image dataset.
We employ the counterfactual images as an augmented dataset to fine-tune and reinforce the classification model.
arXiv Detail & Related papers (2024-06-19T08:07:14Z) - Adapt Anything: Tailor Any Image Classifiers across Domains And
Categories Using Text-to-Image Diffusion Models [82.95591765009105]
We aim to study if a modern text-to-image diffusion model can tailor any task-adaptive image classifier across domains and categories.
We utilize only one off-the-shelf text-to-image model to synthesize images with category labels derived from the corresponding text prompts.
arXiv Detail & Related papers (2023-10-25T11:58:14Z) - Shatter and Gather: Learning Referring Image Segmentation with Text
Supervision [52.46081425504072]
We present a new model that discovers semantic entities in input image and then combines such entities relevant to text query to predict the mask of the referent.
Our method was evaluated on four public benchmarks for referring image segmentation, where it clearly outperformed the existing method for the same task and recent open-vocabulary segmentation models on all the benchmarks.
arXiv Detail & Related papers (2023-08-29T15:39:15Z) - Evaluating Data Attribution for Text-to-Image Models [62.844382063780365]
We evaluate attribution through "customization" methods, which tune an existing large-scale model toward a given exemplar object or style.
Our key insight is that this allows us to efficiently create synthetic images that are computationally influenced by the exemplar by construction.
By taking into account the inherent uncertainty of the problem, we can assign soft attribution scores over a set of training images.
arXiv Detail & Related papers (2023-06-15T17:59:51Z) - ClipCrop: Conditioned Cropping Driven by Vision-Language Model [90.95403416150724]
We take advantage of vision-language models as a foundation for creating robust and user-intentional cropping algorithms.
We develop a method to perform cropping with a text or image query that reflects the user's intention as guidance.
Our pipeline design allows the model to learn text-conditioned aesthetic cropping with a small dataset.
arXiv Detail & Related papers (2022-11-21T14:27:07Z) - Prefix Conditioning Unifies Language and Label Supervision [84.11127588805138]
We show that dataset biases negatively affect pre-training by reducing the generalizability of learned representations.
In experiments, we show that this simple technique improves the performance in zero-shot image recognition accuracy and robustness to the image-level distribution shift.
arXiv Detail & Related papers (2022-06-02T16:12:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.