Visual Prompting via Image Inpainting
- URL: http://arxiv.org/abs/2209.00647v1
- Date: Thu, 1 Sep 2022 17:59:33 GMT
- Title: Visual Prompting via Image Inpainting
- Authors: Amir Bar, Yossi Gandelsman, Trevor Darrell, Amir Globerson, Alexei A. Efros
- Abstract summary: Inspired by prompting in NLP, this paper investigates visual prompting: given input-output image example(s) of a new task at test time and a new input image, the goal is to automatically produce the output image consistent with the given examples.
We apply visual prompting to pretrained models and demonstrate results on various downstream image-to-image tasks.
- Score: 104.98602202198668
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: How does one adapt a pre-trained visual model to novel downstream tasks
without task-specific finetuning or any model modification? Inspired by
prompting in NLP, this paper investigates visual prompting: given input-output
image example(s) of a new task at test time and a new input image, the goal is
to automatically produce the output image, consistent with the given examples.
We show that posing this problem as simple image inpainting - literally just
filling in a hole in a concatenated visual prompt image - turns out to be
surprisingly effective, provided that the inpainting algorithm has been trained
on the right data. We train masked auto-encoders on a new dataset that we
curated - 88k unlabeled figures sourced from academic papers on Arxiv. We apply
visual prompting to these pretrained models and demonstrate results on various
downstream image-to-image tasks, including foreground segmentation, single
object detection, colorization, edge detection, etc.
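The recipe is simple enough to sketch directly. The snippet below is a minimal sketch, not the paper's implementation: the grid builder and the mae_model.inpaint call are hypothetical stand-ins for the MAE-based inpainter. It assembles the concatenated visual prompt as a 2x2 grid holding the example input, the example output, and the query input, with the bottom-right cell left as the hole to be filled.

```python
import numpy as np

def build_visual_prompt(example_in, example_out, query_in, cell=111):
    """Assemble the visual prompt grid:  [example input | example output]
                                         [query input   | hole to fill  ]
    Inputs are cell x cell x 3 uint8 arrays; returns the canvas and a
    boolean mask marking the region the inpainter must complete."""
    canvas = np.zeros((2 * cell, 2 * cell, 3), dtype=np.uint8)
    mask = np.zeros((2 * cell, 2 * cell), dtype=bool)
    canvas[:cell, :cell] = example_in   # top-left: task input example
    canvas[:cell, cell:] = example_out  # top-right: task output example
    canvas[cell:, :cell] = query_in     # bottom-left: new query input
    mask[cell:, cell:] = True           # bottom-right: left blank, to be inpainted
    return canvas, mask

# Toy usage with random images. A real run would resize the task images to the
# cell size and hand (canvas, mask) to a pretrained masked auto-encoder,
# e.g. result = mae_model.inpaint(canvas, mask)  # hypothetical call
ex_in = np.random.randint(0, 256, (111, 111, 3), dtype=np.uint8)
ex_out, query = ex_in.copy(), ex_in.copy()
canvas, mask = build_visual_prompt(ex_in, ex_out, query)
print(canvas.shape, int(mask.sum()))  # (222, 222, 3) 12321
```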
Related papers
- IMProv: Inpainting-based Multimodal Prompting for Computer Vision Tasks [124.90137528319273]
In this paper, we present IMProv, a generative model that is able to in-context learn visual tasks from multimodal prompts.
We train a masked generative transformer on a new dataset of figures from computer vision papers and their associated captions.
At inference time, we prompt the model with text and/or image task example(s) and have it inpaint the corresponding output.
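Relative to the purely visual grid above, the only interface change is an optional text condition. A minimal sketch, with every name hypothetical:

```python
# Hypothetical inference call for a text-and-image prompted inpainting model.
# `canvas` and `mask` come from a grid builder like build_visual_prompt above;
# `caption` describes the task, e.g. "Left: input images. Right: foreground masks."
def multimodal_prompt(model, canvas, mask, caption=None):
    # With no caption this reduces to purely visual prompting; with no image
    # examples in the grid, the model must rely on the text alone.
    return model.inpaint(canvas, mask, text=caption)  # hypothetical API
```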
arXiv Detail & Related papers (2023-12-04T09:48:29Z) - Supervised Deep Learning for Content-Aware Image Retargeting with Fourier Convolutions [11.031841470875571]
Image retargeting aims to alter the size of an image while attending to its contents.
Labeled datasets are unavailable for training deep learning models on the retargeting task.
Regular convolutional neural networks cannot generate images of different sizes at inference time.
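The Fourier-convolution ingredient is sketched below: a spectral block in the spirit of Fast Fourier Convolutions, written for PyTorch as an illustrative reconstruction rather than the paper's exact architecture. Because the 1x1 convolution acts on the image's spectrum, each output pixel depends on the whole input, which gives such networks the global view that content-aware retargeting needs.

```python
import torch
import torch.nn as nn

class SpectralConv(nn.Module):
    """Global-receptive-field block: FFT -> 1x1 conv on spectrum -> inverse FFT."""
    def __init__(self, channels):
        super().__init__()
        # Real and imaginary parts are stacked along the channel axis.
        self.conv = nn.Conv2d(2 * channels, 2 * channels, kernel_size=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        b, c, h, w = x.shape
        spec = torch.fft.rfft2(x, norm="ortho")         # complex (b, c, h, w//2+1)
        f = torch.cat([spec.real, spec.imag], dim=1)    # (b, 2c, h, w//2+1)
        f = self.act(self.conv(f))                      # pointwise conv in frequency domain
        real, imag = f.chunk(2, dim=1)
        spec = torch.complex(real, imag)
        return torch.fft.irfft2(spec, s=(h, w), norm="ortho")  # back to (b, c, h, w)

x = torch.randn(1, 8, 64, 64)
print(SpectralConv(8)(x).shape)  # torch.Size([1, 8, 64, 64])
```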
arXiv Detail & Related papers (2023-06-12T19:17:44Z) - iEdit: Localised Text-guided Image Editing with Weak Supervision [53.082196061014734]
We propose a novel learning method for text-guided image editing.
It generates images conditioned on a source image and a textual edit prompt.
It shows favourable results against its counterparts in terms of image fidelity and CLIP alignment score, and qualitatively for editing both generated and real images.
arXiv Detail & Related papers (2023-05-10T07:39:14Z) - Images Speak in Images: A Generalist Painter for In-Context Visual Learning [98.78475432114595]
In-context learning allows the model to rapidly adapt to various tasks with only a handful of prompts and examples.
It is unclear how to define the general-purpose task prompts that the vision model can understand and transfer to out-of-domain tasks.
We present Painter, a generalist model which redefines the outputs of core vision tasks as images and specifies task prompts as images as well.
arXiv Detail & Related papers (2022-12-05T18:59:50Z) - ClipCrop: Conditioned Cropping Driven by Vision-Language Model [90.95403416150724]
We take advantage of vision-language models as a foundation for creating robust and user-intentional cropping algorithms.
We develop a method to perform cropping with a text or image query that reflects the user's intention as guidance.
Our pipeline design allows the model to learn text-conditioned aesthetic cropping with a small dataset.
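The query-conditioned ranking step can be illustrated with off-the-shelf CLIP: score candidate crops against the user's text query and keep the best. The sketch below uses the Hugging Face transformers CLIP wrappers; the sliding-window candidate generation is a simple stand-in for the paper's learned cropping pipeline.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def best_crop(image: Image.Image, query: str, size=224, stride=112):
    """Slide a window over the image and return the crop most similar to `query`."""
    crops, boxes = [], []
    for top in range(0, max(1, image.height - size + 1), stride):
        for left in range(0, max(1, image.width - size + 1), stride):
            boxes.append((left, top, left + size, top + size))
            crops.append(image.crop(boxes[-1]))
    inputs = processor(text=[query], images=crops, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    # logits_per_image: similarity of each crop to the single text query.
    best = out.logits_per_image[:, 0].argmax().item()
    return boxes[best]

# e.g. best_crop(Image.open("photo.jpg").convert("RGB"), "a cat sleeping")
```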
arXiv Detail & Related papers (2022-11-21T14:27:07Z) - Visual Prompting: Modifying Pixel Space to Adapt Pre-trained Models [29.413887954758053]
We introduce visual prompting, which learns a task-specific image perturbation such that a frozen pre-trained model prompted with this perturbation performs a new task.
We discover that changing only a few pixels is enough to adapt models to new tasks and datasets, and that this performs on par with linear probing.
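Mechanically, this amounts to optimizing a small pixel pattern by gradient descent while the backbone stays frozen. A minimal sketch with a frozen torchvision backbone and a trainable border prompt (the data and the label mapping are placeholders, not the paper's setup):

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

backbone = resnet18(weights="IMAGENET1K_V1").eval()
for p in backbone.parameters():
    p.requires_grad_(False)  # the pre-trained model is never updated

pad = 16  # learnable frame of pixels around each 224x224 input
prompt = nn.Parameter(torch.zeros(1, 3, 224, 224))
mask = torch.zeros(1, 3, 224, 224)
mask[..., :pad, :], mask[..., -pad:, :] = 1, 1
mask[..., :, :pad], mask[..., :, -pad:] = 1, 1  # only the border is trainable

opt = torch.optim.SGD([prompt], lr=0.1)

def step(images, labels):
    """One optimization step: add the border prompt, classify, update the prompt."""
    logits = backbone(images + mask * prompt)
    loss = nn.functional.cross_entropy(logits, labels)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Toy step with random data; real use maps downstream classes to backbone logits.
print(step(torch.randn(4, 3, 224, 224), torch.randint(0, 1000, (4,))))
```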
arXiv Detail & Related papers (2022-03-31T17:59:30Z) - Restore from Restored: Single-image Inpainting [9.699531255678856]
We present a novel and efficient self-supervised fine-tuning algorithm for inpainting networks.
We update the parameters of the pre-trained inpainting networks by utilizing existing self-similar patches.
We achieve state-of-the-art inpainting results on publicly available benchmark datasets.
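The loop can be caricatured in a few lines: hide random pieces of the known region and train the network to restore them from the image's own self-similar content. This is a loose sketch under placeholder names, not the authors' exact procedure:

```python
import torch
import torch.nn.functional as F

def self_supervised_finetune(net, image, mask, steps=100, lr=1e-4):
    """Fine-tune a pretrained inpainting net on a single test image.

    `image` is 1x3xHxW, `mask` is 1x1xHxW with 1 marking the missing region.
    Known pixels act as free supervision: we hide random pseudo-holes in the
    known area and train the net to restore them, which works because
    self-similar patches elsewhere in the image carry the needed content.
    """
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(steps):
        # Random per-pixel pseudo-holes restricted to the known region.
        pseudo = (torch.rand_like(mask) < 0.1).float() * (1 - mask)
        pred = net(image * (1 - pseudo), pseudo)  # placeholder signature
        loss = F.l1_loss(pred * pseudo, image * pseudo)  # supervise hidden pixels only
        opt.zero_grad(); loss.backward(); opt.step()
    return net(image * (1 - mask), mask)  # final restoration of the real hole
```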
arXiv Detail & Related papers (2021-10-25T11:38:51Z) - Learning to Generate Scene Graph from Natural Language Supervision [52.18175340725455]
We propose one of the first methods that learn from image-sentence pairs to extract a graphical representation of localized objects and their relationships within an image, known as a scene graph.
We leverage an off-the-shelf object detector to identify and localize object instances, match labels of detected regions to concepts parsed from captions, and thus create "pseudo" labels for learning scene graphs.
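The pseudo-labeling step reduces to matching detector outputs against caption-derived triples. A toy sketch of that matching, with the detector and caption parser stubbed out:

```python
def pseudo_scene_graph(detections, caption_triples):
    """Match caption-derived (subject, predicate, object) triples to detected
    regions by label, yielding pseudo-labeled relationships between boxes.

    detections: list of (label, box) pairs from an off-the-shelf detector.
    caption_triples: (subj, pred, obj) tuples parsed from the image caption.
    """
    by_label = {}
    for label, box in detections:
        by_label.setdefault(label, []).append(box)
    graph = []
    for subj, pred, obj in caption_triples:
        for sbox in by_label.get(subj, []):
            for obox in by_label.get(obj, []):
                graph.append(((subj, sbox), pred, (obj, obox)))
    return graph

dets = [("man", (10, 10, 80, 200)), ("horse", (60, 40, 220, 210))]
triples = [("man", "riding", "horse")]
print(pseudo_scene_graph(dets, triples))
```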
arXiv Detail & Related papers (2021-09-06T03:38:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.