Task Bias in Vision-Language Models
- URL: http://arxiv.org/abs/2212.04412v1
- Date: Thu, 8 Dec 2022 17:10:31 GMT
- Title: Task Bias in Vision-Language Models
- Authors: Sachit Menon, Ishaan Preetam Chandratreya, Carl Vondrick
- Abstract summary: We explore the CLIP model and show that its visual representation is often strongly biased towards solving some tasks more than others.
To resolve this task bias, we show how to learn a visual prompt that guides the representation towards features relevant to the task of interest.
- Score: 18.025004053980545
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Incidental supervision from language has become a popular approach for
learning generic visual representations that can be prompted to perform many
recognition tasks in computer vision. We conduct an in-depth exploration of the
CLIP model and show that its visual representation is often strongly biased
towards solving some tasks more than others. Moreover, which task the
representation will be biased towards is unpredictable, with little consistency
across images. To resolve this task bias, we show how to learn a visual prompt
that guides the representation towards features relevant to the task of
interest. Our results show that these visual prompts can be independent of the
input image and still effectively provide a conditioning mechanism to steer
visual representations towards the desired task.
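For intuition, the following is a minimal PyTorch sketch of an input-independent visual prompt for a frozen CLIP model: a single learnable pixel tensor is added to every preprocessed image and optimized so that CLIP's image embeddings align with text targets for the task of interest. The prompt parameterization, labels, and hyperparameters here are illustrative assumptions, not the authors' exact setup.

```python
# Sketch: learn one input-independent visual prompt per task for a frozen CLIP.
# Assumes the open-source `clip` package (https://github.com/openai/CLIP);
# labels and hyperparameters below are placeholders.
import torch
import torch.nn.functional as F
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)
model.float()                      # keep everything in fp32 for simplicity
for p in model.parameters():
    p.requires_grad_(False)        # CLIP stays frozen; only the prompt trains

# One learnable pixel-space prompt, shared across all input images.
prompt = torch.zeros(1, 3, 224, 224, device=device, requires_grad=True)
optimizer = torch.optim.Adam([prompt], lr=1e-3)

# Text targets defining the task of interest (hypothetical labels).
labels = ["a photo of a cat", "a photo of a dog"]
with torch.no_grad():
    text_feat = F.normalize(model.encode_text(clip.tokenize(labels).to(device)), dim=-1)

def prompt_step(images, targets):
    """One update: add the prompt to preprocessed images, score against text."""
    img_feat = F.normalize(model.encode_image(images + prompt), dim=-1)
    logits = 100.0 * img_feat @ text_feat.T   # CLIP-style scaled cosine similarity
    loss = F.cross_entropy(logits, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the prompt does not depend on the image, the same tensor can be cached per task and added at inference time to steer the frozen representation.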
Related papers
- Unifying Image Processing as Visual Prompting Question Answering [62.84955983910612]
Image processing is a fundamental task in computer vision that aims to enhance image quality and extract essential features for subsequent vision applications.
Traditionally, task-specific models are developed for individual tasks, and designing such models requires distinct expertise.
We propose a universal model for general image processing that covers image restoration, image enhancement, and image feature extraction tasks.
arXiv Detail & Related papers (2023-10-16T15:32:57Z) - InstructCV: Instruction-Tuned Text-to-Image Diffusion Models as Vision Generalists [66.85125112199898]
We develop a unified language interface for computer vision tasks that abstracts away task-specific design choices.
Our model, dubbed InstructCV, performs competitively compared to other generalist and task-specific vision models.
arXiv Detail & Related papers (2023-09-30T14:26:43Z) - Tuning computer vision models with task rewards [88.45787930908102]
Misalignment between model predictions and intended usage can be detrimental for the deployment of computer vision models.
In natural language processing, this is often addressed using reinforcement learning techniques that align models with a task reward.
We adopt this approach and show its surprising effectiveness across multiple computer vision tasks, such as object detection, panoptic segmentation, colorization and image captioning.
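As a rough illustration of the general recipe (not this paper's implementation), a REINFORCE-style update that nudges a vision model towards a task reward might look as follows; `model` and `reward_fn` are hypothetical placeholders.

```python
# Schematic REINFORCE step: sample predictions, score them with a task
# reward, and reinforce high-reward samples. `model` and `reward_fn` are
# assumed placeholders, not APIs from the paper.
import torch

def reward_tuning_step(model, optimizer, images, reward_fn):
    logits = model(images)                      # (batch, num_classes)
    dist = torch.distributions.Categorical(logits=logits)
    samples = dist.sample()                     # sampled predictions
    rewards = reward_fn(samples, images)        # task reward per example
    advantage = rewards - rewards.mean()        # simple mean baseline
    loss = -(advantage * dist.log_prob(samples)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return rewards.mean().item()
```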
arXiv Detail & Related papers (2023-02-16T11:49:48Z) - Images Speak in Images: A Generalist Painter for In-Context Visual Learning [98.78475432114595]
In-context learning allows the model to rapidly adapt to various tasks with only a handful of prompts and examples.
It is unclear how to define the general-purpose task prompts that the vision model can understand and transfer to out-of-domain tasks.
We present Painter, a generalist model that redefines the outputs of core vision tasks as images and specifies task prompts as images as well.
arXiv Detail & Related papers (2022-12-05T18:59:50Z) - GAMR: A Guided Attention Model for (visual) Reasoning [7.919213739992465]
Humans continue to outperform modern AI systems in their ability to flexibly parse and understand complex visual scenes.
We present a novel module for visual reasoning, the Guided Attention Model for (visual) Reasoning (GAMR).
GAMR posits that the brain solves complex visual reasoning problems dynamically via sequences of attention shifts to select and route task-relevant visual information into memory.
arXiv Detail & Related papers (2022-06-10T07:52:06Z) - Learning Task Informed Abstractions [10.920599910769276]
We propose learning Task Informed Abstractions (TIA) that explicitly separates reward-correlated visual features from distractors.
TIA leads to significant performance gains over state-of-the-art methods on many visual control tasks.
arXiv Detail & Related papers (2021-06-29T17:56:11Z) - Learning Visual Representations with Caption Annotations [19.24013129952071]
We propose a proxy task to learn visual representations over image-caption pairs.
ICMLM (image-conditioned masked language modeling) predicts masked words in captions by relying on visual cues.
Our experiments confirm that image captions can be leveraged to inject global and localized semantic information into visual representations.
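A minimal sketch of that objective, with a deliberately simplified additive fusion head (the dimensions and fusion scheme are assumptions; the paper explores more elaborate variants):

```python
# Sketch of image-conditioned masked language modeling: predict a masked
# caption word from its text context fused with image features. The fusion
# head here is a simplified assumption, not the paper's architecture.
import torch
import torch.nn as nn

class MaskedWordHead(nn.Module):
    def __init__(self, vocab_size: int, txt_dim: int = 512, img_dim: int = 2048):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, txt_dim)  # map image features to text space
        self.classifier = nn.Linear(txt_dim, vocab_size)

    def forward(self, masked_state: torch.Tensor, img_feat: torch.Tensor) -> torch.Tensor:
        fused = masked_state + self.img_proj(img_feat)  # inject visual cues
        return self.classifier(fused)                   # logits over the vocabulary

# Cross-entropy between these logits and the masked word's index backpropagates
# through the visual backbone that produced img_feat.
```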
arXiv Detail & Related papers (2020-08-04T08:04:16Z) - Self-supervised Learning from a Multi-view Perspective [121.63655399591681]
We show that self-supervised representations can extract task-relevant information and discard task-irrelevant information.
Our theoretical framework paves the way to a larger space of self-supervised learning objective design.
arXiv Detail & Related papers (2020-06-10T00:21:35Z) - Exploit Clues from Views: Self-Supervised and Regularized Learning for Multiview Object Recognition [66.87417785210772]
This work investigates the problem of multiview self-supervised learning (MV-SSL).
A novel surrogate task for self-supervised learning is proposed that pursues an "object invariant" representation.
Experiments show that recognition and retrieval results using the view-invariant prototype embedding (VISPE) outperform other self-supervised learning methods.
arXiv Detail & Related papers (2020-03-28T07:06:06Z) - Analyzing Visual Representations in Embodied Navigation Tasks [45.35107294831313]
We use the recently proposed projection weighted Canonical Correlation Analysis (PWCCA) to measure the similarity of visual representations learned in the same environment by performing different tasks.
We then empirically demonstrate that visual representations learned on one task can be effectively transferred to a different task.
arXiv Detail & Related papers (2020-03-12T19:43:59Z)
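For reference, a NumPy sketch of projection-weighted CCA in the spirit of Morcos et al. (2018); it omits the truncation of near-zero directions that practical implementations usually apply.

```python
# Sketch of PWCCA similarity between two representation matrices
# X (n_examples x d1) and Y (n_examples x d2); simplified, without the
# usual low-variance truncation step.
import numpy as np

def pwcca(X: np.ndarray, Y: np.ndarray) -> float:
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    Ux, _, _ = np.linalg.svd(X, full_matrices=False)   # orthonormal basis of X
    Uy, _, _ = np.linalg.svd(Y, full_matrices=False)   # orthonormal basis of Y
    # Singular values of Ux^T Uy are the canonical correlations.
    U, rho, _ = np.linalg.svd(Ux.T @ Uy, full_matrices=False)
    cca_dirs = Ux @ U                                  # CCA-transformed view of X
    alpha = np.abs(cca_dirs.T @ X).sum(axis=1)         # projection weights
    alpha = alpha / alpha.sum()
    return float((alpha * rho).sum())                  # weighted mean correlation
```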