Exposing and Addressing Cross-Task Inconsistency in Unified
Vision-Language Models
- URL: http://arxiv.org/abs/2303.16133v2
- Date: Wed, 21 Feb 2024 22:53:33 GMT
- Title: Exposing and Addressing Cross-Task Inconsistency in Unified
Vision-Language Models
- Authors: Adyasha Maharana, Amita Kamath, Christopher Clark, Mohit Bansal,
Aniruddha Kembhavi
- Abstract summary: Inconsistent AI models are considered brittle and untrustworthy by human users.
We find that state-of-the-art vision-language models suffer from a surprisingly high degree of inconsistent behavior across tasks.
We propose a rank correlation-based auxiliary training objective, computed over large automatically created cross-task contrast sets.
- Score: 80.23791222509644
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As general-purpose vision models get increasingly effective at a wide set of
tasks, it is imperative that they be consistent across the tasks they support.
Inconsistent AI models are considered brittle and untrustworthy by human users
and are more challenging to incorporate into larger systems that take
dependencies on their outputs. Measuring consistency between very heterogeneous
tasks that might include outputs in different modalities is challenging since
it is difficult to determine if the predictions are consistent with one
another. As a solution, we introduce a benchmark dataset, CocoCon, where we
create contrast sets by modifying test instances for multiple tasks in small
but semantically meaningful ways to change the gold label, and we outline metrics
for measuring if a model is consistent by ranking the original and perturbed
instances across tasks. We find that state-of-the-art vision-language models
suffer from a surprisingly high degree of inconsistent behavior across tasks,
especially for more heterogeneous tasks. To alleviate this issue, we propose a
rank correlation-based auxiliary training objective, computed over large
automatically created cross-task contrast sets, that improves the multi-task
consistency of large unified models while retaining their original accuracy on
downstream tasks.
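To make the ranking-based protocol concrete, here is a minimal Python sketch of both ideas: a ranking-agreement check over a contrast set, and a differentiable soft-Spearman loss standing in for the rank correlation objective. The `score(task, image, text)` function (returning the model's log-likelihood for a candidate output) and the soft-rank formulation are assumptions for illustration, not the authors' exact implementation.
```python
import torch

def cross_task_consistency(score, image, contrast_pairs):
    """Fraction of task pairs whose original-vs-perturbed rankings agree.
    `contrast_pairs` maps a task name to an (original, perturbed) output
    pair derived from the same semantic edit of the test instance."""
    prefs = [score(task, image, orig) > score(task, image, pert)
             for task, (orig, pert) in contrast_pairs.items()]
    pairs = [(a, b) for i, a in enumerate(prefs) for b in prefs[i + 1:]]
    return sum(a == b for a, b in pairs) / len(pairs) if pairs else 1.0

def soft_rank(scores, temperature=1.0):
    """Differentiable ranks: rank_i ~= 0.5 + sum_j sigmoid((s_j - s_i) / T)."""
    diff = (scores.unsqueeze(0) - scores.unsqueeze(1)) / temperature
    return 0.5 + torch.sigmoid(diff).sum(dim=1)

def rank_correlation_loss(scores_a, scores_b):
    """Penalize disagreement between two tasks' rankings of the same
    contrast set (a soft Spearman correlation turned into a loss)."""
    ra, rb = soft_rank(scores_a), soft_rank(scores_b)
    ra, rb = ra - ra.mean(), rb - rb.mean()
    corr = (ra * rb).sum() / (ra.norm() * rb.norm() + 1e-8)
    return 1.0 - corr
```
A consistency score of 1.0 means every task pair agrees on whether the original or the perturbed output should score higher; the loss term can be added to the usual multi-task objective.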
Related papers
- Improving General Text Embedding Model: Tackling Task Conflict and Data Imbalance through Model Merging [33.23758947497205]
Advanced embedding models are typically developed using large-scale multi-task data and joint training across multiple tasks, which introduces task conflict and data imbalance.
To overcome these challenges, we explore model merging, a technique that combines independently trained models to mitigate gradient conflicts and balance data distribution.
We introduce a novel method, Self Positioning, which efficiently searches for optimal model combinations within the space of task vectors using gradient descent.
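As a rough sketch of the task-vector view of merging implied above (the summary does not give the exact procedure), the snippet below combines checkpoints as theta = theta_base + sum_i w_i * (theta_i - theta_base); Self Positioning would then search the weights w_i by gradient descent on a held-out objective. All names are illustrative.
```python
import torch

def merge_by_task_vectors(base_state, task_states, weights):
    """Merge checkpoints via weighted task vectors:
    theta = theta_base + sum_i w_i * (theta_i - theta_base)."""
    merged = {}
    for name, base_p in base_state.items():
        task_vecs = torch.stack([s[name] - base_p for s in task_states])
        w = weights.view(-1, *([1] * base_p.dim()))  # broadcast over params
        merged[name] = base_p + (w * task_vecs).sum(dim=0)
    return merged
```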
arXiv Detail & Related papers (2024-10-19T08:39:21Z)
- Comprehensive Generative Replay for Task-Incremental Segmentation with Concurrent Appearance and Semantic Forgetting [49.87694319431288]
Generalist segmentation models are increasingly favored for diverse tasks involving various objects from different image sources.
We propose a Comprehensive Generative Replay (CGR) framework that restores appearance and semantic knowledge by synthesizing image-mask pairs.
Experiments on incremental tasks (cardiac, fundus, and prostate segmentation) show a clear advantage in alleviating concurrent appearance and semantic forgetting.
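A generic generative-replay training step consistent with this description might look as follows; `generator.sample` is a hypothetical API, and CGR's actual generators and conditioning are more elaborate.
```python
import torch

def replay_step(model, generator, batch, n_replay, loss_fn, optimizer):
    """One training step with generative replay: mix synthesized old-task
    image-mask pairs into the real new-task batch to curb forgetting."""
    with torch.no_grad():
        old_imgs, old_masks = generator.sample(n_replay)  # hypothetical API
    images = torch.cat([batch["image"], old_imgs])
    masks = torch.cat([batch["mask"], old_masks])
    optimizer.zero_grad()
    loss = loss_fn(model(images), masks)
    loss.backward()
    optimizer.step()
    return loss.item()
```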
arXiv Detail & Related papers (2024-06-28T10:05:58Z)
- Task Indicating Transformer for Task-conditional Dense Predictions [16.92067246179703]
We introduce a novel task-conditional framework called Task Indicating Transformer (TIT) for task-conditional dense prediction.
Our approach designs a Mix Task Adapter module within the transformer block, which incorporates a Task Indicating Matrix through matrix decomposition.
We also propose a Task Gate Decoder module that harnesses a Task Indicating Vector and gating mechanism to facilitate adaptive multi-scale feature refinement.
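One plausible reading of a Task Indicating Matrix obtained through matrix decomposition is a low-rank adapter whose middle diagonal factor is a learned per-task embedding, sketched below; this is an illustration, not the authors' exact module.
```python
import torch
import torch.nn as nn

class TaskIndicatingAdapter(nn.Module):
    """Task-conditional low-rank adapter: the effective weight factorizes
    as up @ diag(task_embedding) @ down, so each task rescales a shared
    bottleneck (an illustrative take on the Task Indicating Matrix)."""
    def __init__(self, dim, rank, num_tasks):
        super().__init__()
        self.down = nn.Linear(dim, rank, bias=False)     # shared factor
        self.up = nn.Linear(rank, dim, bias=False)       # shared factor
        self.task_embed = nn.Embedding(num_tasks, rank)  # per-task diagonal

    def forward(self, x, task_id):
        # x: (batch, tokens, dim); task_id: (batch,) of task indices
        scale = self.task_embed(task_id).unsqueeze(1)    # (batch, 1, rank)
        return x + self.up(self.down(x) * scale)         # residual adapter
```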
arXiv Detail & Related papers (2024-03-01T07:06:57Z)
- Task-Distributionally Robust Data-Free Meta-Learning [99.56612787882334]
Data-Free Meta-Learning (DFML) aims to efficiently learn new tasks by leveraging multiple pre-trained models without requiring their original training data.
For the first time, we reveal two major challenges hindering their practical deployment: Task-Distribution Shift (TDS) and Task-Distribution Corruption (TDC).
arXiv Detail & Related papers (2023-11-23T15:46:54Z)
- An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training [79.78201886156513]
We present a model that can perform multiple vision tasks and can be adapted to other downstream tasks efficiently.
Our approach achieves comparable results to single-task state-of-the-art models and demonstrates strong generalization on downstream tasks.
arXiv Detail & Related papers (2023-06-29T17:59:57Z)
- Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks [86.66733026149892]
We propose Uni-Perceiver v2, which is the first generalist model capable of handling major large-scale vision and vision-language tasks.
Specifically, images are encoded as general region proposals, while texts are encoded via a Transformer-based language model.
Uni-Perceiver v2 achieves competitive performance on a broad range of vision and vision-language tasks.
arXiv Detail & Related papers (2022-11-17T18:59:52Z)
- Robust Learning Through Cross-Task Consistency [92.42534246652062]
We propose a broadly applicable and fully computational method for augmenting learning with Cross-Task Consistency.
We observe that learning with cross-task consistency leads to more accurate predictions and better generalization to out-of-distribution inputs.
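The summary omits the mechanism; in the paper, consistency is enforced by penalizing disagreement between a direct prediction x -> y2 and a composed path x -> y1 -> y2 through an intermediate task. A minimal sketch with placeholder function names:
```python
def path_consistency_loss(x, f_x_to_y2, f_x_to_y1, f_y1_to_y2, dist):
    """Penalize the gap between the direct prediction of task y2 and the
    prediction obtained by composing through the intermediate task y1."""
    direct = f_x_to_y2(x)
    composed = f_y1_to_y2(f_x_to_y1(x))
    return dist(direct, composed)
```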
arXiv Detail & Related papers (2020-06-07T09:24:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.