Unified Open-Vocabulary Dense Visual Prediction
- URL: http://arxiv.org/abs/2307.08238v2
- Date: Fri, 18 Aug 2023 04:35:06 GMT
- Title: Unified Open-Vocabulary Dense Visual Prediction
- Authors: Hengcan Shi, Munawar Hayat, Jianfei Cai
- Abstract summary: Open-vocabulary (OV) dense visual prediction has attracted increasing research attention.
Most existing approaches are task-specific and tackle each task individually.
We propose a Unified Open-Vocabulary Network (UOVN) to jointly address four common dense prediction tasks.
- Score: 51.03014432235629
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, open-vocabulary (OV) dense visual prediction (such as OV
object detection and semantic, instance and panoptic segmentation) has attracted
increasing research attention. However, most existing approaches are
task-specific and tackle each task individually. In this paper, we propose a
Unified Open-Vocabulary Network (UOVN) to jointly address four common dense
prediction tasks. Compared with separate models, a unified network is more
desirable for diverse industrial applications. Moreover, training data for OV
dense prediction is relatively scarce. Separate networks can only leverage
task-relevant training data, while a unified approach can integrate diverse
training data to boost individual tasks. We address two major challenges in
unified OV prediction. Firstly, unlike unified methods for fixed-set
predictions, OV networks are usually trained with multi-modal data. Therefore,
we propose a multi-modal, multi-scale and multi-task (MMM) decoding mechanism
to better leverage multi-modal data. Secondly, because UOVN uses data from
different tasks for training, there are significant domain and task gaps. We
present a UOVN training mechanism to reduce such gaps. Experiments on four
datasets demonstrate the effectiveness of our UOVN.
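To make the idea of MMM decoding more concrete, below is a minimal sketch of an open-vocabulary dense-prediction decoder that attends to multi-scale visual features and to text embeddings, and classifies queries by similarity to category-name embeddings. This is not the authors' implementation: the module names, the concatenation-based scale fusion, and the shared mask/box heads are illustrative assumptions only.

```python
# Illustrative sketch only (not the UOVN code): a unified open-vocabulary decoder
# in the spirit of multi-modal, multi-scale, multi-task (MMM) decoding.
# Module names, shapes and the scale fusion are assumptions for this example.
import torch
import torch.nn as nn
import torch.nn.functional as F


class OVDenseDecoderSketch(nn.Module):
    def __init__(self, dim: int = 256, num_queries: int = 100, num_heads: int = 8):
        super().__init__()
        self.queries = nn.Embedding(num_queries, dim)        # task-agnostic object queries
        self.visual_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.text_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.mask_head = nn.Linear(dim, dim)                  # mask embeddings (segmentation tasks)
        self.box_head = nn.Linear(dim, 4)                     # box regression (detection task)

    def forward(self, multi_scale_feats, text_embeds):
        # multi_scale_feats: list of (B, H_i * W_i, dim) visual tokens at several scales
        # text_embeds:       (B, num_classes, dim) embeddings of category names / prompts
        batch = text_embeds.size(0)
        q = self.queries.weight.unsqueeze(0).expand(batch, -1, -1)
        visual = torch.cat(multi_scale_feats, dim=1)          # multi-scale: fuse scales by concatenation
        q, _ = self.visual_attn(q, visual, visual)            # decode queries against visual tokens
        q, _ = self.text_attn(q, text_embeds, text_embeds)    # multi-modal: decode against text embeddings
        # Open-vocabulary classification: cosine similarity between queries and text embeddings
        logits = torch.einsum("bqd,bcd->bqc",
                              F.normalize(q, dim=-1),
                              F.normalize(text_embeds, dim=-1))
        masks = self.mask_head(q)                             # combined with pixel features downstream
        boxes = self.box_head(q).sigmoid()
        return logits, masks, boxes
```

Because all tasks share the same queries, text-matching logits and mask/box heads in this sketch, one forward pass can serve detection and the three segmentation tasks, which is the unification property the abstract emphasizes.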
Related papers
- A Multitask Deep Learning Model for Classification and Regression of Hyperspectral Images: Application to the large-scale dataset [44.94304541427113]
We propose a multitask deep learning model to perform multiple classification and regression tasks simultaneously on hyperspectral images.
We validated our approach on a large hyperspectral dataset called TAIGA.
A comprehensive qualitative and quantitative analysis of the results shows that the proposed method significantly outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2024-07-23T11:14:54Z) - Diffusion Model is an Effective Planner and Data Synthesizer for
Multi-Task Reinforcement Learning [101.66860222415512]
Multi-Task Diffusion Model (MTDiff) is a diffusion-based method that incorporates Transformer backbones and prompt learning for generative planning and data synthesis.
For generative planning, we find MTDiff outperforms state-of-the-art algorithms across 50 tasks on Meta-World and 8 maps on Maze2D.
arXiv Detail & Related papers (2023-05-29T05:20:38Z) - Unified Demonstration Retriever for In-Context Learning [56.06473069923567]
Unified Demonstration Retriever (UDR) is a single model to retrieve demonstrations for a wide range of tasks.
We propose a multi-task list-wise ranking training framework, with an iterative mining strategy to find high-quality candidates.
Experiments on 30+ tasks across 13 task families and multiple data domains show that UDR significantly outperforms baselines.
arXiv Detail & Related papers (2023-05-07T16:07:11Z) - Uncertainty-Aware Meta-Learning for Multimodal Task Distributions [3.7470451129384825]
We present UnLiMiTD (uncertainty-aware meta-learning for multimodal task distributions).
We take a probabilistic perspective and train a parametric, tuneable distribution over tasks on the meta-dataset.
We demonstrate that UnLiMiTD's predictions compare favorably to, and outperform in most cases, the standard baselines.
arXiv Detail & Related papers (2022-10-04T20:02:25Z) - GPPF: A General Perception Pre-training Framework via Sparsely Activated
Multi-Task Learning [23.15735672234869]
We propose GPPF, a General Perception Pre-training Framework, to pre-train a task-level dynamic network.
By inspecting humans' innate ability to learn in complex environments, we recognize and transfer three critical elements to deep networks.
We develop a plug-and-play multi-task training algorithm, which supports Single Iteration Multiple Tasks (SIMT) concurrently training.
arXiv Detail & Related papers (2022-08-03T15:34:35Z) - Self-Supervised Graph Neural Network for Multi-Source Domain Adaptation [51.21190751266442]
Domain adaptation (DA) tackles scenarios where the test data does not fully follow the same distribution as the training data.
By learning from large-scale unlabeled samples, self-supervised learning has now become a new trend in deep learning.
We propose a novel Self-Supervised Graph Neural Network (SSG) to enable more effective inter-task information exchange and knowledge sharing.
arXiv Detail & Related papers (2022-04-08T03:37:56Z) - Multi-Task Learning for Dense Prediction Tasks: A Survey [87.66280582034838]
Multi-task learning (MTL) techniques have shown promising results with respect to performance, computation and/or memory footprint.
We provide a well-rounded view on state-of-the-art deep learning approaches for MTL in computer vision.
arXiv Detail & Related papers (2020-04-28T09:15:50Z) - Diversity inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction.
We explicitly optimize a diversity inducing adversarial loss for learning latent variables and thereby obtain diversity in the output predictions necessary for modeling multi-modal data.
Compared to the most competitive baselines, we show significant improvements in classification accuracy, under a shift in the data distribution.
arXiv Detail & Related papers (2020-03-10T03:10:41Z)