Generative Modeling for Multi-task Visual Learning
- URL: http://arxiv.org/abs/2106.13409v1
- Date: Fri, 25 Jun 2021 03:42:59 GMT
- Title: Generative Modeling for Multi-task Visual Learning
- Authors: Zhipeng Bao, Martial Hebert, Yu-Xiong Wang
- Abstract summary: We consider a novel problem of learning a shared generative model that is useful across various visual perception tasks.
We propose a general multi-task oriented generative modeling framework, by coupling a discriminative multi-task network with a generative network.
Our framework consistently outperforms state-of-the-art multi-task approaches.
- Score: 40.96212750592383
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative modeling has recently shown great promise in computer vision, but
it has mostly focused on synthesizing visually realistic images. In this paper,
motivated by multi-task learning of shareable feature representations, we
consider a novel problem of learning a shared generative model that is useful
across various visual perception tasks. Correspondingly, we propose a general
multi-task oriented generative modeling (MGM) framework, by coupling a
discriminative multi-task network with a generative network. While it is
challenging to synthesize both RGB images and pixel-level annotations in
multi-task scenarios, our framework enables us to use synthesized images paired
with only weak annotations (i.e., image-level scene labels) to facilitate
multiple visual tasks. Experimental evaluation on challenging multi-task
benchmarks, including NYUv2 and Taskonomy, demonstrates that our MGM framework
improves the performance of all the tasks by large margins, consistently
outperforming state-of-the-art multi-task approaches.
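As a rough illustration of the weak-annotation idea in the abstract, the sketch below shows one way a training objective could treat real and synthesized images differently. Real images contribute a loss for every pixel-level task, while synthesized images contribute only an image-level scene classification loss. All names (`Sample`, `combined_loss`, `weak_weight`) and the specific loss forms are assumptions made for illustration; this is not the MGM formulation from the paper.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Sample:
    """One training example; the field names are hypothetical."""
    scene_label: int  # image-level (weak) annotation
    # Per-task pixel-level targets; None for synthesized images.
    dense_targets: Optional[Dict[str, List[float]]] = None


def cross_entropy(logits: List[float], label: int) -> float:
    """Numerically stable softmax cross-entropy for a single example."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def l1_loss(pred: List[float], target: List[float]) -> float:
    """Mean absolute error, standing in for any pixel-level task loss."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)


def combined_loss(
    sample: Sample,
    scene_logits: List[float],
    dense_preds: Dict[str, List[float]],
    weak_weight: float = 0.5,  # assumed weighting, not taken from the paper
) -> float:
    """Full multi-task loss for real images, weak loss for synthesized ones."""
    if sample.dense_targets is None:
        # Synthesized image: only the scene label is available.
        return weak_weight * cross_entropy(scene_logits, sample.scene_label)
    # Real image: sum the loss of every task that has a dense target.
    return sum(
        l1_loss(dense_preds[task], target)
        for task, target in sample.dense_targets.items()
    )
```

In this sketch a sample with `dense_targets=None` plays the role of a synthesized image, so it can still contribute a training signal without any pixel-level annotation.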
Related papers
- Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model [83.85856356798531]
VistaLLM is a visual system that addresses coarse- and fine-grained vision-language tasks.
It employs a gradient-aware adaptive sampling technique to represent binary segmentation masks as sequences.
We also introduce a novel task, AttCoSeg, which boosts the model's reasoning and grounding capability over multiple input images.
arXiv Detail & Related papers (2023-12-19T18:53:01Z)
- InstructCV: Instruction-Tuned Text-to-Image Diffusion Models as Vision Generalists [66.85125112199898]
We develop a unified language interface for computer vision tasks that abstracts away task-specific design choices.
Our model, dubbed InstructCV, performs competitively compared to other generalist and task-specific vision models.
arXiv Detail & Related papers (2023-09-30T14:26:43Z)
- An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training [79.78201886156513]
We present a model that can perform multiple vision tasks and can be adapted to other downstream tasks efficiently.
Our approach achieves comparable results to single-task state-of-the-art models and demonstrates strong generalization on downstream tasks.
arXiv Detail & Related papers (2023-06-29T17:59:57Z)
- MulT: An End-to-End Multitask Learning Transformer [66.52419626048115]
We propose an end-to-end Multitask Learning Transformer framework, named MulT, to simultaneously learn multiple high-level vision tasks.
Our framework encodes the input image into a shared representation and makes predictions for each vision task using task-specific transformer-based decoder heads.
arXiv Detail & Related papers (2022-05-17T13:03:18Z)
- Multi-View representation learning in Multi-Task Scene [4.509968166110557]
We propose a novel semi-supervised algorithm, termed Multi-Task Multi-View learning based on Common and Special Features (MTMVCSF).
We also propose an anti-noise multi-task multi-view algorithm, AN-MTMVCSF, which adapts well to noisy labels.
The effectiveness of these algorithms is demonstrated by a series of well-designed experiments on both real-world and synthetic data.
arXiv Detail & Related papers (2022-01-15T11:26:28Z)
- Bowtie Networks: Generative Modeling for Joint Few-Shot Recognition and Novel-View Synthesis [39.53519330457627]
We propose a novel task of joint few-shot recognition and novel-view synthesis.
We aim to simultaneously learn an object classifier and generate images of that type of object from new viewpoints.
We focus on the interaction and cooperation between a generative model and a discriminative model.
arXiv Detail & Related papers (2020-08-16T19:40:56Z)
- Flexible Example-based Image Enhancement with Task Adaptive Global Feature Self-Guided Network [162.14579019053804]
We show that our model outperforms the current state of the art in learning a single enhancement mapping.
The model achieves even higher performance on learning multiple mappings simultaneously.
arXiv Detail & Related papers (2020-05-13T22:45:07Z)