Towards General Purpose Vision Systems
- URL: http://arxiv.org/abs/2104.00743v1
- Date: Thu, 1 Apr 2021 19:35:21 GMT
- Title: Towards General Purpose Vision Systems
- Authors: Tanmay Gupta, Amita Kamath, Aniruddha Kembhavi and Derek Hoiem
- Abstract summary: We propose a task-agnostic vision-language system that accepts an image and a natural language task description and outputs bounding boxes, confidences, and text.
We evaluate the system's ability to learn multiple skills simultaneously, to perform tasks with novel skill-concept combinations, and to learn new skills efficiently and without forgetting.
- Score: 34.90633886653062
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A special purpose learning system assumes knowledge of admissible tasks at
design time. Adapting such a system to unforeseen tasks requires architecture
manipulation such as adding an output head for each new task or dataset. In
this work, we propose a task-agnostic vision-language system that accepts an
image and a natural language task description and outputs bounding boxes,
confidences, and text. The system supports a wide range of vision tasks such as
classification, localization, question answering, captioning, and more. We
evaluate the system's ability to learn multiple skills simultaneously, to
perform tasks with novel skill-concept combinations, and to learn new skills
efficiently and without forgetting.
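As a rough illustration of the interface described in the abstract, the sketch below (a hypothetical wrapper, not the authors' released code) shows what a single task-agnostic entry point could look like: one call takes an image and a natural-language task description and always returns bounding boxes, confidences, and text, whether the task is classification, localization, question answering, or captioning.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class GPVOutput:
    # One output schema shared by every supported task.
    boxes: List[Tuple[float, float, float, float]]  # (x1, y1, x2, y2) per region
    confidences: List[float]                         # one score per box
    text: str                                        # answer, caption, or class label

def run_task(model, image, task_description: str) -> GPVOutput:
    """Single task-agnostic entry point: the task is specified in language,
    not by selecting a task-specific output head."""
    boxes, scores, text = model.predict(image, task_description)  # hypothetical model API
    return GPVOutput(boxes=boxes, confidences=scores, text=text)

# The same call covers different skills, e.g.:
#   run_task(model, img, "What color is the dog?")         -> text answer
#   run_task(model, img, "Locate every dog in the image.") -> boxes + confidences
#   run_task(model, img, "Describe the image.")            -> caption
```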
Related papers
- Goal-Oriented Skill Abstraction for Offline Multi-Task Reinforcement Learning [25.18006424626525]
GO-Skill is a novel approach designed to extract and utilize reusable skills to enhance knowledge transfer and task performance. Our approach uncovers reusable skills through a goal-oriented skill extraction process and leverages vector quantization to construct a discrete skill library. We integrate these skills using hierarchical policy learning, enabling the construction of a high-level policy that dynamically orchestrates discrete skills to accomplish specific tasks.
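A minimal sketch, under assumed details rather than the GO-Skill code, of the vector-quantization step mentioned above: continuous skill embeddings are snapped to their nearest entry in a learned codebook, giving a discrete skill library that a high-level policy can index into.

```python
import numpy as np

class SkillCodebook:
    """Toy discrete skill library built via vector quantization (illustrative only)."""
    def __init__(self, num_skills: int, dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.codebook = rng.normal(size=(num_skills, dim))  # learned jointly in practice

    def quantize(self, z: np.ndarray) -> int:
        # Map a continuous skill embedding to the index of its nearest code.
        dists = np.linalg.norm(self.codebook - z, axis=1)
        return int(np.argmin(dists))

    def lookup(self, idx: int) -> np.ndarray:
        # The high-level policy selects discrete indices; a low-level policy
        # conditions on the corresponding code vector.
        return self.codebook[idx]

# library = SkillCodebook(num_skills=32, dim=16)
# idx = library.quantize(np.zeros(16)); code = library.lookup(idx)
```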
arXiv Detail & Related papers (2025-07-09T07:54:49Z) - Learning reusable concepts across different egocentric video understanding tasks [12.709881592333995]
Hier-EgoPack is a unified framework able to create a collection of task perspectives that can be carried across downstream tasks and used as a potential source of additional insights.
arXiv Detail & Related papers (2025-05-30T15:14:46Z) - VL-GLUE: A Suite of Fundamental yet Challenging Visuo-Linguistic Reasoning Tasks [48.67062958311173]
VL-GLUE is a multitask benchmark for visuo-linguistic reasoning and understanding.
We show that this benchmark is quite challenging for existing large-scale vision-language models.
arXiv Detail & Related papers (2024-10-17T15:27:17Z) - Using Left and Right Brains Together: Towards Vision and Language Planning [95.47128850991815]
We introduce a novel vision-language planning framework to perform concurrent visual and language planning for tasks with inputs of any form.
We evaluate the effectiveness of our framework across vision-language tasks, vision-only tasks, and language-only tasks.
arXiv Detail & Related papers (2024-02-16T09:46:20Z) - InstructCV: Instruction-Tuned Text-to-Image Diffusion Models as Vision Generalists [66.85125112199898]
We develop a unified language interface for computer vision tasks that abstracts away task-specific design choices.
Our model, dubbed InstructCV, performs competitively compared to other generalist and task-specific vision models.
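A rough sketch of the kind of unified language interface described here (hypothetical names and templates; InstructCV itself routes such instructions through an instruction-tuned text-to-image diffusion model): every vision task becomes a text instruction to one generalist model instead of a dedicated task-specific head.

```python
# Hypothetical illustration of a task-as-instruction interface.
TASK_TEMPLATES = {
    "segmentation":   "Segment the {category} in the image.",
    "depth":          "Estimate the depth map of the image.",
    "detection":      "Detect all {category} and mark them with boxes.",
    "classification": "What is the main object in the image?",
}

def build_instruction(task: str, **kwargs) -> str:
    """Turn a task name plus arguments into the natural-language prompt
    a generalist vision model would consume."""
    return TASK_TEMPLATES[task].format(**kwargs)

# build_instruction("detection", category="pedestrians")
# -> "Detect all pedestrians and mark them with boxes."
```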
arXiv Detail & Related papers (2023-09-30T14:26:43Z) - Task-Attentive Transformer Architecture for Continual Learning of Vision-and-Language Tasks Using Knowledge Distillation [18.345183818638475]
Continual learning (CL) can serve as a remedy by enabling knowledge transfer across sequentially arriving tasks.
We develop a transformer-based CL architecture for learning bimodal vision-and-language tasks.
Our approach scales to a large number of tasks because it requires little memory and time overhead.
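As a sketch of the knowledge-distillation ingredient named in the title (a minimal formulation under assumed details, not the paper's architecture), the new task's loss can be combined with a KL term that keeps the current model's predictions close to those of a frozen copy trained on earlier tasks, which is what limits forgetting.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between softened student and frozen-teacher predictions."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2

# total_loss = task_loss + lambda_kd * distillation_loss(new_logits, old_logits)
```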
arXiv Detail & Related papers (2023-03-25T10:16:53Z) - Transferring Knowledge for Reinforcement Learning in Contact-Rich Manipulation [10.219833196479142]
We address the challenge of transferring knowledge within a family of similar tasks by leveraging multiple skill priors.
Our method learns a latent action space representing the skill embedding from demonstrated trajectories for each prior task.
We evaluate our method on a set of peg-in-hole insertion tasks and demonstrate better generalization to new tasks that were never encountered during training.
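A minimal sketch of the latent skill embedding idea described above (illustrative assumptions, not the paper's model): demonstrated action segments from each prior task are encoded into a low-dimensional latent, and a decoder maps a chosen latent back to an executable action sequence, so a new-task policy only needs to act in the latent skill space.

```python
import torch
import torch.nn as nn

class SkillAutoencoder(nn.Module):
    """Toy encoder/decoder over fixed-length action segments (illustrative only)."""
    def __init__(self, segment_len: int, action_dim: int, latent_dim: int = 8):
        super().__init__()
        flat = segment_len * action_dim
        self.encode = nn.Sequential(nn.Flatten(), nn.Linear(flat, 64), nn.ReLU(),
                                    nn.Linear(64, latent_dim))
        self.decode = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                                    nn.Linear(64, flat))

    def forward(self, segments: torch.Tensor) -> torch.Tensor:
        # segments: (batch, segment_len, action_dim) from demonstrated trajectories
        z = self.encode(segments)   # latent skill embedding
        recon = self.decode(z)      # reconstructed action segment
        return recon.view_as(segments)

# Train with a reconstruction loss on demonstrations; a downstream policy can
# then output z directly and let the decoder turn it into robot actions.
```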
arXiv Detail & Related papers (2022-09-19T10:31:13Z) - MT-Opt: Continuous Multi-Task Robotic Reinforcement Learning at Scale [103.7609761511652]
We show how a large-scale collective robotic learning system can acquire a repertoire of behaviors simultaneously.
New tasks can be continuously instantiated from previously learned tasks.
We train and evaluate our system on a set of 12 real-world tasks with data collected from 7 robots.
arXiv Detail & Related papers (2021-04-16T16:38:02Z) - Reasoning over Vision and Language: Exploring the Benefits of Supplemental Knowledge [59.87823082513752]
This paper investigates the injection of knowledge from general-purpose knowledge bases (KBs) into vision-and-language transformers.
We empirically study the relevance of various KBs to multiple tasks and benchmarks.
The technique is model-agnostic and can expand the applicability of any vision-and-language transformer with minimal computational overhead.
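A loose sketch of the knowledge-injection pattern described above (assumed retrieval and formatting, not the paper's exact method): facts retrieved from a general-purpose KB for entities mentioned in the question or detected in the image are appended to the textual input of a vision-and-language transformer, which is why the technique can remain model-agnostic.

```python
from typing import Dict, List

# Toy stand-in for a general-purpose knowledge base.
KB: Dict[str, List[str]] = {
    "umbrella": ["an umbrella is used for protection against rain"],
    "zebra": ["a zebra is an African equine with black-and-white stripes"],
}

def augment_question(question: str, detected_objects: List[str]) -> str:
    """Append retrieved KB facts to the text stream fed to any V&L transformer."""
    facts: List[str] = []
    for obj in detected_objects:
        facts.extend(KB.get(obj, []))
    return question if not facts else question + " [KNOWLEDGE] " + " ".join(facts)

# augment_question("Why is the person carrying that?", ["umbrella"])
# -> "Why is the person carrying that? [KNOWLEDGE] an umbrella is used for ..."
```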
arXiv Detail & Related papers (2021-01-15T08:37:55Z) - Context-Aware Multi-Task Learning for Traffic Scene Recognition in Autonomous Vehicles [10.475998113861895]
We propose an algorithm to jointly learn the task-specific and shared representations by adopting a multi-task learning network.
Experiments on the large-scale dataset HSD demonstrate the effectiveness and superiority of our network over state-of-the-art methods.
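A minimal sketch of the shared-plus-task-specific representation idea (a simplified layout with assumed task names, not the paper's network): one backbone encodes the scene, and lightweight heads produce each task's prediction, so the tasks regularize each other through the shared features.

```python
import torch
import torch.nn as nn

class MultiTaskSceneNet(nn.Module):
    """Shared encoder with task-specific heads (illustrative only)."""
    def __init__(self, feat_dim: int = 128, num_places: int = 10, num_weather: int = 4):
        super().__init__()
        self.shared = nn.Sequential(nn.LazyLinear(feat_dim), nn.ReLU())  # shared representation
        self.place_head = nn.Linear(feat_dim, num_places)     # task-specific head 1
        self.weather_head = nn.Linear(feat_dim, num_weather)  # task-specific head 2

    def forward(self, x: torch.Tensor):
        h = self.shared(torch.flatten(x, start_dim=1))
        return self.place_head(h), self.weather_head(h)

# Joint training: loss = ce(place_logits, place_labels) + ce(weather_logits, weather_labels)
```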
arXiv Detail & Related papers (2020-04-03T03:09:26Z)