Towards General Purpose Vision Systems
- URL: http://arxiv.org/abs/2104.00743v1
- Date: Thu, 1 Apr 2021 19:35:21 GMT
- Title: Towards General Purpose Vision Systems
- Authors: Tanmay Gupta, Amita Kamath, Aniruddha Kembhavi and Derek Hoiem
- Abstract summary: We propose a task-agnostic vision-language system that accepts an image and a natural language task description and outputs bounding boxes, confidences, and text.
We evaluate the system's ability to learn multiple skills simultaneously, to perform tasks with novel skill-concept combinations, and to learn new skills efficiently and without forgetting.
- Score: 34.90633886653062
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A special purpose learning system assumes knowledge of admissible tasks at
design time. Adapting such a system to unforeseen tasks requires architecture
manipulation such as adding an output head for each new task or dataset. In
this work, we propose a task-agnostic vision-language system that accepts an
image and a natural language task description and outputs bounding boxes,
confidences, and text. The system supports a wide range of vision tasks such as
classification, localization, question answering, captioning, and more. We
evaluate the system's ability to learn multiple skills simultaneously, to
perform tasks with novel skill-concept combinations, and to learn new skills
efficiently and without forgetting.
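As a rough illustration of the interface described in the abstract, the sketch below (a hypothetical wrapper, not the authors' released code) shows what a single task-agnostic entry point could look like: one call takes an image and a natural-language task description and always returns bounding boxes, confidences, and text, whether the task is classification, localization, question answering, or captioning.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class GPVOutput:
    # One output schema shared by every supported task.
    boxes: List[Tuple[float, float, float, float]]  # (x1, y1, x2, y2) per region
    confidences: List[float]                         # one score per box
    text: str                                        # answer, caption, or class label

def run_task(model, image, task_description: str) -> GPVOutput:
    """Single task-agnostic entry point: the task is specified in language,
    not by selecting a task-specific output head."""
    boxes, scores, text = model.predict(image, task_description)  # hypothetical model API
    return GPVOutput(boxes=boxes, confidences=scores, text=text)

# The same call covers different skills, e.g.:
#   run_task(model, img, "What color is the dog?")         -> text answer
#   run_task(model, img, "Locate every dog in the image.") -> boxes + confidences
#   run_task(model, img, "Describe the image.")            -> caption
```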
Related papers
- Goal-Oriented Skill Abstraction for Offline Multi-Task Reinforcement Learning [25.18006424626525]
GO-Skill is a novel approach designed to extract and utilize reusable skills to enhance knowledge transfer and task performance. Our approach uncovers reusable skills through a goal-oriented skill extraction process and leverages vector quantization to construct a discrete skill library. We integrate these skills using hierarchical policy learning, enabling the construction of a high-level policy that dynamically orchestrates discrete skills to accomplish specific tasks.
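A minimal sketch, under assumed details rather than the GO-Skill code, of the vector-quantization step mentioned above: continuous skill embeddings are snapped to their nearest entry in a learned codebook, giving a discrete skill library that a high-level policy can index into.

```python
import numpy as np

class SkillCodebook:
    """Toy discrete skill library built via vector quantization (illustrative only)."""
    def __init__(self, num_skills: int, dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.codebook = rng.normal(size=(num_skills, dim))  # learned jointly in practice

    def quantize(self, z: np.ndarray) -> int:
        # Map a continuous skill embedding to the index of its nearest code.
        dists = np.linalg.norm(self.codebook - z, axis=1)
        return int(np.argmin(dists))

    def lookup(self, idx: int) -> np.ndarray:
        # The high-level policy selects discrete indices; a low-level policy
        # conditions on the corresponding code vector.
        return self.codebook[idx]

# library = SkillCodebook(num_skills=32, dim=16)
# idx = library.quantize(np.zeros(16)); code = library.lookup(idx)
```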
arXiv Detail & Related papers (2025-07-09T07:54:49Z) - Learning reusable concepts across different egocentric video understanding tasks [12.709881592333995]
Hier-EgoPack is a unified framework able to create a collection of task perspectives that can be carried across downstream tasks and used as a potential source of additional insights.
arXiv Detail & Related papers (2025-05-30T15:14:46Z) - VL-GLUE: A Suite of Fundamental yet Challenging Visuo-Linguistic Reasoning Tasks [48.67062958311173]
VL-GLUE is a multitask benchmark for visuo-linguistic reasoning and understanding.
We show that this benchmark is quite challenging for existing large-scale vision-language models.
arXiv Detail & Related papers (2024-10-17T15:27:17Z) - Using Left and Right Brains Together: Towards Vision and Language Planning [95.47128850991815]
We introduce a novel vision-language planning framework to perform concurrent visual and language planning for tasks with inputs of any form.
We evaluate the effectiveness of our framework across vision-language tasks, vision-only tasks, and language-only tasks.
arXiv Detail & Related papers (2024-02-16T09:46:20Z) - InstructCV: Instruction-Tuned Text-to-Image Diffusion Models as Vision Generalists [66.85125112199898]
We develop a unified language interface for computer vision tasks that abstracts away task-specific design choices.
Our model, dubbed InstructCV, performs competitively compared to other generalist and task-specific vision models.
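A rough sketch of the kind of unified language interface described here (hypothetical names and templates; InstructCV itself routes such instructions through an instruction-tuned text-to-image diffusion model): every vision task becomes a text instruction to one generalist model instead of a dedicated task-specific head.

```python
# Hypothetical illustration of a task-as-instruction interface.
TASK_TEMPLATES = {
    "segmentation":   "Segment the {category} in the image.",
    "depth":          "Estimate the depth map of the image.",
    "detection":      "Detect all {category} and mark them with boxes.",
    "classification": "What is the main object in the image?",
}

def build_instruction(task: str, **kwargs) -> str:
    """Turn a task name plus arguments into the natural-language prompt
    a generalist vision model would consume."""
    return TASK_TEMPLATES[task].format(**kwargs)

# build_instruction("detection", category="pedestrians")
# -> "Detect all pedestrians and mark them with boxes."
```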
arXiv Detail & Related papers (2023-09-30T14:26:43Z) - Task-Attentive Transformer Architecture for Continual Learning of Vision-and-Language Tasks Using Knowledge Distillation [18.345183818638475]
Continual learning (CL) can serve as a remedy by enabling knowledge transfer across sequentially arriving tasks.
We develop a transformer-based CL architecture for learning bimodal vision-and-language tasks.
Our approach scales to a large number of tasks because it requires little memory and time overhead.
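As a sketch of the knowledge-distillation ingredient named in the title (a minimal formulation under assumed details, not the paper's architecture), the new task's loss can be combined with a KL term that keeps the current model's predictions close to those of a frozen copy trained on earlier tasks, which is what limits forgetting.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between softened student and frozen-teacher predictions."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2

# total_loss = task_loss + lambda_kd * distillation_loss(new_logits, old_logits)
```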
arXiv Detail & Related papers (2023-03-25T10:16:53Z) - Transferring Knowledge for Reinforcement Learning in Contact-Rich Manipulation [10.219833196479142]
We address the challenge of transferring knowledge within a family of similar tasks by leveraging multiple skill priors.
Our method learns a latent action space representing the skill embedding from demonstrated trajectories for each prior task.
We evaluate our method on a set of peg-in-hole insertion tasks and demonstrate better generalization to new tasks that were never encountered during training.
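A minimal sketch of the latent skill embedding idea described above (illustrative assumptions, not the paper's model): demonstrated action segments from each prior task are encoded into a low-dimensional latent, and a decoder maps a chosen latent back to an executable action sequence, so a new-task policy only needs to act in the latent skill space.

```python
import torch
import torch.nn as nn

class SkillAutoencoder(nn.Module):
    """Toy encoder/decoder over fixed-length action segments (illustrative only)."""
    def __init__(self, segment_len: int, action_dim: int, latent_dim: int = 8):
        super().__init__()
        flat = segment_len * action_dim
        self.encode = nn.Sequential(nn.Flatten(), nn.Linear(flat, 64), nn.ReLU(),
                                    nn.Linear(64, latent_dim))
        self.decode = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                                    nn.Linear(64, flat))

    def forward(self, segments: torch.Tensor) -> torch.Tensor:
        # segments: (batch, segment_len, action_dim) from demonstrated trajectories
        z = self.encode(segments)   # latent skill embedding
        recon = self.decode(z)      # reconstructed action segment
        return recon.view_as(segments)

# Train with a reconstruction loss on demonstrations; a downstream policy can
# then output z directly and let the decoder turn it into robot actions.
```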
arXiv Detail & Related papers (2022-09-19T10:31:13Z) - MT-Opt: Continuous Multi-Task Robotic Reinforcement Learning at Scale [103.7609761511652]
We show how a large-scale collective robotic learning system can acquire a repertoire of behaviors simultaneously.
New tasks can be continuously instantiated from previously learned tasks.
We train and evaluate our system on a set of 12 real-world tasks with data collected from 7 robots.
arXiv Detail & Related papers (2021-04-16T16:38:02Z) - Reasoning over Vision and Language: Exploring the Benefits of Supplemental Knowledge [59.87823082513752]
This paper investigates the injection of knowledge from general-purpose knowledge bases (KBs) into vision-and-language transformers.
We empirically study the relevance of various KBs to multiple tasks and benchmarks.
The technique is model-agnostic and can expand the applicability of any vision-and-language transformer with minimal computational overhead.
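A loose sketch of the knowledge-injection pattern described above (assumed retrieval and formatting, not the paper's exact method): facts retrieved from a general-purpose KB for entities mentioned in the question or detected in the image are appended to the textual input of a vision-and-language transformer, which is why the technique can remain model-agnostic.

```python
from typing import Dict, List

# Toy stand-in for a general-purpose knowledge base.
KB: Dict[str, List[str]] = {
    "umbrella": ["an umbrella is used for protection against rain"],
    "zebra": ["a zebra is an African equine with black-and-white stripes"],
}

def augment_question(question: str, detected_objects: List[str]) -> str:
    """Append retrieved KB facts to the text stream fed to any V&L transformer."""
    facts: List[str] = []
    for obj in detected_objects:
        facts.extend(KB.get(obj, []))
    return question if not facts else question + " [KNOWLEDGE] " + " ".join(facts)

# augment_question("Why is the person carrying that?", ["umbrella"])
# -> "Why is the person carrying that? [KNOWLEDGE] an umbrella is used for ..."
```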
arXiv Detail & Related papers (2021-01-15T08:37:55Z) - Context-Aware Multi-Task Learning for Traffic Scene Recognition in Autonomous Vehicles [10.475998113861895]
We propose an algorithm to jointly learn the task-specific and shared representations by adopting a multi-task learning network.
Experiments on the large-scale dataset HSD demonstrate the effectiveness and superiority of our network over state-of-the-art methods.
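A minimal sketch of the shared-plus-task-specific representation idea (a simplified layout with assumed task names, not the paper's network): one backbone encodes the scene, and lightweight heads produce each task's prediction, so the tasks regularize each other through the shared features.

```python
import torch
import torch.nn as nn

class MultiTaskSceneNet(nn.Module):
    """Shared encoder with task-specific heads (illustrative only)."""
    def __init__(self, feat_dim: int = 128, num_places: int = 10, num_weather: int = 4):
        super().__init__()
        self.shared = nn.Sequential(nn.LazyLinear(feat_dim), nn.ReLU())  # shared representation
        self.place_head = nn.Linear(feat_dim, num_places)     # task-specific head 1
        self.weather_head = nn.Linear(feat_dim, num_weather)  # task-specific head 2

    def forward(self, x: torch.Tensor):
        h = self.shared(torch.flatten(x, start_dim=1))
        return self.place_head(h), self.weather_head(h)

# Joint training: loss = ce(place_logits, place_labels) + ce(weather_logits, weather_labels)
```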
arXiv Detail & Related papers (2020-04-03T03:09:26Z)