Interactive Visual Task Learning for Robots
- URL: http://arxiv.org/abs/2312.13219v1
- Date: Wed, 20 Dec 2023 17:38:04 GMT
- Title: Interactive Visual Task Learning for Robots
- Authors: Weiwei Gu, Anant Sah, Nakul Gopalan
- Abstract summary: We present a framework for robots to learn novel visual concepts and tasks via in-situ linguistic interactions with human users.
We propose Hi-Viscont, which augments information of a novel concept to its parent nodes within a concept hierarchy.
We represent a visual task as a scene graph with language annotations, allowing us to create novel permutations of a demonstrated task zero-shot in-situ.
- Score: 4.114444605090135
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a framework for robots to learn novel visual concepts and tasks
via in-situ linguistic interactions with human users. Previous approaches have
either used large pre-trained visual models to infer novel objects zero-shot,
or added novel concepts along with their attributes and representations to a
concept hierarchy. We extend the approaches that focus on learning visual
concept hierarchies by enabling them to learn novel concepts and solve unseen
robotics tasks with them. To enable a visual concept learner to solve robotics
tasks one-shot, we developed two distinct techniques. Firstly, we propose a
novel approach, Hi-Viscont (HIerarchical VISual CONcept learner for Task), which
augments information of a novel concept to its parent nodes within a concept
hierarchy. This information propagation allows all concepts in a hierarchy to
update as novel concepts are taught in a continual learning setting. Secondly,
we represent a visual task as a scene graph with language annotations, allowing
us to create novel permutations of a demonstrated task zero-shot in-situ. We
present two sets of results. Firstly, we compare Hi-Viscont with the baseline
model (FALCON) on visual question answering (VQA) in three domains. While
comparable to the baseline model on leaf-level concepts, Hi-Viscont achieves an
improvement of over 9% on non-leaf concepts on average. Secondly, we compare our
framework's performance against the baseline FALCON model on the robot task. Our
framework achieves a 33% improvement in the success rate metric and a 19%
improvement in object-level accuracy compared to the baseline model. With both
of these results, we
demonstrate the ability of our model to learn tasks and concepts in a continual
learning setting on the robot.
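
The information propagation described in the abstract can be illustrated with a short sketch. The following Python snippet is a minimal, hypothetical illustration, not the authors' implementation: the names `ConceptNode` and `add_novel_concept` and the blended-average update are assumptions made here purely to show the idea that teaching a novel leaf concept also updates every ancestor node in the hierarchy.

```python
# Minimal sketch (assumed, not the paper's code) of upward information
# propagation in a concept hierarchy: adding a novel concept updates the
# representations of all of its ancestors, so non-leaf concepts stay
# current as new concepts are taught in a continual-learning setting.
from dataclasses import dataclass, field
from typing import List, Optional

import numpy as np


@dataclass
class ConceptNode:
    name: str
    embedding: np.ndarray                      # visual/linguistic feature vector
    parent: Optional["ConceptNode"] = None
    children: List["ConceptNode"] = field(default_factory=list)


def add_novel_concept(parent: ConceptNode, name: str, embedding: np.ndarray,
                      blend: float = 0.1) -> ConceptNode:
    """Attach a new leaf concept and propagate its information upward."""
    node = ConceptNode(name=name, embedding=embedding, parent=parent)
    parent.children.append(node)
    # Walk up the hierarchy, nudging each ancestor's representation toward
    # the new child's embedding (a simple stand-in for a learned update).
    ancestor = parent
    while ancestor is not None:
        ancestor.embedding = (1 - blend) * ancestor.embedding + blend * embedding
        ancestor = ancestor.parent
    return node


# Example: teaching "toy fire truck" under "vehicle" also updates "vehicle"
# and the root "object" node.
root = ConceptNode("object", np.zeros(4))
vehicle = ConceptNode("vehicle", np.array([1.0, 0.0, 0.0, 0.0]), parent=root)
root.children.append(vehicle)
add_novel_concept(vehicle, "toy fire truck", np.array([0.0, 1.0, 0.0, 0.0]))
```

In the paper the update to non-leaf concepts is presumably learned rather than a fixed blend; the fixed-weight average above is only a placeholder for that mechanism.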
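
The scene-graph task representation can be sketched in a similar hedged way: nodes carry language annotations (roles) bound to visual concepts, edges encode relations, and a zero-shot permutation of a demonstrated task amounts to rebinding roles to different concepts. The names `SceneNode`, `TaskSceneGraph`, and `permute` below are hypothetical, not taken from the paper.

```python
# Minimal sketch (assumed) of a visual task as a scene graph with language
# annotations, plus a helper that creates a novel permutation of the task
# by substituting the concepts bound to some roles.
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class SceneNode:
    role: str          # language annotation, e.g. "the wall of the house"
    concept: str       # visual concept bound to this role, e.g. "red block"


@dataclass
class TaskSceneGraph:
    nodes: Dict[str, SceneNode]
    edges: List[Tuple[str, str, str]]   # (node_id, relation, node_id)

    def permute(self, substitutions: Dict[str, str]) -> "TaskSceneGraph":
        """Return a new task in which some roles are bound to different concepts."""
        new_nodes = {
            nid: SceneNode(node.role, substitutions.get(nid, node.concept))
            for nid, node in self.nodes.items()
        }
        return TaskSceneGraph(new_nodes, list(self.edges))


# Demonstrated task: a wall of red blocks with a blue-block roof on top.
demo = TaskSceneGraph(
    nodes={
        "wall": SceneNode("the wall of the house", "red block"),
        "roof": SceneNode("the roof of the house", "blue block"),
    },
    edges=[("roof", "on top of", "wall")],
)

# Zero-shot permutation requested in language: "use a green block for the wall".
variant = demo.permute({"wall": "green block"})
```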
Related papers
- Theia: Distilling Diverse Vision Foundation Models for Robot Learning [6.709078873834651]
Theia is a vision foundation model for robot learning that distills multiple off-the-shelf vision foundation models trained on varied vision tasks.
Theia's rich visual representations encode diverse visual knowledge, enhancing downstream robot learning.
arXiv Detail & Related papers (2024-07-29T17:08:21Z) - Heuristic Vision Pre-Training with Self-Supervised and Supervised
Multi-Task Learning [0.0]
We propose a novel pre-training framework by adopting both self-supervised and supervised visual pre-text tasks in a multi-task manner.
Results show that our pre-trained models can deliver results on par with or better than state-of-the-art (SOTA) results on multiple visual tasks.
arXiv Detail & Related papers (2023-10-11T14:06:04Z) - Masked World Models for Visual Control [90.13638482124567]
We introduce a visual model-based RL framework that decouples visual representation learning and dynamics learning.
We demonstrate that our approach achieves state-of-the-art performance on a variety of visual robotic tasks.
arXiv Detail & Related papers (2022-06-28T18:42:27Z) - RelViT: Concept-guided Vision Transformer for Visual Relational
Reasoning [139.0548263507796]
We use vision transformers (ViTs) as our base model for visual reasoning.
We make better use of concepts defined as object entities and their relations to improve the reasoning ability of ViTs.
We show the resulting model, Concept-guided Vision Transformer (or RelViT for short), significantly outperforms prior approaches on HICO and GQA benchmarks.
arXiv Detail & Related papers (2022-04-24T02:46:43Z) - FALCON: Fast Visual Concept Learning by Integrating Images, Linguistic
descriptions, and Conceptual Relations [99.54048050189971]
We present a framework for learning new visual concepts quickly, guided by multiple naturally occurring data streams.
The learned concepts support downstream applications, such as answering questions by reasoning about unseen images.
We demonstrate the effectiveness of our model on both synthetic and real-world datasets.
arXiv Detail & Related papers (2022-03-30T19:45:00Z) - Zero Experience Required: Plug & Play Modular Transfer Learning for
Semantic Visual Navigation [97.17517060585875]
We present a unified approach to visual navigation using a novel modular transfer learning model.
Our model can effectively leverage its experience from one source task and apply it to multiple target tasks.
Our approach learns faster, generalizes better, and outperforms SoTA models by a significant margin.
arXiv Detail & Related papers (2022-02-05T00:07:21Z) - Model-Based Visual Planning with Self-Supervised Functional Distances [104.83979811803466]
We present a self-supervised method for model-based visual goal reaching.
Our approach learns entirely using offline, unlabeled data.
We find that this approach substantially outperforms both model-free and model-based prior methods.
arXiv Detail & Related papers (2020-12-30T23:59:09Z) - Bowtie Networks: Generative Modeling for Joint Few-Shot Recognition and
Novel-View Synthesis [39.53519330457627]
We propose a novel task of joint few-shot recognition and novel-view synthesis.
We aim to simultaneously learn an object classifier and generate images of that type of object from new viewpoints.
We focus on the interaction and cooperation between a generative model and a discriminative model.
arXiv Detail & Related papers (2020-08-16T19:40:56Z) - Concept Learners for Few-Shot Learning [76.08585517480807]
We propose COMET, a meta-learning method that improves generalization ability by learning to learn along human-interpretable concept dimensions.
We evaluate our model on few-shot tasks from diverse domains, including fine-grained image classification, document categorization and cell type annotation.
arXiv Detail & Related papers (2020-07-14T22:04:17Z)