ARNOLD: A Benchmark for Language-Grounded Task Learning With Continuous
States in Realistic 3D Scenes
- URL: http://arxiv.org/abs/2304.04321v2
- Date: Mon, 11 Sep 2023 11:27:53 GMT
- Authors: Ran Gong, Jiangyong Huang, Yizhou Zhao, Haoran Geng, Xiaofeng Gao,
Qingyang Wu, Wensi Ai, Ziheng Zhou, Demetri Terzopoulos, Song-Chun Zhu,
Baoxiong Jia, Siyuan Huang
- Abstract summary: ARNOLD is a benchmark that evaluates language-grounded task learning with continuous states in realistic 3D scenes.
ARNOLD is comprised of 8 language-conditioned tasks that involve understanding object states and learning policies for continuous goals.
- Score: 72.83187997344406
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding the continuous states of objects is essential for task learning
and planning in the real world. However, most existing task learning benchmarks
assume discrete (e.g., binary) object goal states, which poses challenges for
the learning of complex tasks and transferring learned policy from simulated
environments to the real world. Furthermore, state discretization limits a
robot's ability to follow human instructions based on the grounding of actions
and states. To tackle these challenges, we present ARNOLD, a benchmark that
evaluates language-grounded task learning with continuous states in realistic
3D scenes. ARNOLD is comprised of 8 language-conditioned tasks that involve
understanding object states and learning policies for continuous goals. To
promote language-instructed learning, we provide expert demonstrations with
template-generated language descriptions. We assess task performance by
utilizing the latest language-conditioned policy learning models. Our results
indicate that current models for language-conditioned manipulations continue to
experience significant challenges in novel goal-state generalizations, scene
generalizations, and object generalizations. These findings highlight the need
to develop new algorithms that address this gap and underscore the potential
for further research in this area. Project website:
https://arnold-benchmark.github.io.
Related papers
- Towards Generalizable Vision-Language Robotic Manipulation: A Benchmark and LLM-guided 3D Policy [68.50785963043161]
GemBench is a novel benchmark to assess generalization capabilities of vision-language robotic manipulation policies.
We present 3D-LOTUS++, a framework that integrates 3D-LOTUS's motion planning capabilities with the task planning capabilities of LLMs.
3D-LOTUS++ achieves state-of-the-art performance on novel tasks of GemBench, setting a new standard for generalization in robotic manipulation.
(2024-10-02T09:02:34Z)
- Scalable Language Model with Generalized Continual Learning [58.700439919096155]
Joint Adaptive Re-Parameterization (JARe) is integrated with Dynamic Task-related Knowledge Retrieval (DTKR) to enable adaptive adjustment of language models based on specific downstream tasks.
Our method demonstrates state-of-the-art performance on diverse backbones and benchmarks, achieving effective continual learning in both full-set and few-shot scenarios with minimal forgetting.
(2024-04-11T04:22:15Z)
- Learning with Language-Guided State Abstractions [58.199148890064826]
Generalizable policy learning in high-dimensional observation spaces is facilitated by well-designed state representations.
Our method, LGA, uses a combination of natural language supervision and background knowledge from language models to automatically build state representations tailored to unseen tasks.
Experiments on simulated robotic tasks show that LGA yields state abstractions similar to those designed by humans, but in a fraction of the time.
(2024-02-28T23:57:04Z)
- LanGWM: Language Grounded World Model [24.86620763902546]
We focus on learning language-grounded visual features to enhance the world model learning.
Our proposed technique of explicit language-grounded visual representation learning has the potential to improve models for human-robot interaction.
(2023-11-29T12:41:55Z)
- Grounding Language with Visual Affordances over Unstructured Data [26.92329260907805]
We propose a novel approach to efficiently learn language-conditioned robot skills from unstructured, offline and reset-free data.
We exploit a self-supervised visuo-lingual affordance model, which requires as little as 1% of the total data with language.
We find that our method is capable of completing long-horizon, multi-tier tasks in the real world, while requiring an order of magnitude less data than previous approaches.
(2022-10-04T21:16:48Z)
- Semantic Exploration from Language Abstractions and Pretrained Representations [23.02024937564099]
Effective exploration is a challenge in reinforcement learning (RL).
We define novelty using semantically meaningful state abstractions.
We evaluate vision-language representations, pretrained on natural image captioning datasets.
(2022-04-08T17:08:00Z)
- Pre-Trained Language Models for Interactive Decision-Making [72.77825666035203]
We describe a framework for imitation learning in which goals and observations are represented as a sequence of embeddings.
We demonstrate that this framework enables effective generalization across different environments.
For test tasks involving novel goals or novel scenes, initializing policies with language models improves task completion rates by 43.6%.
(2022-02-03T18:55:52Z)
- Analyzing the Limits of Self-Supervision in Handling Bias in Language [52.26068057260399]
We evaluate how well language models capture the semantics of four tasks for bias: diagnosis, identification, extraction and rephrasing.
Our analyses indicate that language models are capable of performing these tasks to widely varying degrees across different bias dimensions, such as gender and political affiliation.
(2021-12-16T05:36:08Z)
- CALVIN: A Benchmark for Language-conditioned Policy Learning for Long-horizon Robot Manipulation Tasks [30.936692970187416]
General-purpose robots must learn to relate human language to their perceptions and actions.
We present CALVIN, an open-source simulated benchmark to learn long-horizon language-conditioned tasks.
(2021-12-06T18:37:33Z)
- Inverse Reinforcement Learning with Natural Language Goals [8.972202854038382]
We propose a novel inverse reinforcement learning algorithm to learn a language-conditioned policy and reward function.
Our algorithm outperforms multiple baselines by a large margin on a vision-based natural language instruction following dataset.
(2020-08-16T14:43:49Z)