ELEMENTAL: Interactive Learning from Demonstrations and Vision-Language Models for Reward Design in Robotics
- URL: http://arxiv.org/abs/2411.18825v2
- Date: Thu, 05 Dec 2024 16:27:08 GMT
- Title: ELEMENTAL: Interactive Learning from Demonstrations and Vision-Language Models for Reward Design in Robotics
- Authors: Letian Chen, Matthew Gombolay
- Abstract summary: We propose a novel framework that combines natural language guidance with visual user demonstrations to better align robot behavior with user intentions.
Our experimental results demonstrate that ELEMENTAL outperforms prior work by 42.3% on task success and achieves 41.3% better generalization in out-of-distribution tasks.
- Score: 1.4579344926652846
- Abstract: Reinforcement learning (RL) has demonstrated compelling performance in robotic tasks, but its success often hinges on the design of complex, ad hoc reward functions. Researchers have explored how Large Language Models (LLMs) could enable non-expert users to specify reward functions more easily. However, LLMs struggle to balance the importance of different features, generalize poorly to out-of-distribution robotic tasks, and cannot represent the problem properly with only text-based descriptions. To address these challenges, we propose ELEMENTAL (intEractive LEarning froM dEmoNstraTion And Language), a novel framework that combines natural language guidance with visual user demonstrations to better align robot behavior with user intentions. By incorporating visual inputs, ELEMENTAL overcomes the limitations of text-only task specifications, while leveraging inverse reinforcement learning (IRL) to balance feature weights and optimally match the demonstrated behaviors. ELEMENTAL also introduces an iterative feedback loop through self-reflection to improve feature, reward, and policy learning. Our experimental results demonstrate that ELEMENTAL outperforms prior work by 42.3% on task success and achieves 41.3% better generalization in out-of-distribution tasks, highlighting its robustness in learning from demonstration (LfD).
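The abstract's "balancing feature weights" step refers to IRL's feature-expectation matching. A minimal sketch of that idea follows (this is an illustration, not the ELEMENTAL implementation; `rollout_fn` is a hypothetical helper that, given reward weights, optimizes a policy and returns that policy's feature expectations):

```python
import numpy as np

def feature_matching_irl(mu_demo, rollout_fn, n_features, lr=0.1, iters=100):
    """Sketch of feature-expectation-matching IRL.

    mu_demo    : empirical feature expectations of the user demonstrations
    rollout_fn : hypothetical helper; given reward weights w, trains a policy
                 for reward w . phi(s) and returns its feature expectations
    """
    w = np.zeros(n_features)
    for _ in range(iters):
        mu_policy = rollout_fn(w)        # feature expectations under current reward
        grad = mu_demo - mu_policy       # mismatch between demo and policy statistics
        w += lr * grad                   # shift weight toward under-expressed features
        if np.linalg.norm(grad) < 1e-3:  # stop once expectations roughly match
            break
    return w                             # learned reward: r(s) = w . phi(s)
```

Once the two expectation vectors match, a policy optimal under w . phi reproduces the demonstrated feature statistics, which is the weight balancing the abstract refers to.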
Related papers
- Affordance-Guided Reinforcement Learning via Visual Prompting [51.361977466993345]
Keypoint-based Affordance Guidance for Improvements (KAGI) is a method leveraging rewards shaped by vision-language models (VLMs) for autonomous RL.
On real-world manipulation tasks specified by natural language descriptions, KAGI improves the sample efficiency of autonomous RL and enables successful task completion in 20K online fine-tuning steps.
arXiv Detail & Related papers (2024-07-14T21:41:29Z)
- RILe: Reinforced Imitation Learning [60.63173816209543]
RILe is a framework that combines the strengths of imitation learning and inverse reinforcement learning to learn a dense reward function efficiently.
Our framework produces high-performing policies in high-dimensional tasks where direct imitation fails to replicate complex behaviors.
arXiv Detail & Related papers (2024-06-12T17:56:31Z)
- DETAIL: Task DEmonsTration Attribution for Interpretable In-context Learning [75.68193159293425]
In-context learning (ICL) allows transformer-based language models to learn a specific task with a few "task demonstrations" without updating their parameters.
We propose an influence function-based attribution technique, DETAIL, that addresses the specific characteristics of ICL.
We experimentally demonstrate the wide applicability of DETAIL by showing that attribution scores obtained on white-box models transfer to black-box models and improve model performance.
arXiv Detail & Related papers (2024-05-22T15:52:52Z)
- INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning [59.07490387145391]
Large language models (LLMs) have demonstrated impressive capabilities in various natural language processing tasks.
Their application to information retrieval (IR) tasks is still challenging due to the infrequent occurrence of many IR-specific concepts in natural language.
We introduce a novel instruction tuning dataset, INTERS, encompassing 20 tasks across three fundamental IR categories.
arXiv Detail & Related papers (2024-01-12T12:10:28Z)
- Large Language Models for Robotics: Opportunities, Challenges, and Perspectives [46.57277568357048]
Large language models (LLMs) have undergone significant expansion and have been increasingly integrated across various domains.
For embodied tasks, where robots interact with complex environments, text-only LLMs often face challenges due to a lack of compatibility with robotic visual perception.
We propose a framework that utilizes multimodal GPT-4V to enhance embodied task planning through the combination of natural language instructions and robot visual perceptions.
arXiv Detail & Related papers (2024-01-09T03:22:16Z)
- Accelerating Reinforcement Learning of Robotic Manipulations via Feedback from Large Language Models [21.052532074815765]
We introduce the Lafite-RL (Language agent feedback interactive Reinforcement Learning) framework.
It enables RL agents to learn robotic tasks efficiently by taking advantage of Large Language Models' timely feedback.
It outperforms the baseline in terms of both learning efficiency and success rate.
arXiv Detail & Related papers (2023-11-04T11:21:38Z)
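The Lafite-RL entry above hinges on injecting timely LLM feedback into the RL update. A minimal sketch of that pattern, assuming a hypothetical `query_llm` callable that returns a scalar score (an illustration, not the paper's actual interface):

```python
def shaped_reward(env_reward, transition_summary, query_llm, weight=0.5):
    """Blend an LLM's evaluative score into the environment reward (sketch).

    transition_summary : text description of the last (state, action, outcome)
    query_llm          : hypothetical callable returning a score in [-1, 1]
    """
    prompt = (
        "Rate how much this robot action progressed the task, "
        "from -1 (harmful) to 1 (helpful). Reply with a number only.\n"
        f"{transition_summary}"
    )
    llm_score = float(query_llm(prompt))    # assumes the reply parses as a float
    return env_reward + weight * llm_score  # shaped reward fed to the RL update
```

In practice such feedback would be queried sparingly (e.g., every N steps) to keep LLM latency out of the control loop.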
- Learning Reward for Physical Skills using Large Language Model [5.795405764196473]
Large Language Models contain valuable task-related knowledge that can aid in learning reward functions.
We aim to extract task knowledge from LLMs using environment feedback to create efficient reward functions for physical skills.
arXiv Detail & Related papers (2023-10-21T19:10:06Z)
- Self-Convinced Prompting: Few-Shot Question Answering with Repeated Introspection [13.608076739368949]
We introduce a novel framework that harnesses the potential of large-scale pre-trained language models.
Our framework processes the output of a typical few-shot chain-of-thought prompt, assesses the correctness of the response, scrutinizes the answer, and ultimately produces a new solution.
arXiv Detail & Related papers (2023-10-08T06:36:26Z)
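The repeated-introspection loop described in the Self-Convinced Prompting entry above can be sketched as generate-critique-revise rounds; `generate` and `critique` are hypothetical LLM wrappers, not the paper's API:

```python
def self_convinced_answer(question, generate, critique, max_rounds=3):
    """Draft, assess, and revise an answer until the model is convinced (sketch).

    generate(prompt)      -> str : hypothetical LLM call producing a CoT answer
    critique(question, a) -> str : hypothetical LLM call; "OK" or a described flaw
    """
    answer = generate(f"Answer step by step:\n{question}")
    for _ in range(max_rounds):
        verdict = critique(question, answer)
        if verdict.strip() == "OK":   # no flaw found: the model is convinced
            break
        answer = generate(            # produce a new solution using the critique
            f"Question:\n{question}\nPrevious answer:\n{answer}\n"
            f"Identified issue:\n{verdict}\nWrite a corrected answer:"
        )
    return answer
```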
- Self-Refined Large Language Model as Automated Reward Function Designer for Deep Reinforcement Learning in Robotics [14.773498542408264]
Large Language Models (LLMs) have been extensively adopted to address tasks demanding in-depth common-sense knowledge.
In this work, we propose a novel LLM framework with a self-refinement mechanism for automated reward function design.
arXiv Detail & Related papers (2023-09-13T02:56:56Z)
- AlphaBlock: Embodied Finetuning for Vision-Language Reasoning in Robot Manipulation [50.737355245505334]
We propose a novel framework for learning high-level cognitive capabilities in robot manipulation tasks.
The resulting dataset, AlphaBlock, consists of 35 comprehensive high-level tasks with multi-step text plans and paired observations.
arXiv Detail & Related papers (2023-05-30T09:54:20Z)
- What Makes Good In-context Demonstrations for Code Intelligence Tasks with LLMs? [60.668318972782295]
Large language models have shown the ability to perform in-context learning (ICL).
ICL supplies task instructions and a few worked examples as demonstrations, then feeds these demonstrations to the language model to make predictions.
It is therefore important to systematically investigate how to construct good demonstrations for code-related tasks.
arXiv Detail & Related papers (2023-04-15T15:13:58Z)
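The ICL recipe in the last entry, an instruction plus a few worked demonstrations followed by the query, reduces to simple prompt assembly. A small sketch with generic placeholder examples (nothing here comes from the paper itself):

```python
def build_icl_prompt(instruction, demonstrations, query):
    """Assemble an ICL prompt: instruction, worked demos, then the query."""
    parts = [instruction]
    for inp, out in demonstrations:           # each demo is an (input, output) pair
        parts.append(f"Input: {inp}\nOutput: {out}")
    parts.append(f"Input: {query}\nOutput:")  # the model completes this final output
    return "\n\n".join(parts)

# Usage with a toy code-intelligence demonstration:
prompt = build_icl_prompt(
    "Summarize what each Python function returns.",
    [("def f(x): return x * 2", "Twice its argument.")],
    "def g(s): return s[::-1]",
)
```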
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.