Related papers: Real-World Offline Reinforcement Learning from Vision Language Model Feedback

Real-World Offline Reinforcement Learning from Vision Language Model Feedback

URL: http://arxiv.org/abs/2411.05273v1
Date: Fri, 08 Nov 2024 02:12:34 GMT
Title: Real-World Offline Reinforcement Learning from Vision Language Model Feedback
Authors: Sreyas Venkataraman, Yufei Wang, Ziyu Wang, Zackory Erickson, David Held,
Abstract summary: offline reinforcement learning can enable policy learning from pre-collected, sub-optimal datasets without online interactions. Most existing offline RL works assume the dataset is already labeled with the task rewards. We propose a novel system that automatically generates reward labels for offline datasets.
Score: 19.494335952082466
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Offline reinforcement learning can enable policy learning from pre-collected, sub-optimal datasets without online interactions. This makes it ideal for real-world robots and safety-critical scenarios, where collecting online data or expert demonstrations is slow, costly, and risky. However, most existing offline RL works assume the dataset is already labeled with the task rewards, a process that often requires significant human effort, especially when ground-truth states are hard to ascertain (e.g., in the real-world). In this paper, we build on prior work, specifically RL-VLM-F, and propose a novel system that automatically generates reward labels for offline datasets using preference feedback from a vision-language model and a text description of the task. Our method then learns a policy using offline RL with the reward-labeled dataset. We demonstrate the system's applicability to a complex real-world robot-assisted dressing task, where we first learn a reward function using a vision-language model on a sub-optimal offline dataset, and then we use the learned reward to employ Implicit Q learning to develop an effective dressing policy. Our method also performs well in simulation tasks involving the manipulation of rigid and deformable objects, and significantly outperform baselines such as behavior cloning and inverse RL. In summary, we propose a new system that enables automatic reward labeling and policy learning from unlabeled, sub-optimal offline datasets.

Related papers

Policy Learning from Large Vision-Language Model Feedback without Reward Modeling [19.48826538310603]
We introduce PLARE, a novel approach that leverages large vision-language models (VLMs) to provide guidance signals for agent training.<n>Instead of relying on manually designed reward functions, PLARE queries a VLM for preference labels on pairs of visual trajectory segments.<n>The policy is then trained directly from these preference labels using a supervised contrastive preference learning objective.
arXiv Detail & Related papers (2025-07-31T10:07:49Z)
MOTO: Offline Pre-training to Online Fine-tuning for Model-based Robot Learning [52.101643259906915]
We study the problem of offline pre-training and online fine-tuning for reinforcement learning from high-dimensional observations. Existing model-based offline RL methods are not suitable for offline-to-online fine-tuning in high-dimensional domains. We propose an on-policy model-based method that can efficiently reuse prior data through model-based value expansion and policy regularization.
arXiv Detail & Related papers (2024-01-06T21:04:31Z)
Finetuning Offline World Models in the Real World [13.46766121896684]
Reinforcement Learning (RL) is notoriously data-inefficient, which makes training on a real robot difficult. offline RL has been proposed as a framework for training RL policies on pre-existing datasets without any online interaction. In this work, we consider the problem of pretraining a world model with offline data collected on a real robot, and then finetuning the model on online data collected by planning with the learned model.
arXiv Detail & Related papers (2023-10-24T17:46:12Z)
Leveraging Optimal Transport for Enhanced Offline Reinforcement Learning in Surgical Robotic Environments [4.2569494803130565]
We introduce an innovative algorithm designed to assign rewards to offline trajectories, using a small number of high-quality expert demonstrations. This approach circumvents the need for handcrafted rewards, unlocking the potential to harness vast datasets for policy learning.
arXiv Detail & Related papers (2023-10-13T03:39:15Z)
Model-Based Reinforcement Learning with Multi-Task Offline Pretraining [59.82457030180094]
We present a model-based RL method that learns to transfer potentially useful dynamics and action demonstrations from offline data to a novel task. The main idea is to use the world models not only as simulators for behavior learning but also as tools to measure the task relevance. We demonstrate the advantages of our approach compared with the state-of-the-art methods in Meta-World and DeepMind Control Suite.
arXiv Detail & Related papers (2023-06-06T02:24:41Z)
Adaptive Policy Learning for Offline-to-Online Reinforcement Learning [27.80266207283246]
We consider an offline-to-online setting where the agent is first learned from the offline dataset and then trained online. We propose a framework called Adaptive Policy Learning for effectively taking advantage of offline and online data.
arXiv Detail & Related papers (2023-03-14T08:13:21Z)
Benchmarks and Algorithms for Offline Preference-Based Reward Learning [41.676208473752425]
We propose an approach that uses an offline dataset to craft preference queries via pool-based active learning. Our proposed approach does not require actual physical rollouts or an accurate simulator for either the reward learning or policy optimization steps.
arXiv Detail & Related papers (2023-01-03T23:52:16Z)
A Workflow for Offline Model-Free Robotic Reinforcement Learning [117.07743713715291]
offline reinforcement learning (RL) enables learning control policies by utilizing only prior experience, without any online interaction. We develop a practical workflow for using offline RL analogous to the relatively well-understood for supervised learning problems. We demonstrate the efficacy of this workflow in producing effective policies without any online tuning.
arXiv Detail & Related papers (2021-09-22T16:03:29Z)
Offline Meta-Reinforcement Learning with Online Self-Supervision [66.42016534065276]
We propose a hybrid offline meta-RL algorithm, which uses offline data with rewards to meta-train an adaptive policy. Our method uses the offline data to learn the distribution of reward functions, which is then sampled to self-supervise reward labels for the additional online data. We find that using additional data and self-generated rewards significantly improves an agent's ability to generalize.
arXiv Detail & Related papers (2021-07-08T17:01:32Z)
Representation Matters: Offline Pretraining for Sequential Decision Making [27.74988221252854]
In this paper, we consider a slightly different approach to incorporating offline data into sequential decision-making. We find that the use of pretraining with unsupervised learning objectives can dramatically improve the performance of policy learning algorithms.
arXiv Detail & Related papers (2021-02-11T02:38:12Z)
Offline Reinforcement Learning from Images with Latent Space Models [60.69745540036375]
offline reinforcement learning (RL) refers to the problem of learning policies from a static dataset of environment interactions. We build on recent advances in model-based algorithms for offline RL, and extend them to high-dimensional visual observation spaces. Our approach is both tractable in practice and corresponds to maximizing a lower bound of the ELBO in the unknown POMDP.
arXiv Detail & Related papers (2020-12-21T18:28:17Z)
AWAC: Accelerating Online Reinforcement Learning with Offline Datasets [84.94748183816547]
We show that our method, advantage weighted actor critic (AWAC), enables rapid learning of skills with a combination of prior demonstration data and online experience. Our results show that incorporating prior data can reduce the time required to learn a range of robotic skills to practical time-scales.
arXiv Detail & Related papers (2020-06-16T17:54:41Z)

This list is automatically generated from the titles and abstracts of the papers in this site.