Robotic Offline RL from Internet Videos via Value-Function Pre-Training
- URL: http://arxiv.org/abs/2309.13041v1
- Date: Fri, 22 Sep 2023 17:59:14 GMT
- Title: Robotic Offline RL from Internet Videos via Value-Function Pre-Training
- Authors: Chethan Bhateja, Derek Guo, Dibya Ghosh, Anikait Singh, Manan Tomar,
Quan Vuong, Yevgen Chebotar, Sergey Levine, Aviral Kumar
- Abstract summary: We develop a system for leveraging large-scale human video datasets in robotic offline RL.
We show that value learning on video datasets learns representations more conducive to downstream robotic offline RL than other approaches.
- Score: 67.44673316943475
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-training on Internet data has proven to be a key ingredient for broad
generalization in many modern ML systems. What would it take to enable such
capabilities in robotic reinforcement learning (RL)? Offline RL methods, which
learn from datasets of robot experience, offer one way to incorporate prior
data into the robotic learning pipeline. However, these methods have a "type
mismatch" with video data (such as Ego4D), the largest prior datasets available
for robotics, since video offers observation-only experience without the action
or reward annotations needed for RL methods. In this paper, we develop a system
for leveraging large-scale human video datasets in robotic offline RL, based
entirely on learning value functions via temporal-difference learning. We show
that value learning on video datasets learns representations that are more
conducive to downstream robotic offline RL than other approaches for learning
from video data. Our system, called V-PTR, combines the benefits of
pre-training on video data with robotic offline RL approaches that train on
diverse robot data, resulting in value functions and policies for manipulation
tasks that perform better, act robustly, and generalize broadly. On several
manipulation tasks on a real WidowX robot, our framework produces policies that
greatly improve over prior methods. Our video and additional details can be
found at https://dibyaghosh.com/vptr/
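Concretely, the pre-training phase fits a value function on observation-only
video by temporal-difference learning. Below is a heavily simplified,
hypothetical PyTorch sketch of that idea, not the authors' code: frames from
the same clip double as goals, and a sparse goal-reaching indicator stands in
for the missing action and reward labels (the paper's actual objective, an
intent-conditioned value function, has considerably more machinery).

    import torch
    import torch.nn as nn

    class GoalValue(nn.Module):
        """V(s, g): value of frame embedding s for reaching goal frame g."""
        def __init__(self, dim=512):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(2 * dim, 256), nn.ReLU(),
                nn.Linear(256, 256), nn.ReLU(),
                nn.Linear(256, 1),
            )

        def forward(self, s, g):
            return self.net(torch.cat([s, g], dim=-1)).squeeze(-1)

    def td_loss(value, target_value, s, s_next, g, reached, gamma=0.98):
        # Self-supervised sparse reward: 1 when the next frame is the goal,
        # substituting for the video's missing action/reward annotations.
        r = reached.float()
        with torch.no_grad():
            target = r + gamma * (1.0 - r) * target_value(s_next, g)
        return ((value(s, g) - target) ** 2).mean()

Goals can be relabeled in hindsight as future frames of the same clip; after
pre-training, the encoder producing the frame embeddings (assumed 512-dim
here) initializes the downstream robotic offline RL stage.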
Related papers
- Robot Fine-Tuning Made Easy: Pre-Training Rewards and Policies for
Autonomous Real-World Reinforcement Learning [58.3994826169858]
We introduce RoboFuME, a reset-free fine-tuning system for robotic reinforcement learning.
Our key insight is to utilize offline reinforcement learning techniques to ensure efficient online fine-tuning of a pre-trained policy.
Our method can incorporate data from an existing robot dataset and improve on a target task within as little as 3 hours of autonomous real-world experience.
arXiv Detail & Related papers (2023-10-23T17:50:08Z)
- Scaling Robot Learning with Semantically Imagined Experience [21.361979238427722]
Recent advances in robot learning have shown promise in enabling robots to perform manipulation tasks.
One of the key contributing factors to this progress is the scale of robot data used to train the models.
We propose an alternative route and leverage text-to-image foundation models, widely used in computer vision and natural language processing, to imagine semantically new robot experience.
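As a rough illustration of that idea, not the paper's actual pipeline, an
off-the-shelf text-to-image inpainting model can repaint a masked region of a
robot observation under different prompts; the file names, mask, and prompts
below are hypothetical.

    import torch
    from PIL import Image
    from diffusers import StableDiffusionInpaintPipeline

    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
    ).to("cuda")

    frame = Image.open("robot_frame.png").convert("RGB").resize((512, 512))
    mask = Image.open("object_mask.png").convert("RGB").resize((512, 512))

    # Each prompt "imagines" a semantically new object in the masked region
    # while leaving the rest of the scene intact, so existing labels can
    # often be reused for the augmented frames.
    for i, prompt in enumerate(["a blue ceramic mug", "a rubber duck", "a can of soda"]):
        augmented = pipe(prompt=prompt, image=frame, mask_image=mask).images[0]
        augmented.save(f"imagined_{i}.png")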
arXiv Detail & Related papers (2023-02-22T18:47:51Z)
- Real World Offline Reinforcement Learning with Realistic Data Source [33.7474988142367]
Offline reinforcement learning (ORL) holds great promise for robot learning due to its ability to learn from arbitrary pre-generated experience.
Current ORL benchmarks are almost entirely in simulation and utilize contrived datasets like replay buffers of online RL agents or sub-optimal trajectories.
In this work (Real-ORL), we posit that data collected from safe operations of closely related tasks are more practical data sources for real-world robot learning.
arXiv Detail & Related papers (2022-10-12T17:57:05Z)
- Pre-Training for Robots: Offline RL Enables Learning New Tasks from a Handful of Trials [97.95400776235736]
We present PTR, a framework based on offline RL for effectively learning new tasks.
It combines pre-training on existing robotic datasets with rapid fine-tuning on a new task, with as few as 10 demonstrations.
To our knowledge, PTR is the first RL method that succeeds at learning new tasks in a new domain on a real WidowX robot.
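A hedged sketch of that two-phase recipe (the learner and buffer interfaces
are placeholders, not the authors' implementation): pre-train with an offline
RL update on a large prior dataset, then fine-tune while mixing the handful
of target demonstrations into every batch so the small dataset is not
immediately overfit.

    def pretrain_then_finetune(learner, prior_data, demos,
                               pretrain_steps=300_000, finetune_steps=20_000,
                               batch_size=256, demo_fraction=0.5):
        # Phase 1: offline RL (e.g., a CQL-style update) on diverse prior data.
        for _ in range(pretrain_steps):
            learner.update(prior_data.sample(batch_size))

        # Phase 2: fine-tune on the new task; each batch mixes the ~10 target
        # demonstrations with prior data (sample() is assumed to return a
        # list of transitions, so batches concatenate).
        n_demo = int(batch_size * demo_fraction)
        for _ in range(finetune_steps):
            batch = demos.sample(n_demo) + prior_data.sample(batch_size - n_demo)
            learner.update(batch)
        return learner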
arXiv Detail & Related papers (2022-10-11T06:30:53Z)
- Accelerating Robotic Reinforcement Learning via Parameterized Action Primitives [92.0321404272942]
Reinforcement learning can be used to build general-purpose robotic systems.
However, training RL agents to solve robotics tasks remains challenging.
In this work, we manually specify a library of robot action primitives (RAPS), parameterized with arguments that are learned by an RL policy.
We find that this simple change to the action interface substantially improves both learning efficiency and task performance.
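The interface change is easy to picture as an environment wrapper: the policy
emits a primitive index plus continuous arguments, and each primitive expands
into a short sequence of low-level commands. A minimal hypothetical sketch
(the primitive implementations and the 4-D low-level action format are
assumptions, not RAPS's actual library):

    import numpy as np

    class PrimitiveActionWrapper:
        """Exposes (primitive_id, args) actions over a low-level reset/step env."""
        def __init__(self, env):
            self.env = env
            self.primitives = [self._reach, self._grasp, self._lift]

        def step(self, action):
            prim_id, args = int(action[0]), np.asarray(action[1:])
            obs, total_reward, done, info = None, 0.0, False, {}
            # Execute the chosen primitive as a short low-level control loop;
            # the RL policy only sees one decision step per primitive call.
            for low_level_action in self.primitives[prim_id](args):
                obs, reward, done, info = self.env.step(low_level_action)
                total_reward += reward
                if done:
                    break
            return obs, total_reward, done, info

        def _reach(self, args, horizon=10):
            for _ in range(horizon):
                yield np.concatenate([args[:3] / horizon, [0.0]])  # (dx, dy, dz, grip)

        def _grasp(self, args, horizon=5):
            for _ in range(horizon):
                yield np.array([0.0, 0.0, 0.0, 1.0])  # close gripper

        def _lift(self, args, horizon=10):
            for _ in range(horizon):
                yield np.array([0.0, 0.0, float(args[0]) / horizon, 0.0])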
arXiv Detail & Related papers (2021-10-28T17:59:30Z)
- A Framework for Efficient Robotic Manipulation [79.10407063260473]
We show that, given only 10 demonstrations, a single robotic arm can learn sparse-reward manipulation policies from pixels.
arXiv Detail & Related papers (2020-12-14T22:18:39Z)
- AWAC: Accelerating Online Reinforcement Learning with Offline Datasets [84.94748183816547]
We show that our method, advantage weighted actor critic (AWAC), enables rapid learning of skills with a combination of prior demonstration data and online experience.
Our results show that incorporating prior data can reduce the time required to learn a range of robotic skills to practical time-scales.
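The core of AWAC is a weighted behavioral-cloning-style policy update:
regress onto dataset actions, weighted by the exponentiated advantage. A
minimal PyTorch sketch (critic training via standard TD is omitted, and
policy(obs) is assumed to return a torch.distributions object):

    import torch

    def awac_policy_loss(policy, q_net, obs, actions, lam=1.0):
        with torch.no_grad():
            # A(s, a) ~= Q(s, a) - Q(s, a'), with a' ~ pi(.|s) as a baseline.
            pi_actions = policy(obs).sample()
            adv = q_net(obs, actions) - q_net(obs, pi_actions)
            weights = torch.exp(adv / lam).clamp(max=20.0)  # guard against blow-up
        log_prob = policy(obs).log_prob(actions).sum(-1)    # log pi(a|s)
        return -(weights * log_prob).mean()

Weighting by exp(A/lambda) keeps the policy close to the data distribution
during offline training while still preferring high-advantage actions, which
is what lets the same update carry over to online fine-tuning.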
arXiv Detail & Related papers (2020-06-16T17:54:41Z)