Learning to Run with Potential-Based Reward Shaping and Demonstrations
from Video Data
- URL: http://arxiv.org/abs/2012.08824v1
- Date: Wed, 16 Dec 2020 09:46:58 GMT
- Title: Learning to Run with Potential-Based Reward Shaping and Demonstrations
from Video Data
- Authors: Aleksandra Malysheva, Daniel Kudenko, Aleksei Shpilman
- Abstract summary: The goal of the "Learning to run" competition was to train a two-legged model of a humanoid body to run in a simulated race course at maximum speed.
All submissions took a tabula rasa approach to reinforcement learning (RL) and were able to produce relatively fast, but not optimal running behaviour.
We demonstrate how data from videos of human running can be used to shape the reward of the humanoid learning agent.
- Score: 70.540936204654
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning to produce efficient movement behaviour for humanoid robots from
scratch is a hard problem, as has been illustrated by the "Learning to run"
competition at NIPS 2017. The goal of this competition was to train a
two-legged model of a humanoid body to run in a simulated race course with
maximum speed. All submissions took a tabula rasa approach to reinforcement
learning (RL) and were able to produce relatively fast, but not optimal running
behaviour. In this paper, we demonstrate how data from videos of human running
(e.g. taken from YouTube) can be used to shape the reward of the humanoid
learning agent to speed up the learning and produce a better result.
Specifically, we are using the positions of key body parts at regular time
intervals to define a potential function for potential-based reward shaping
(PBRS). Since PBRS does not change the optimal policy, this approach allows the
RL agent to overcome sub-optimalities in the human movements that are shown in
the videos.
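As a rough illustration of this idea, the sketch below builds a potential from keypoints extracted from a running video and computes the standard PBRS term F(s, s') = gamma * Phi(s') - Phi(s). The data structure and function names are hypothetical and are not the authors' code.

```python
import numpy as np

# Hypothetical lookup table: for each video time step t, the 2D positions of
# key body parts (e.g. head, pelvis, knees, feet) extracted from the video.
video_keypoints = {t: np.zeros((6, 2)) for t in range(1000)}  # placeholder data

def potential(t, agent_keypoints, scale=1.0):
    """Phi(s): larger when the agent's key body parts are close to the
    reference pose taken from the video at the matching time step."""
    ref = video_keypoints[min(t, max(video_keypoints))]
    return -scale * float(np.linalg.norm(agent_keypoints - ref))

def shaping_reward(t, keypoints, t_next, keypoints_next, gamma=0.99):
    """PBRS term F(s, s') = gamma * Phi(s') - Phi(s); adding it to the
    environment reward leaves the optimal policy unchanged."""
    return gamma * potential(t_next, keypoints_next) - potential(t, keypoints)
```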
We present experiments in which we combine selected techniques from the top
ten approaches from the NIPS competition with further optimizations to create
a high-performing agent as a baseline. We then demonstrate how video-based
reward shaping improves the performance further, resulting in an RL agent that
runs twice as fast as the baseline in 12 hours of training. We furthermore show
that our approach can overcome sub-optimal running behaviour in videos, with
the learned policy significantly outperforming that of the running agent from
the video.
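As a further hedged sketch, assuming the classic Gym reset/step signatures, the shaping term could be attached to the environment with a wrapper like the one below so that any RL algorithm trained on the wrapped environment receives the shaped reward; `PBRSWrapper` and `potential_fn` are illustrative names, not the paper's implementation.

```python
import gym

class PBRSWrapper(gym.Wrapper):
    """Adds gamma * Phi(s') - Phi(s) to the environment reward at every step.

    Because the added term is potential-based, the wrapped task keeps the
    same optimal policy, so imperfect video poses cannot degrade the final
    behaviour, only the speed of learning."""

    def __init__(self, env, potential_fn, gamma=0.99):
        super().__init__(env)
        self.potential_fn = potential_fn   # Phi, e.g. built from video keypoints
        self.gamma = gamma
        self._prev_phi = 0.0
        self._t = 0

    def reset(self, **kwargs):
        obs = self.env.reset(**kwargs)
        self._t = 0
        self._prev_phi = self.potential_fn(self._t, obs)
        return obs

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self._t += 1
        phi = self.potential_fn(self._t, obs)
        reward += self.gamma * phi - self._prev_phi
        self._prev_phi = phi
        return obs, reward, done, info
```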
Related papers
- Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning [47.785786984974855]
We present a human-in-the-loop vision-based RL system that demonstrates impressive performance on a diverse set of dexterous manipulation tasks.
Our approach integrates demonstrations and human corrections, efficient RL algorithms, and other system-level design choices to learn policies.
We show that our method significantly outperforms imitation learning baselines and prior RL approaches, with an average 2x improvement in success rate and 1.8x faster execution.
arXiv Detail & Related papers (2024-10-29T08:12:20Z) - Refining Pre-Trained Motion Models [56.18044168821188]
We take on the challenge of improving state-of-the-art supervised models with self-supervised training.
We focus on obtaining a "clean" training signal from real-world unlabelled video.
We show that our method yields reliable gains over fully-supervised methods in real videos.
arXiv Detail & Related papers (2024-01-01T18:59:33Z) - Rethinking Closed-loop Training for Autonomous Driving [82.61418945804544]
We present the first empirical study which analyzes the effects of different training benchmark designs on the success of learning agents.
We propose trajectory value learning (TRAVL), an RL-based driving agent that performs planning with multistep look-ahead.
Our experiments show that TRAVL can learn much faster and produce safer maneuvers compared to all the baselines.
arXiv Detail & Related papers (2023-06-27T17:58:39Z) - REST: REtrieve & Self-Train for generative action recognition [54.90704746573636]
We propose to adapt a pre-trained generative Vision & Language (V&L) Foundation Model for video/action recognition.
We show that direct fine-tuning of a generative model to produce action classes suffers from severe overfitting.
We introduce REST, a training framework consisting of two key components.
arXiv Detail & Related papers (2022-09-29T17:57:01Z) - RSPNet: Relative Speed Perception for Unsupervised Video Representation
Learning [100.76672109782815]
We study unsupervised video representation learning that seeks to learn both motion and appearance features from unlabeled video only.
It is difficult to construct a suitable self-supervised task that models both motion and appearance features well.
We propose a new way to perceive the playback speed and exploit the relative speed between two video clips as labels.
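A minimal sketch of such a relative-speed pretext task, assuming clips are drawn from one video at different frame strides; all names here are illustrative and not RSPNet's actual implementation.

```python
import numpy as np

def sample_clip(video, stride, length=16):
    """Take `length` frames from `video` with the given frame stride;
    a larger stride imitates faster playback."""
    start = np.random.randint(0, len(video) - stride * length)
    return video[start:start + stride * length:stride]

def relative_speed_pair(video, strides=(1, 2, 4)):
    """Build one training example: two clips from the same video and a label
    for whether the first is slower than, equal to, or faster than the second."""
    s_a, s_b = np.random.choice(strides, size=2)
    clip_a = sample_clip(video, s_a)
    clip_b = sample_clip(video, s_b)
    label = int(np.sign(s_a - s_b))  # -1 slower, 0 same speed, +1 faster
    return clip_a, clip_b, label
```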
arXiv Detail & Related papers (2020-10-27T16:42:50Z) - Chrome Dino Run using Reinforcement Learning [0.0]
We study the most popular model-free reinforcement learning algorithms, combined with a convolutional neural network, to train an agent to play the game of Chrome Dino Run.
We use two popular temporal-difference approaches, namely Deep Q-Learning and Expected SARSA, and also implement a Double DQN model to train the agent.
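For reference, a minimal PyTorch-style sketch of the Double DQN target mentioned above, in which the online network selects the next action and the target network evaluates it; the function and argument names are illustrative.

```python
import torch

def double_dqn_targets(online_q, target_q, rewards, next_states, dones, gamma=0.99):
    """Double DQN target: the online network picks the next action and the
    target network evaluates it, which reduces DQN's value over-estimation."""
    with torch.no_grad():
        next_actions = online_q(next_states).argmax(dim=1, keepdim=True)
        next_values = target_q(next_states).gather(1, next_actions).squeeze(1)
        return rewards + gamma * (1.0 - dones) * next_values
```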
arXiv Detail & Related papers (2020-08-15T22:18:20Z) - Curriculum Learning for Recurrent Video Object Segmentation [2.3376061255029064]
This work explores different schedule sampling and frame skipping variations to significantly improve the performance of a recurrent architecture.
Our results on the car class of the KITTI-MOTS challenge indicate that, surprisingly, an inverse schedule sampling is a better option than a classic forward one.
arXiv Detail & Related papers (2020-08-15T10:51:22Z) - Dynamic Experience Replay [6.062589413216726]
We build upon Ape-X DDPG and demonstrate our approach on robotic tight-fitting joint assembly tasks.
In particular, we run experiments on two different tasks: peg-in-hole and lap-joint.
Our ablation studies show that Dynamic Experience Replay is a crucial ingredient that largely shortens the training time in these challenging environments.
arXiv Detail & Related papers (2020-03-04T23:46:45Z) - Towards Learning to Imitate from a Single Video Demonstration [11.15358253586118]
We develop a reinforcement learning agent that can learn to imitate from a given video observation.
We use a Siamese recurrent neural network architecture to learn rewards in space and time between motion clips.
We demonstrate our approach on simulated humanoid, dog, and raptor agents in 2D and a quadruped and a humanoid in 3D.
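A loose sketch of how such a Siamese recurrent reward might be structured, assuming per-frame motion features are already available; every name below is hypothetical and not taken from the paper.

```python
import torch
import torch.nn as nn

class SiameseClipEncoder(nn.Module):
    """Encodes a motion clip (batch, T, feature_dim) into one embedding with a
    GRU; the same weights are applied to the agent clip and the demo clip."""

    def __init__(self, feature_dim=32, hidden_dim=64):
        super().__init__()
        self.rnn = nn.GRU(feature_dim, hidden_dim, batch_first=True)

    def forward(self, clip):
        _, h = self.rnn(clip)
        return h[-1]                      # final hidden state, (batch, hidden_dim)

def imitation_reward(encoder, agent_clip, demo_clip):
    """Reward grows towards 1 as the agent's recent motion approaches the
    demonstration clip in the learned embedding space."""
    with torch.no_grad():
        dist = torch.norm(encoder(agent_clip) - encoder(demo_clip), dim=-1)
    return torch.exp(-dist)
```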
arXiv Detail & Related papers (2019-01-22T06:46:19Z)