Bootstrapping Reinforcement Learning with Imitation for Vision-Based Agile Flight
- URL: http://arxiv.org/abs/2403.12203v1
- Date: Mon, 18 Mar 2024 19:25:57 GMT
- Title: Bootstrapping Reinforcement Learning with Imitation for Vision-Based Agile Flight
- Authors: Jiaxu Xing, Angel Romero, Leonard Bauersfeld, Davide Scaramuzza
- Abstract summary: We combine the effectiveness of Reinforcement Learning (RL) and the efficiency of Imitation Learning (IL) in the context of vision-based, autonomous drone racing.
Our framework involves three stages: initial training of a teacher policy using privileged state information, distilling this policy into a student policy using IL, and performance-constrained adaptive RL fine-tuning.
Our experiments in both simulated and real-world environments demonstrate that our approach achieves superior performance and robustness compared to IL or RL alone when navigating a quadrotor through a racing course using only visual information, without explicit state estimation.
- Score: 20.92646531472541
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We combine the effectiveness of Reinforcement Learning (RL) and the efficiency of Imitation Learning (IL) in the context of vision-based, autonomous drone racing. We focus on directly processing visual input without explicit state estimation. While RL offers a general framework for learning complex controllers through trial and error, it faces challenges regarding sample efficiency and computational demands due to the high dimensionality of visual inputs. Conversely, IL demonstrates efficiency in learning from visual demonstrations, but is limited by the quality of those demonstrations and faces issues like covariate shift. To overcome these limitations, we propose a novel training framework that combines the advantages of RL and IL. Our framework involves three stages: initial training of a teacher policy using privileged state information, distilling this policy into a student policy using IL, and performance-constrained adaptive RL fine-tuning. Our experiments in both simulated and real-world environments demonstrate that our approach achieves superior performance and robustness compared to IL or RL alone when navigating a quadrotor through a racing course using only visual information, without explicit state estimation.
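To make the three-stage recipe concrete, here is a minimal sketch of how such a pipeline could be wired up. All network shapes, the `paired_batches` data source, and the stage-3 gating comment are illustrative assumptions, not the paper's actual implementation:

```python
# Illustrative sketch of the three-stage pipeline (hypothetical details):
#   1) train a teacher by RL on privileged state,
#   2) distill it into a vision-only student by imitation,
#   3) fine-tune the student with RL under a performance constraint.
import torch
import torch.nn as nn

class TeacherPolicy(nn.Module):
    """Maps privileged state (e.g., pose, velocity, gate poses) to actions."""
    def __init__(self, state_dim=18, act_dim=4):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 128), nn.Tanh(),
                                 nn.Linear(128, act_dim))

    def forward(self, state):
        return self.net(state)

class StudentPolicy(nn.Module):
    """Maps raw images to actions; no explicit state estimation."""
    def __init__(self, act_dim=4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.Flatten())
        self.head = nn.LazyLinear(act_dim)  # infers input size on first call

    def forward(self, image):
        return self.head(self.encoder(image))

def distill(teacher, student, paired_batches, epochs=10, lr=1e-3):
    """Stage 2: behavior-clone the teacher's actions from images.

    `paired_batches` is assumed to yield (image, privileged_state) pairs
    collected by rolling out the teacher in simulation.
    """
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(epochs):
        for image, state in paired_batches():
            loss = nn.functional.mse_loss(student(image),
                                          teacher(state).detach())
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student

# Stage 3 (not shown): continue training the student with an RL objective,
# accepting updates only while a task-performance constraint (e.g., gate
# completion rate) holds, per the "performance-constrained" fine-tuning idea.
```

The key design point is that expensive trial-and-error happens in the low-dimensional privileged-state space, while the high-dimensional visual policy is obtained cheaply by supervision and then adapted by RL.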
Related papers
- Weakly-supervised VLM-guided Partial Contrastive Learning for Visual Language Navigation [36.17444261325021]
Visual Language Navigation (VLN) is a fundamental task within the field of Embodied AI, focusing on the ability of agents to navigate complex environments based on natural language instructions. Existing methods rely on pre-trained backbone models for visual perception, which struggle with the dynamic viewpoints in VLN scenarios. We propose Weakly-supervised Partial Contrastive Learning (WPCL), a method that enhances an agent's ability to identify objects from dynamic viewpoints in VLN scenarios without requiring VLM fine-tuning.
arXiv Detail & Related papers (2025-06-18T11:43:50Z) - Learning Adaptive Dexterous Grasping from Single Demonstrations [27.806856958659054]
This work tackles two key challenges: efficient skill acquisition from limited human demonstrations and context-driven skill selection.
AdaDexGrasp learns a library of grasping skills from a single human demonstration per skill and selects the most suitable one using a vision-language model (VLM).
We evaluate AdaDexGrasp in both simulation and real-world settings, showing that our approach significantly improves RL efficiency and enables learning human-like grasp strategies across varied object configurations.
arXiv Detail & Related papers (2025-03-26T04:05:50Z) - Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning [47.785786984974855]
We present a human-in-the-loop vision-based RL system that demonstrates impressive performance on a diverse set of dexterous manipulation tasks.
Our approach integrates demonstrations and human corrections, efficient RL algorithms, and other system-level design choices to learn policies.
We show that our method significantly outperforms imitation learning baselines and prior RL approaches, with an average 2x improvement in success rate and 1.8x faster execution.
arXiv Detail & Related papers (2024-10-29T08:12:20Z) - Empowering Embodied Visual Tracking with Visual Foundation Models and Offline RL [19.757030674041037]
Embodied visual tracking is a vital and challenging skill for embodied agents.
Existing methods suffer from inefficient training and poor generalization.
We propose a novel framework that combines visual foundation models and offline reinforcement learning.
arXiv Detail & Related papers (2024-04-15T15:12:53Z) - Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require large numbers of interactions between the agent and the environment.
We propose a new method that uses unsupervised model-based RL to pre-train the agent.
We show robust performance on the Real-World RL benchmark, hinting at resilience to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z) - RVSL: Robust Vehicle Similarity Learning in Real Hazy Scenes Based on Semi-supervised Learning [24.13217601503959]
Vehicle similarity learning, also called re-identification (ReID), has attracted significant attention in computer vision.
We construct a training paradigm called RVSL that integrates ReID and domain transformation techniques.
We show that the proposed method can achieve state-of-the-art performance on hazy vehicle ReID problems.
arXiv Detail & Related papers (2022-09-18T18:45:06Z) - Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta-algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
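As a rough illustration of the two-policy idea, here is a minimal sketch assuming a gymnasium-style `env` and an externally supplied RL update; the curriculum schedule and update rule are placeholders, not JSRL's exact procedure:

```python
# Hypothetical JSRL-style rollout: a pre-existing guide policy controls the
# first `h` steps of each episode, then the learning ("exploration") policy
# takes over. Shrinking `h` over training yields a curriculum that starts
# near the goal and gradually backs off toward the initial state.

def jsrl_rollout(env, guide_policy, explore_policy, h, max_steps=200):
    obs, _ = env.reset()  # gymnasium-style API assumed
    transitions = []
    for t in range(max_steps):
        policy = guide_policy if t < h else explore_policy
        action = policy(obs)
        next_obs, reward, terminated, truncated, _ = env.step(action)
        transitions.append((obs, action, reward, next_obs))
        obs = next_obs
        if terminated or truncated:
            break
    return transitions

def jsrl_train(env, guide_policy, explore_policy, update_fn,
               horizons=(100, 75, 50, 25, 0), episodes_per_stage=50):
    # `update_fn` stands in for any off-policy RL update on the transitions.
    for h in horizons:
        for _ in range(episodes_per_stage):
            update_fn(jsrl_rollout(env, guide_policy, explore_policy, h))
```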
arXiv Detail & Related papers (2022-04-05T17:25:22Z) - Reinforcement Learning with Sparse Rewards using Guidance from Offline Demonstration [9.017416068706579]
A major challenge in real-world reinforcement learning (RL) is the sparsity of reward feedback.
We develop an algorithm that exploits the offline demonstration data generated by a sub-optimal behavior policy.
We demonstrate the superior performance of our algorithm over state-of-the-art approaches.
arXiv Detail & Related papers (2022-02-09T18:45:40Z) - RvS: What is Essential for Offline RL via Supervised Learning? [77.91045677562802]
Recent work has shown that supervised learning alone, without temporal difference (TD) learning, can be remarkably effective for offline RL.
In every environment suite we consider, simply maximizing likelihood with a two-layer feedforward MLP is competitive.
These results also probe the limits of existing RvS methods, which are comparatively weak on random data.
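A minimal sketch of that recipe, with placeholder dimensions and an MSE loss standing in for a fixed-variance Gaussian log-likelihood (the conditioning variable may be a goal or a reward-to-go):

```python
# Hypothetical RvS-style policy: condition on an outcome and maximize the
# likelihood of dataset actions with a small two-hidden-layer MLP.
import torch
import torch.nn as nn

class RvSPolicy(nn.Module):
    def __init__(self, obs_dim, cond_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim))

    def forward(self, obs, cond):
        return self.net(torch.cat([obs, cond], dim=-1))

def rvs_step(policy, opt, obs, cond, actions):
    # MSE on actions == Gaussian maximum likelihood with fixed variance.
    loss = nn.functional.mse_loss(policy(obs, cond), actions)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```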
arXiv Detail & Related papers (2021-12-20T18:55:16Z) - Vision-Based Autonomous Car Racing Using Deep Imitative Reinforcement Learning [13.699336307578488]
A deep imitative reinforcement learning approach (DIRL) achieves agile autonomous racing using visual inputs.
We validate our algorithm both in a high-fidelity driving simulation and on a real-world 1/20-scale RC-car with limited onboard computation.
arXiv Detail & Related papers (2021-07-18T00:00:48Z) - Offline Reinforcement Learning from Images with Latent Space Models [60.69745540036375]
Offline reinforcement learning (RL) refers to the problem of learning policies from a static dataset of environment interactions.
We build on recent advances in model-based algorithms for offline RL, and extend them to high-dimensional visual observation spaces.
Our approach is both tractable in practice and corresponds to maximizing a lower bound of the ELBO in the unknown POMDP.
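For context, the sequential evidence lower bound that latent-state (POMDP) models of this kind typically optimize has the standard form below; this is a generic statement, not necessarily the paper's exact bound:

```latex
\log p(o_{1:T} \mid a_{1:T}) \;\geq\;
\mathbb{E}_{q}\left[ \sum_{t=1}^{T} \log p(o_t \mid s_t)
  - \mathrm{KL}\!\left( q(s_t \mid o_{\leq t}, a_{<t}) \,\middle\|\,
    p(s_t \mid s_{t-1}, a_{t-1}) \right) \right]
```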
arXiv Detail & Related papers (2020-12-21T18:28:17Z) - AWAC: Accelerating Online Reinforcement Learning with Offline Datasets [84.94748183816547]
We show that our method, advantage weighted actor critic (AWAC), enables rapid learning of skills with a combination of prior demonstration data and online experience.
Our results show that incorporating prior data can reduce the time required to learn a range of robotic skills to practical time-scales.
arXiv Detail & Related papers (2020-06-16T17:54:41Z)
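The core of AWAC is an advantage-weighted policy update; the sketch below assumes hypothetical `q_fn`, `value_fn`, and `policy.log_prob` interfaces and omits critic training:

```python
# Hypothetical AWAC-style actor loss: weight the log-likelihood of replay
# actions by the exponentiated advantage, so the policy improves while
# staying close to actions the data already supports.
import torch

def awac_actor_loss(policy, q_fn, value_fn, obs, actions, lam=1.0):
    advantage = q_fn(obs, actions) - value_fn(obs)
    weights = torch.exp(advantage / lam).clamp(max=20.0)  # clip for stability
    log_prob = policy.log_prob(obs, actions)  # assumed interface
    return -(weights.detach() * log_prob).mean()
```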
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.