Related papers: Bootstrapping Reinforcement Learning with Imitation for Vision-Based Agile Flight

Bootstrapping Reinforcement Learning with Imitation for Vision-Based Agile Flight

URL: http://arxiv.org/abs/2403.12203v2
Date: Fri, 25 Oct 2024 11:10:58 GMT
Title: Bootstrapping Reinforcement Learning with Imitation for Vision-Based Agile Flight
Authors: Jiaxu Xing, Angel Romero, Leonard Bauersfeld, Davide Scaramuzza,
Abstract summary: We propose a novel approach that combines the performance of Reinforcement Learning (RL) and the sample efficiency of Imitation Learning (IL) Our framework contains three phases: training a teacher policy using RL with privileged state information, distilling it into a student policy via IL, and adaptive fine-tuning via RL. testing in both simulated and real-world scenarios shows our approach can not only learn in scenarios where RL from scratch fails but also outperforms existing IL methods in both robustness and performance.
Score: 20.92646531472541
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Learning visuomotor policies for agile quadrotor flight presents significant difficulties, primarily from inefficient policy exploration caused by high-dimensional visual inputs and the need for precise and low-latency control. To address these challenges, we propose a novel approach that combines the performance of Reinforcement Learning (RL) and the sample efficiency of Imitation Learning (IL) in the task of vision-based autonomous drone racing. While RL provides a framework for learning high-performance controllers through trial and error, it faces challenges with sample efficiency and computational demands due to the high dimensionality of visual inputs. Conversely, IL efficiently learns from visual expert demonstrations, but it remains limited by the expert's performance and state distribution. To overcome these limitations, our policy learning framework integrates the strengths of both approaches. Our framework contains three phases: training a teacher policy using RL with privileged state information, distilling it into a student policy via IL, and adaptive fine-tuning via RL. Testing in both simulated and real-world scenarios shows our approach can not only learn in scenarios where RL from scratch fails but also outperforms existing IL methods in both robustness and performance, successfully navigating a quadrotor through a race course using only visual information.

Related papers

Weakly-supervised VLM-guided Partial Contrastive Learning for Visual Language Navigation [36.17444261325021]
Visual Language Navigation (VLN) is a fundamental task within the field of Embodied AI, focusing on the ability of agents to navigate complex environments based on natural language instructions.<n>Existing methods rely on pre-trained backbone models for visual perception, which struggle with the dynamic viewpoints in VLN scenarios.<n>We propose Weakly-supervised Partial Contrastive Learning (WPCL), a method that enhances an agent's ability to identify objects from dynamic viewpoints in VLN scenarios without requiring VLM fine-tuning.
arXiv Detail & Related papers (2025-06-18T11:43:50Z)
Learning Adaptive Dexterous Grasping from Single Demonstrations [27.806856958659054]
This work tackles two key challenges: efficient skill acquisition from limited human demonstrations and context-driven skill selection. AdaDexGrasp learns a library of grasping skills from a single human demonstration per skill and selects the most suitable one using a vision-language model (VLM) We evaluate AdaDexGrasp in both simulation and real-world settings, showing that our approach significantly improves RL efficiency and enables learning human-like grasp strategies across varied object configurations.
arXiv Detail & Related papers (2025-03-26T04:05:50Z)
Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning [47.785786984974855]
We present a human-in-the-loop vision-based RL system that demonstrates impressive performance on a diverse set of dexterous manipulation tasks. Our approach integrates demonstrations and human corrections, efficient RL algorithms, and other system-level design choices to learn policies. We show that our method significantly outperforms imitation learning baselines and prior RL approaches, with an average 2x improvement in success rate and 1.8x faster execution.
arXiv Detail & Related papers (2024-10-29T08:12:20Z)
Empowering Embodied Visual Tracking with Visual Foundation Models and Offline RL [19.757030674041037]
Embodied visual tracking is a vital and challenging skill for embodied agents. Existing methods suffer from inefficient training and poor generalization. We propose a novel framework that combines visual foundation models and offline reinforcement learning.
arXiv Detail & Related papers (2024-04-15T15:12:53Z)
Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require large amounts of interactions between the agent and the environment. We propose a new method to solve it, using unsupervised model-based RL, for pre-training the agent. We show robust performance on the Real-Word RL benchmark, hinting at resiliency to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z)
RVSL: Robust Vehicle Similarity Learning in Real Hazy Scenes Based on Semi-supervised Learning [24.13217601503959]
Vehicle similarity learning, also called re-identification (ReID), has attracted significant attention in computer vision. We construct a training paradigm called textbfRVSL which integrates ReID and domain transformation techniques. We show that the proposed method can achieve state-of-the-art performance on hazy vehicle ReID problems.
arXiv Detail & Related papers (2022-09-18T18:45:06Z)
Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy. In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks. We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
Reinforcement Learning with Sparse Rewards using Guidance from Offline Demonstration [9.017416068706579]
A major challenge in real-world reinforcement learning (RL) is the sparsity of reward feedback. We develop an algorithm that exploits the offline demonstration data generated by a sub-optimal behavior policy. We demonstrate the superior performance of our algorithm over state-of-the-art approaches.
arXiv Detail & Related papers (2022-02-09T18:45:40Z)
RvS: What is Essential for Offline RL via Supervised Learning? [77.91045677562802]
Recent work has shown that supervised learning alone, without temporal difference (TD) learning, can be remarkably effective for offline RL. In every environment suite we consider simply maximizing likelihood with two-layer feedforward is competitive. They also probe the limits of existing RvS methods, which are comparatively weak on random data.
arXiv Detail & Related papers (2021-12-20T18:55:16Z)
Vision-Based Autonomous Car Racing Using Deep Imitative Reinforcement Learning [13.699336307578488]
Deep imitative reinforcement learning approach (DIRL) achieves agile autonomous racing using visual inputs. We validate our algorithm both in a high-fidelity driving simulation and on a real-world 1/20-scale RC-car with limited onboard computation.
arXiv Detail & Related papers (2021-07-18T00:00:48Z)
Offline Reinforcement Learning from Images with Latent Space Models [60.69745540036375]
offline reinforcement learning (RL) refers to the problem of learning policies from a static dataset of environment interactions. We build on recent advances in model-based algorithms for offline RL, and extend them to high-dimensional visual observation spaces. Our approach is both tractable in practice and corresponds to maximizing a lower bound of the ELBO in the unknown POMDP.
arXiv Detail & Related papers (2020-12-21T18:28:17Z)
AWAC: Accelerating Online Reinforcement Learning with Offline Datasets [84.94748183816547]
We show that our method, advantage weighted actor critic (AWAC), enables rapid learning of skills with a combination of prior demonstration data and online experience. Our results show that incorporating prior data can reduce the time required to learn a range of robotic skills to practical time-scales.
arXiv Detail & Related papers (2020-06-16T17:54:41Z)

This list is automatically generated from the titles and abstracts of the papers in this site.