The SwaNNFlight System: On-the-Fly Sim-to-Real Adaptation via Anchored Learning
- URL: http://arxiv.org/abs/2301.06987v1
- Date: Tue, 17 Jan 2023 16:16:53 GMT
- Title: The SwaNNFlight System: On-the-Fly Sim-to-Real Adaptation via Anchored Learning
- Authors: Bassel El Mabsout, Shahin Roozkhosh, Siddharth Mysore, Kate Saenko,
Renato Mancuso
- Abstract summary: We train and live-adapt agents on quadrotors built from off-the-shelf hardware.
We develop SwaNNFlight, an open-source firmware enabling wireless data capture and transfer of agents' observations.
We also design the SwaNNFlight System (SwaNNFS), enabling new research in training and live-adapting learning agents on similar systems.
- Score: 40.99371018933319
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement Learning (RL) agents trained in simulated environments and then
deployed in the real world are often sensitive to the differences in dynamics
presented, commonly termed the sim-to-real gap. With the goal of minimizing
this gap on resource-constrained embedded systems, we train and live-adapt
agents on quadrotors built from off-the-shelf hardware. In achieving this we
developed three novel contributions. (i) SwaNNFlight: an open-source firmware
enabling wireless capture and transfer of agents' observations, fine-tuning of
agents with new data, and receiving and swapping of onboard NN controllers --
all while in flight. We also design the SwaNNFlight System (SwaNNFS), enabling
new research in training and live-adapting learning agents on similar systems.
(ii) Multiplicative value composition: a technique for preserving the
importance of each policy optimization criterion, improving training
performance and reducing variability in learnt behavior. (iii) Anchor critics:
critics that stabilize the fine-tuning of agents during sim-to-real transfer,
enabling online learning from real data while retaining behavior optimized in
simulation. We
train consistently flight-worthy control policies in simulation and deploy them
on real quadrotors. We then achieve live controller adaptation via over-the-air
updates of the onboard control policy from a ground station. Our results
indicate that live adaptation unlocks a near-50% reduction in power
consumption, recovering efficiency previously lost to the sim-to-real gap.
Finally, we tackle the issues
of catastrophic forgetting and controller instability, showing the
effectiveness of our novel methods.
Project Website: https://github.com/BU-Cyber-Physical-Systems-Lab/SwaNNFS
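Contributions (ii) and (iii) describe how fine-tuning is kept stable: optimization criteria are composed multiplicatively, and a critic frozen after simulation training "anchors" the critic being fitted to real flight data. Below is a minimal illustrative sketch of that idea; the linear critics, observation layout, and data are hypothetical stand-ins, not the released SwaNNFS code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: tiny linear critics over a 4-dimensional observation
# (e.g., attitude-error terms). Dimensions and data are illustrative only.
OBS_DIM = 4
w_anchor = rng.normal(scale=0.1, size=OBS_DIM)  # frozen critic trained in simulation
w_live = w_anchor.copy()                        # critic fine-tuned on real flight data


def value(w, obs):
    """Linear state-value estimate shared by both critics."""
    return float(w @ obs)


def squash(v):
    """Map a value into (0, 1) so the multiplicative composition stays well scaled."""
    return 1.0 / (1.0 + np.exp(-v))


def anchored_objective(obs):
    """Multiplicative composition of the anchor (sim) and live (real) values.

    Keeping the frozen anchor critic in the product penalizes updates that
    abandon behavior the simulation critic valued, which is the intuition
    behind anchor critics for limiting catastrophic forgetting.
    """
    return squash(value(w_anchor, obs)) * squash(value(w_live, obs))


# One illustrative fine-tuning pass on a batch of "real" observations.
real_obs = rng.normal(size=(32, OBS_DIM))
real_returns = rng.normal(size=32)  # placeholder returns from real flights

lr = 1e-2
for obs, ret in zip(real_obs, real_returns):
    # Regress the live critic toward returns observed on hardware.
    w_live += lr * (ret - value(w_live, obs)) * obs

# A real system would improve the policy against this composed objective;
# here we only report its average over the batch.
score = float(np.mean([anchored_objective(o) for o in real_obs]))
print(f"composed anchor x live objective: {score:.3f}")
```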
Related papers
- Autonomous Vehicle Controllers From End-to-End Differentiable Simulation [60.05963742334746]
We propose a differentiable simulator and design an analytic policy gradients (APG) approach to training AV controllers.
Our proposed framework brings the differentiable simulator into an end-to-end training loop, where gradients of environment dynamics serve as a useful prior to help the agent learn a more grounded policy.
We find significant improvements in performance and robustness to noise in the dynamics, as well as overall more intuitive human-like handling.
arXiv Detail & Related papers (2024-09-12T11:50:06Z)
- The Power of Resets in Online Reinforcement Learning [73.64852266145387]
We explore the power of simulators through online reinforcement learning with local simulator access (or local planning).
We show that MDPs with low coverability can be learned in a sample-efficient fashion with only $Q^\star$-realizability.
We show that the notorious Exogenous Block MDP problem is tractable under local simulator access.
arXiv Detail & Related papers (2024-04-23T18:09:53Z)
- Belief-Enriched Pessimistic Q-Learning against Adversarial State Perturbations [5.076419064097735]
Recent work shows that a well-trained RL agent can be easily manipulated by strategically perturbing its state observations at the test stage.
Existing solutions either introduce a regularization term to improve the smoothness of the trained policy against perturbations or alternatively train the agent's policy and the attacker's policy.
We propose a new robust RL algorithm for deriving a pessimistic policy to safeguard against an agent's uncertainty about true states.
arXiv Detail & Related papers (2024-03-06T20:52:49Z)
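As a rough illustration of the pessimism described in the entry above (a generic construction, not that paper's algorithm), an agent uncertain about its true state can score each action against a belief set of plausible states and pick the one with the best worst-case value. The toy Q-table and belief set below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

N_STATES, N_ACTIONS = 6, 3
Q = rng.normal(size=(N_STATES, N_ACTIONS))  # toy Q-table, illustrative values only


def pessimistic_action(belief_states, q_table):
    """Pick the action with the best worst-case value over a belief set of states.

    `belief_states` holds indices of states the agent considers plausible given a
    possibly perturbed observation; maximizing the minimum value guards against
    the perturbation steering the agent into a bad action.
    """
    worst_case = q_table[belief_states].min(axis=0)  # min over states, per action
    return int(np.argmax(worst_case))


# Example: the observation looks like state 2, but states 1 and 3 are also plausible.
belief = np.array([1, 2, 3])
print("pessimistic action:", pessimistic_action(belief, Q))
print("greedy action at nominal state 2:", int(np.argmax(Q[2])))
```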
- SAFE-SIM: Safety-Critical Closed-Loop Traffic Simulation with Diffusion-Controllable Adversaries [94.84458417662407]
We introduce SAFE-SIM, a controllable closed-loop safety-critical simulation framework.
Our approach yields two distinct advantages: 1) generating realistic long-tail safety-critical scenarios that closely reflect real-world conditions, and 2) providing controllable adversarial behavior for more comprehensive and interactive evaluations.
We validate our framework empirically using the nuScenes and nuPlan datasets across multiple planners, demonstrating improvements in both realism and controllability.
arXiv Detail & Related papers (2023-12-31T04:14:43Z)
- Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe to convert static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
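The entry above turns continuous control into a discrete-action problem so existing offline RL machinery can be reused. As a generic illustration (not the paper's adaptive scheme), the sketch below quantizes a hypothetical dataset of 1-D actions with a few k-means iterations and maps new actions to their nearest bin.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical offline dataset of 1-D continuous actions (e.g., motor commands).
actions = np.concatenate([rng.normal(-0.8, 0.05, 200), rng.normal(0.6, 0.1, 300)])


def kmeans_bins(data, n_bins=4, iters=20):
    """Simple 1-D k-means (Lloyd's algorithm) producing quantization centers."""
    centers = np.quantile(data, np.linspace(0.1, 0.9, n_bins))  # spread-out init
    for _ in range(iters):
        assign = np.abs(data[:, None] - centers[None, :]).argmin(axis=1)
        for k in range(n_bins):
            if np.any(assign == k):
                centers[k] = data[assign == k].mean()
    return np.sort(centers)


def quantize(a, centers):
    """Map a continuous action to the index of its nearest bin center."""
    return int(np.abs(centers - a).argmin())


centers = kmeans_bins(actions)
print("bin centers:", np.round(centers, 3))
print("action 0.55 -> bin", quantize(0.55, centers))
# A discrete-action offline RL method would then be trained over these bin
# indices instead of the raw continuous actions.
```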
- FIRE: A Failure-Adaptive Reinforcement Learning Framework for Edge Computing Migrations [52.85536740465277]
FIRE is a framework that adapts to rare events by training an RL policy in an edge computing digital twin environment.
We propose ImRE, an importance sampling-based Q-learning algorithm, which samples rare events proportionally to their impact on the value function.
We show that FIRE reduces costs compared to vanilla RL and the greedy baseline in the event of failures.
arXiv Detail & Related papers (2022-09-28T19:49:39Z)
- Cloud-Edge Training Architecture for Sim-to-Real Deep Reinforcement Learning [0.8399688944263843]
Deep reinforcement learning (DRL) is a promising approach to solve complex control tasks by learning policies through interactions with the environment.
Sim-to-real approaches leverage simulations to pretrain DRL policies and then deploy them in the real world.
This work proposes a distributed cloud-edge architecture to train DRL agents in the real world in real-time.
arXiv Detail & Related papers (2022-03-04T10:27:01Z)
- Finding Failures in High-Fidelity Simulation using Adaptive Stress Testing and the Backward Algorithm [35.076062292062325]
Adaptive stress testing (AST) is a method that uses reinforcement learning to find the most likely failure of a system.
AST with a deep reinforcement learning solver has been shown to be effective in finding failures across a range of different systems.
To improve efficiency, we present a method that first finds failures in a low-fidelity simulator.
It then uses the backward algorithm, which trains a deep neural network policy from a single expert demonstration, to adapt the low-fidelity failures to the high-fidelity simulator.
arXiv Detail & Related papers (2021-07-27T16:54:04Z)
- Multiplicative Controller Fusion: Leveraging Algorithmic Priors for Sample-efficient Reinforcement Learning and Safe Sim-To-Real Transfer [18.50206483493784]
We present a novel approach to model-free reinforcement learning that can leverage existing sub-optimal solutions.
During training, our gated fusion approach enables the prior to guide the initial stages of exploration.
We show the efficacy of our Multiplicative Controller Fusion approach on the task of robot navigation.
arXiv Detail & Related papers (2020-03-11T05:12:26Z)
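This last entry is the closest in spirit to contribution (ii): it fuses a hand-designed prior controller with a learned policy. When both output Gaussian actions, one common way to realize such gated fusion is a precision-weighted product, sketched below; this is a generic construction with assumed names, not necessarily that paper's exact formulation.

```python
import numpy as np


def fuse_gaussians(mu_prior, sigma_prior, mu_policy, sigma_policy):
    """Precision-weighted product of two Gaussian action distributions.

    A confident (narrow) prior dominates early exploration; widening
    sigma_prior over training is one way to hand control to the learned policy.
    """
    prec_prior = 1.0 / sigma_prior**2
    prec_policy = 1.0 / sigma_policy**2
    sigma_fused = np.sqrt(1.0 / (prec_prior + prec_policy))
    mu_fused = (prec_prior * mu_prior + prec_policy * mu_policy) / (prec_prior + prec_policy)
    return mu_fused, sigma_fused


# Example: a hand-tuned navigation prior suggests steering left while the
# still-untrained policy is uncertain and near zero.
mu, sigma = fuse_gaussians(mu_prior=-0.5, sigma_prior=0.1, mu_policy=0.0, sigma_policy=0.4)
print(f"fused action: mean={mu:.3f}, std={sigma:.3f}")
```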
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of this information and is not responsible for any consequences of its use.