Modular Transfer Learning with Transition Mismatch Compensation for
Excessive Disturbance Rejection
- URL: http://arxiv.org/abs/2007.14646v1
- Date: Wed, 29 Jul 2020 07:44:38 GMT
- Title: Modular Transfer Learning with Transition Mismatch Compensation for
Excessive Disturbance Rejection
- Authors: Tianming Wang, Wenjie Lu, Huan Yu, Dikai Liu
- Abstract summary: We propose a transfer learning framework that adapts a control policy for excessive disturbance rejection of an underwater robot.
A modular network of learning policies is applied, composed of a Generalized Control Policy (GCP) and an Online Disturbance Identification Model (ODI).
- Score: 29.01654847752415
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Underwater robots in shallow waters usually suffer from strong wave forces,
which may frequently exceed the robot's control constraints. Learning-based
controllers are suitable for disturbance rejection control, but excessive
disturbances heavily affect the state transitions of the underlying Markov
Decision Process (MDP) or Partially Observable Markov Decision Process (POMDP).
Moreover, pure learning procedures on the target system may encounter damaging
exploratory actions or unpredictable system variations, while training
exclusively on a prior model usually cannot address the model mismatch with the
target system. In this paper, we propose a transfer learning framework that
adapts a control policy for excessive disturbance rejection of an underwater
robot under dynamics model mismatch. A modular network of learning policies is
applied, composed of a Generalized Control Policy (GCP) and an Online
Disturbance Identification Model (ODI). The GCP is first trained over a wide
array of disturbance waveforms. The ODI then learns to use past states and
actions of the system to predict the disturbance waveforms, which are provided
as input to the GCP along with the system state. A transfer reinforcement
learning algorithm using Transition Mismatch Compensation (TMC) is developed on
top of the modular architecture; it learns an additional compensatory policy by
minimizing the mismatch between the transitions predicted by the dynamics
models of the source and target tasks. We demonstrate on a pose regulation task
in simulation that TMC successfully rejects the disturbances and stabilizes the
robot under an empirical model of the robot system, while also improving sample
efficiency.
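To make the architecture concrete, here is a minimal Python sketch of how the GCP, ODI, and TMC compensator compose, and of the TMC training signal. All function bodies, dimensions, and names are hypothetical placeholders, not the paper's learned networks:

```python
import numpy as np
from collections import deque

def gcp(state, disturbance_estimate):
    """Generalized Control Policy: (state, predicted disturbance) -> action."""
    return -0.5 * state + 0.1 * disturbance_estimate      # placeholder policy

def odi(history):
    """Online Disturbance Identification: past (state, action) pairs ->
    predicted disturbance for the current step."""
    return np.mean([s for s, a in history], axis=0)       # placeholder estimator

def compensator(state, base_action):
    """Compensatory policy learned by TMC on top of the frozen GCP + ODI."""
    return 0.05 * state                                   # placeholder correction

def tmc_loss(source_model, target_model, state, action):
    """TMC training signal: mismatch between the transitions predicted by the
    source-task and target-task dynamics models."""
    return np.linalg.norm(source_model(state, action) - target_model(state, action))

# One control step under the composed policy.
history = deque([(np.zeros(2), np.zeros(2))] * 10, maxlen=10)
s = np.array([0.3, -0.2])
a = gcp(s, odi(history))
a_total = a + compensator(s, a)                           # action sent to the robot
```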
Related papers
- Model-based controller assisted domain randomization in deep reinforcement learning: application to nonlinear powertrain control [0.0]
This study proposes a new robust control approach within the framework of deep reinforcement learning (DRL).
The problem setup is modeled via a latent Markov decision process (LMDP), a set of vanilla MDPs, for a controlled system subject to uncertainties and nonlinearities.
Compared to traditional DRL-based controls, the proposed controller design achieves a higher level of generalization.
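A minimal sketch of the LMDP view, under the assumption (not stated in the abstract) that each latent draw fixes the parameters of one vanilla MDP for an episode, as in domain randomization:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_latent_mdp():
    """Draw one vanilla MDP from the latent family: here, randomized
    powertrain-like parameters (illustrative values only)."""
    return {"inertia": rng.uniform(0.8, 1.2), "friction": rng.uniform(0.05, 0.2)}

def step(params, state, action):
    """Dynamics of the sampled MDP; uncertainty enters through params."""
    return state + (action - params["friction"] * state) / params["inertia"]

for episode in range(3):
    params = sample_latent_mdp()      # latent variable held fixed within an episode
    s = 0.0
    for t in range(5):
        s = step(params, s, action=1.0)
```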
arXiv Detail & Related papers (2025-04-28T12:09:07Z)
- Action Flow Matching for Continual Robot Learning [57.698553219660376]
Continual learning in robotics seeks systems that can constantly adapt to changing environments and tasks.
We introduce a generative framework leveraging flow matching for online robot dynamics model alignment.
We find that by transforming the actions themselves rather than exploring with a misaligned model, the robot collects informative data more efficiently.
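A rough illustration of acting through a learned action transformation instead of trusting a misaligned model; the affine map and its parameters below are stand-ins for the paper's flow-matching model:

```python
import numpy as np

def transform_action(a_nominal, theta):
    """Learned action map: corrects the nominal planner's action so that the
    misaligned nominal model's plan better matches the true dynamics."""
    scale, bias = theta
    return scale * a_nominal + bias

theta = (0.9, 0.05)                   # would be fit online from transition data
a_exec = transform_action(np.array([0.3, -0.1]), theta)   # action actually executed
```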
arXiv Detail & Related papers (2025-04-25T16:26:15Z)
- Self-Healing Machine Learning: A Framework for Autonomous Adaptation in Real-World Environments [50.310636905746975]
Real-world machine learning systems often encounter model performance degradation due to distributional shifts in the underlying data generating process.
Existing approaches to addressing shifts, such as concept drift adaptation, are limited by their reason-agnostic nature.
We propose self-healing machine learning (SHML) to overcome these limitations.
arXiv Detail & Related papers (2024-10-31T20:05:51Z)
- Dropout MPC: An Ensemble Neural MPC Approach for Systems with Learned Dynamics [0.0]
We propose a novel sampling-based ensemble neural MPC algorithm that employs the Monte-Carlo dropout technique on the learned system model.
The method aims in general at uncertain systems with complex dynamics, where models derived from first principles are hard to infer.
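A minimal sketch of the Monte-Carlo dropout idea inside a sampling-based planner: dropout stays active at planning time, so each forward pass acts as one ensemble member (the model architecture and cost are illustrative, not the paper's):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Dropout(p=0.1), nn.Linear(64, 2))
model.train()  # keep dropout ON at planning time (Monte-Carlo dropout)

def rollout_cost(x0, actions, n_samples=8):
    """Average trajectory cost over dropout-sampled dynamics models."""
    costs = []
    for _ in range(n_samples):                 # each pass = one 'ensemble member'
        x, cost = x0, 0.0
        for a in actions:
            x = model(torch.cat([x, a]))       # stochastic forward pass
            cost = cost + x.pow(2).sum()       # illustrative quadratic state cost
        costs.append(cost)
    return torch.stack(costs).mean()

c = rollout_cost(torch.zeros(2), [torch.randn(1) for _ in range(5)])
```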
arXiv Detail & Related papers (2024-06-04T17:15:25Z)
- Dynamic Online Modulation Recognition using Incremental Learning [6.6953472972255]
Conventional deep learning (DL) models often fall short in online dynamic contexts.
We show that modulation recognition frameworks based on Incremental Learning (IL) effectively prevent catastrophic forgetting, enabling models to perform robustly in dynamic scenarios.
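One common IL mechanism for preventing catastrophic forgetting is rehearsal; the sketch below (a generic exemplar buffer, not necessarily this paper's method) mixes stored past-modulation examples into each incremental update:

```python
import random

class RehearsalBuffer:
    """Small exemplar memory mixed into each incremental update so the
    classifier keeps seeing examples of earlier modulation classes."""
    def __init__(self, capacity=500):
        self.capacity, self.data = capacity, []

    def add(self, example):
        if len(self.data) < self.capacity:
            self.data.append(example)
        else:                                   # random replacement once full
            self.data[random.randrange(self.capacity)] = example

    def sample(self, k):
        return random.sample(self.data, min(k, len(self.data)))

# Usage: concatenate buffer.sample(k) with each new-modulation batch
# before the gradient step, then buffer.add() the new examples.
```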
arXiv Detail & Related papers (2023-12-07T21:56:26Z)
- DiAReL: Reinforcement Learning with Disturbance Awareness for Robust Sim2Real Policy Transfer in Robot Control [0.0]
Delayed Markov decision processes fulfill the Markov property by augmenting the state space of agents with a finite time window of recently committed actions.
We introduce a disturbance-augmented Markov decision process in delayed settings as a novel representation to incorporate disturbance estimation in training on-policy reinforcement learning algorithms.
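The augmented representation can be sketched directly: the observation concatenates the raw state, a finite window of recently committed actions (restoring the Markov property under delay), and a disturbance estimate. Dimensions and names here are assumptions:

```python
from collections import deque
import numpy as np

ACTION_DELAY = 3                              # delay in control steps (assumed)

class DAMDPObservation:
    """Builds the disturbance-augmented, delay-aware observation."""
    def __init__(self):
        self.action_window = deque([np.zeros(2)] * ACTION_DELAY, maxlen=ACTION_DELAY)

    def __call__(self, state, disturbance_estimate):
        # state + recently committed (not yet applied) actions + disturbance
        return np.concatenate([state, *self.action_window, disturbance_estimate])

    def commit(self, action):
        self.action_window.append(action)

builder = DAMDPObservation()
obs = builder(np.zeros(4), np.zeros(2))       # shape (4 + 3*2 + 2,) = (12,)
builder.commit(np.array([0.1, -0.1]))
```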
arXiv Detail & Related papers (2023-06-15T10:11:38Z)
- Predictable MDP Abstraction for Unsupervised Model-Based RL [93.91375268580806]
We propose predictable MDP abstraction (PMA).
Instead of training a predictive model on the original MDP, we train a model on a transformed MDP with a learned action space.
We theoretically analyze PMA and empirically demonstrate that PMA leads to significant improvements over prior unsupervised model-based RL approaches.
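A rough sketch of acting through a learned latent action space: the dynamics model is conditioned on (state, latent action) rather than raw actions, so it only has to predict the consequences of "predictable" actions. The linear modules and dimensions are stand-ins for PMA's learned networks:

```python
import torch
import torch.nn as nn

latent_dim, action_dim, state_dim = 4, 6, 10

decoder = nn.Linear(latent_dim + state_dim, action_dim)   # latent z -> raw action
model = nn.Linear(state_dim + latent_dim, state_dim)      # dynamics over latent actions

def latent_step(s, z):
    """One step of the transformed MDP: act through the decoder, but the
    predictive model is conditioned on (s, z), not on the raw action."""
    a = decoder(torch.cat([s, z]))            # executed in the original MDP
    s_next_pred = model(torch.cat([s, z]))    # learned over the transformed MDP
    return a, s_next_pred

a, s_next = latent_step(torch.randn(state_dim), torch.randn(latent_dim))
```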
arXiv Detail & Related papers (2023-02-08T07:37:51Z)
- Learning Robust Policy against Disturbance in Transition Dynamics via State-Conservative Policy Optimization [63.75188254377202]
Deep reinforcement learning algorithms can perform poorly in real-world tasks due to the discrepancy between source and target environments.
We propose State-Conservative Policy Optimization (SCPO), a novel model-free actor-critic algorithm that learns robust policies without modeling the disturbance in advance.
Experiments in several robot control tasks demonstrate that SCPO learns robust policies against disturbance in the transition dynamics.
arXiv Detail & Related papers (2021-12-20T13:13:05Z)
- DySMHO: Data-Driven Discovery of Governing Equations for Dynamical Systems via Moving Horizon Optimization [77.34726150561087]
We introduce Discovery of Dynamical Systems via Moving Horizon Optimization (DySMHO), a scalable machine learning framework.
DySMHO sequentially learns the underlying governing equations from a large dictionary of basis functions.
Canonical nonlinear dynamical system examples are used to demonstrate that DySMHO can accurately recover the governing laws.
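The dictionary-regression core can be illustrated on a single window: fit coefficients of candidate basis functions to observed derivatives, then hard-threshold to sparsify. This is a simplified stand-in for DySMHO's moving horizon optimization, shown on a 1-D toy system:

```python
import numpy as np

def dictionary(x):
    """Candidate basis functions for a 1-D system: [1, x, x^2]."""
    return np.column_stack([np.ones_like(x), x, x**2])

def fit_window(x, dxdt, threshold=0.1):
    """One horizon window: least squares over the dictionary, followed by
    hard thresholding to keep only dominant terms of the governing law."""
    Theta = dictionary(x)
    coef, *_ = np.linalg.lstsq(Theta, dxdt, rcond=None)
    coef[np.abs(coef) < threshold] = 0.0      # sparsify
    return coef

t = np.linspace(0, 4, 200)
x = np.exp(-0.5 * t)                          # data generated by dx/dt = -0.5 x
coef = fit_window(x, np.gradient(x, t))       # recovers ~-0.5 on the x term
```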
arXiv Detail & Related papers (2021-07-30T20:35:03Z)
- Collision-Free Flocking with a Dynamic Squad of Fixed-Wing UAVs Using Deep Reinforcement Learning [2.555094847583209]
We deal with the decentralized leader-follower flocking control problem through deep reinforcement learning (DRL).
We propose a novel reinforcement learning algorithm, CACER-II, for training a shared control policy for all the followers.
As a result, the variable-length system state can be encoded into a fixed-length embedding vector, which makes the learned DRL policies independent of the number and order of followers.
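The fixed-length encoding can be sketched with permutation-invariant pooling over follower states (mean/max pooling is an assumption; CACER-II's actual encoder may differ):

```python
import numpy as np

def encode_squad(follower_states):
    """Map a variable-length list of follower states to a fixed-length
    embedding, invariant to follower count and ordering."""
    F = np.asarray(follower_states)           # shape: (n_followers, state_dim)
    return np.concatenate([F.mean(axis=0), F.max(axis=0)])   # fixed 2*state_dim

emb3 = encode_squad(np.random.rand(3, 4))     # 3 followers
emb7 = encode_squad(np.random.rand(7, 4))     # 7 followers -> same shape (8,)
assert emb3.shape == emb7.shape
```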
arXiv Detail & Related papers (2021-01-20T11:23:35Z)
- Enhanced Adversarial Strategically-Timed Attacks against Deep Reinforcement Learning [91.13113161754022]
We introduce timing-based adversarial strategies against a DRL-based navigation system by injecting physical noise patterns at selected time frames.
Our experimental results show that the adversarial timing attacks can lead to a significant performance drop.
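An illustrative timing heuristic, assuming (as in the strategically-timed attack literature) that perturbations are applied only on frames where the policy's action preference is strong:

```python
import numpy as np

def should_attack(action_probs, margin=0.6):
    """Attack only when the policy strongly prefers one action, so a few
    well-timed perturbations cause a large performance drop."""
    p = np.sort(action_probs)[::-1]
    return (p[0] - p[1]) > margin

def perturb(obs, rng, sigma=0.1):
    return obs + rng.normal(0.0, sigma, size=obs.shape)   # stand-in noise pattern

rng = np.random.default_rng(0)
obs, probs = np.zeros(8), np.array([0.85, 0.10, 0.05])
if should_attack(probs):
    obs = perturb(obs, rng)                   # agent acts on the jammed frame
```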
arXiv Detail & Related papers (2020-02-20T21:39:25Z)
- Information Theoretic Model Predictive Q-Learning [64.74041985237105]
We present a novel theoretical connection between information theoretic MPC and entropy regularized RL.
We develop a Q-learning algorithm that can leverage biased models.
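A compact sketch of information-theoretic MPC (MPPI-style) with a learned terminal value, which is one way a Q-function lets planning tolerate a biased model; all functions below are toy stand-ins, not the paper's algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

def mppi_action(x0, model, running_cost, terminal_value, H=10, K=64, lam=1.0):
    """Information-theoretic MPC update: softmin-weighted average over
    sampled action sequences; a learned value caps the horizon, reducing
    sensitivity to model bias."""
    U = rng.normal(0.0, 1.0, size=(K, H))     # K candidate action sequences
    costs = np.zeros(K)
    for k in range(K):
        x = x0
        for t in range(H):
            x = model(x, U[k, t])
            costs[k] += running_cost(x, U[k, t])
        costs[k] -= terminal_value(x)         # learned terminal value term
    w = np.exp(-(costs - costs.min()) / lam)  # exponentiated-cost weights
    w /= w.sum()
    return (w[:, None] * U).sum(axis=0)[0]    # execute first averaged action

a = mppi_action(x0=1.0,
                model=lambda x, u: 0.9 * x + 0.1 * u,
                running_cost=lambda x, u: x**2 + 0.01 * u**2,
                terminal_value=lambda x: -x**2)
```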
arXiv Detail & Related papers (2019-12-31T00:29:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.