Performance-Weighed Policy Sampling for Meta-Reinforcement Learning
- URL: http://arxiv.org/abs/2012.06016v1
- Date: Thu, 10 Dec 2020 23:08:38 GMT
- Title: Performance-Weighed Policy Sampling for Meta-Reinforcement Learning
- Authors: Ibrahim Ahmed, Marcos Quinones-Grueiro, Gautam Biswas
- Abstract summary: Enhanced Model-Agnostic Meta-Learning (E-MAML) generates fast convergence of the policy function from a small number of training examples.
E-MAML maintains a set of policy parameters learned in the environment for previous tasks.
We apply E-MAML to developing reinforcement learning (RL)-based online fault tolerant control schemes.
- Score: 1.77898701462905
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper discusses an Enhanced Model-Agnostic Meta-Learning (E-MAML)
algorithm that generates fast convergence of the policy function from a small
number of training examples when applied to new learning tasks. Built on top of
Model-Agnostic Meta-Learning (MAML), E-MAML maintains a set of policy
parameters learned in the environment for previous tasks. We apply E-MAML to
developing reinforcement learning (RL)-based online fault tolerant control
schemes for dynamic systems. The enhancement is applied when a new fault
occurs: it re-initializes the parameters of a new RL policy so that the policy
adapts faster from a small number of samples of system behavior under the new
fault. This step replaces the random task sampling of MAML; instead, it exploits
the controller's previously generated experiences, sampling the stored policy
parameters so as to maximally span the parameter space and facilitate adaptation to the new
fault. We demonstrate the performance of our approach combining E-MAML with
proximal policy optimization (PPO) on the well-known cart pole example, and
then on the fuel transfer system of an aircraft.
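The abstract describes the mechanism only at a high level, so the following is a minimal illustrative sketch, not the authors' implementation, of how a store of previously learned policies might be sampled, weighted by past performance, to re-initialize a new policy when a fault is detected. The names (PolicyStore, sample_init), the softmax weighting, the temperature value, and the candidate-averaging step are all assumptions made for illustration; stored policies are treated as flat parameter vectors with an associated return estimate.

```python
import numpy as np

class PolicyStore:
    """Hypothetical store of policy parameters learned on previous fault tasks.

    Each entry pairs a flat parameter vector with the average return that
    policy achieved on its original task.
    """

    def __init__(self):
        self.params = []    # list of np.ndarray, one per previous task
        self.returns = []   # list of float, performance of each stored policy

    def add(self, theta, avg_return):
        self.params.append(np.asarray(theta, dtype=float))
        self.returns.append(float(avg_return))

    def sample_init(self, rng, temperature=50.0, n_candidates=3):
        """Performance-weighted sampling of an initialization for a new task.

        Policies with higher past returns are more likely to be chosen, and a
        few candidates are averaged so the initialization does not collapse
        onto a single previous task (a crude stand-in for spanning the
        parameter space). The temperature value is arbitrary.
        """
        if not self.params:
            raise ValueError("no previously learned policies available")
        scores = np.asarray(self.returns) / temperature
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        k = min(n_candidates, len(self.params))
        idx = rng.choice(len(self.params), size=k, replace=False, p=weights)
        chosen = np.stack([self.params[i] for i in idx])
        w = weights[idx] / weights[idx].sum()
        return (w[:, None] * chosen).sum(axis=0)


# Usage sketch: re-initialize a new policy when a fault is detected, then
# fine-tune it with a few PPO updates on data from the faulty system.
rng = np.random.default_rng(0)
store = PolicyStore()
for task in range(4):                      # pretend we trained on 4 earlier faults
    store.add(rng.normal(size=8), avg_return=rng.uniform(50, 200))

theta_init = store.sample_init(rng)        # replaces MAML's random task sampling
# ppo_finetune(theta_init, new_fault_env)  # placeholder for the adaptation step
```

In the paper the stored experiences are sampled so as to span the parameter space; the candidate-averaging above merely stands in for that idea, and the resulting parameters would then be adapted with a small number of PPO updates on the faulty system.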
Related papers
- A Moreau Envelope Approach for LQR Meta-Policy Estimation [0.7311194870168775]
We study the problem of policy estimation for the Linear Quadratic Regulator (LQR) in discrete-time linear time-invariant uncertain dynamical systems.
We propose a surrogate LQR cost, built from a finite set of realizations of the uncertain system, to define a meta-policy efficiently adjustable to new realizations.
arXiv Detail & Related papers (2024-03-26T04:02:09Z)
- Data-Efficient Task Generalization via Probabilistic Model-based Meta Reinforcement Learning [58.575939354953526]
PACOH-RL is a novel model-based Meta-Reinforcement Learning (Meta-RL) algorithm designed to efficiently adapt control policies to changing dynamics.
Existing Meta-RL methods require abundant meta-learning data, limiting their applicability in settings such as robotics.
Our experiment results demonstrate that PACOH-RL outperforms model-based RL and model-based Meta-RL baselines in adapting to new dynamic conditions.
arXiv Detail & Related papers (2023-11-13T18:51:57Z)
- Let's reward step by step: Step-Level reward model as the Navigators for Reasoning [64.27898739929734]
Process-Supervised Reward Model (PRM) furnishes LLMs with step-by-step feedback during the training phase.
We propose a greedy search algorithm that employs the step-level feedback from PRM to optimize the reasoning pathways explored by LLMs.
To explore the versatility of our approach, we develop a novel method to automatically generate a step-level reward dataset for coding tasks and observe similar performance improvements on code generation tasks.
arXiv Detail & Related papers (2023-10-16T05:21:50Z)
- Reparameterized Policy Learning for Multimodal Trajectory Optimization [61.13228961771765]
We investigate the challenge of parametrizing policies for reinforcement learning in high-dimensional continuous action spaces.
We propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories.
We present a practical model-based RL method, which leverages the multimodal policy parameterization and learned world model.
arXiv Detail & Related papers (2023-07-20T09:05:46Z)
- Provable Generalization of Overparameterized Meta-learning Trained with SGD [62.892930625034374]
We study the generalization of a widely used meta-learning approach, Model-Agnostic Meta-Learning (MAML).
We provide both upper and lower bounds for the excess risk of MAML, which captures how SGD dynamics affect these generalization bounds.
Our theoretical findings are further validated by experiments.
arXiv Detail & Related papers (2022-06-18T07:22:57Z)
- Energy-Efficient and Federated Meta-Learning via Projected Stochastic Gradient Ascent [79.58680275615752]
We propose an energy-efficient federated meta-learning framework.
We assume each task is owned by a separate agent, so a limited number of tasks is used to train a meta-model.
arXiv Detail & Related papers (2021-05-31T08:15:44Z)
- Robust MAML: Prioritization task buffer with adaptive learning process for model-agnostic meta-learning [15.894925018423665]
Model agnostic meta-learning (MAML) is a popular state-of-the-art meta-learning algorithm.
This paper proposes a more robust MAML based on an adaptive learning scheme and a prioritization task buffer.
Experimental results on meta reinforcement learning environments demonstrate a substantial performance gain.
arXiv Detail & Related papers (2021-03-15T09:34:34Z)
- B-SMALL: A Bayesian Neural Network approach to Sparse Model-Agnostic Meta-Learning [2.9189409618561966]
We propose a Bayesian neural network based MAML algorithm, which we refer to as the B-SMALL algorithm.
We demonstrate the performance of B-SMALL using classification and regression tasks, and highlight that training a sparsifying BNN using MAML indeed improves the parameter footprint of the model.
arXiv Detail & Related papers (2021-01-01T09:19:48Z)
- Meta-Learning with Adaptive Hyperparameters [55.182841228303225]
We focus on a complementary factor in the MAML framework: inner-loop optimization (or fast adaptation).
We propose a new weight update rule that greatly enhances the fast adaptation process.
arXiv Detail & Related papers (2020-10-31T08:05:34Z)
- La-MAML: Look-ahead Meta Learning for Continual Learning [14.405620521842621]
We propose Look-ahead MAML (La-MAML), a fast optimisation-based meta-learning algorithm for online-continual learning, aided by a small episodic memory.
La-MAML achieves performance superior to other replay-based, prior-based and meta-learning based approaches for continual learning on real-world visual classification benchmarks.
arXiv Detail & Related papers (2020-07-27T23:07:01Z)
- On the Convergence Theory of Debiased Model-Agnostic Meta-Reinforcement Learning [25.163423936635787]
We consider Model-Agnostic Meta-Learning (MAML) methods for Reinforcement Learning (RL) problems.
We propose a variant of the MAML method, named Stochastic Gradient Meta-Reinforcement Learning (SG-MRL).
We derive the iteration and sample complexity of SG-MRL to find an $\epsilon$-first-order stationary point, which, to the best of our knowledge, provides the first convergence guarantee for model-agnostic meta-reinforcement learning algorithms.
arXiv Detail & Related papers (2020-02-12T18:29:09Z)