Adversarial Style Transfer for Robust Policy Optimization in Deep
Reinforcement Learning
- URL: http://arxiv.org/abs/2308.15550v1
- Date: Tue, 29 Aug 2023 18:17:35 GMT
- Title: Adversarial Style Transfer for Robust Policy Optimization in Deep
Reinforcement Learning
- Authors: Md Masudur Rahman and Yexiang Xue
- Abstract summary: This paper proposes an algorithm that aims to improve generalization for reinforcement learning agents by removing overfitting to confounding features.
A policy network updates its parameters to minimize the effect of such perturbations, thus staying robust while maximizing the expected future reward.
We evaluate our approach on Procgen and Distracting Control Suite for generalization and sample efficiency.
- Score: 13.652106087606471
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes an algorithm that aims to improve generalization for
reinforcement learning agents by removing overfitting to confounding features.
reinforcement learning agents by removing overfitting to confounding features.
Our approach consists of a max-min game-theoretic objective. A generator
transfers the style of observations during reinforcement learning. An additional
goal of the generator is to perturb the observation, which maximizes the
agent's probability of taking a different action. In contrast, a policy network
updates its parameters to minimize the effect of such perturbations, thus
staying robust while maximizing the expected future reward. Based on this
setup, we propose a practical deep reinforcement learning algorithm,
Adversarial Robust Policy Optimization (ARPO), to find a robust policy that
generalizes to unseen environments. We evaluate our approach on Procgen and
Distracting Control Suite for generalization and sample efficiency.
Empirically, ARPO shows improved performance compared to a few baseline
algorithms, including data augmentation.
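To make the max-min setup concrete, the following is a minimal PyTorch sketch of one plausible reading of the objective. The network shapes, the KL-divergence direction, the plain policy-gradient term, and the 0.1 robustness weight are illustrative assumptions, not ARPO's exact formulation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

OBS_DIM, N_ACTIONS, BATCH = 64, 15, 32             # hypothetical sizes

policy = nn.Sequential(nn.Linear(OBS_DIM, 128), nn.ReLU(),
                       nn.Linear(128, N_ACTIONS))  # action logits
generator = nn.Sequential(nn.Linear(OBS_DIM, 128), nn.ReLU(),
                          nn.Linear(128, OBS_DIM)) # style perturbation

opt_pi = torch.optim.Adam(policy.parameters(), lr=3e-4)
opt_gen = torch.optim.Adam(generator.parameters(), lr=3e-4)

def probs(obs):
    return F.softmax(policy(obs), dim=-1)

obs = torch.randn(BATCH, OBS_DIM)                  # stand-in observations
actions = torch.randint(0, N_ACTIONS, (BATCH,))    # stand-in actions taken
advantages = torch.randn(BATCH)                    # stand-in advantages

# Generator step (max player): perturb observations so the policy's action
# distribution moves as far as possible from its clean prediction.
opt_gen.zero_grad()
shift = F.kl_div(probs(obs + generator(obs)).log(),
                 probs(obs).detach(), reduction="batchmean")
(-shift).backward()                                # ascend, i.e. maximize the shift
opt_gen.step()

# Policy step (min player): keep maximizing return while minimizing the
# action-distribution shift under the (frozen) generator's perturbation.
opt_pi.zero_grad()
logp = probs(obs).log()[torch.arange(BATCH), actions]
pg_loss = -(logp * advantages).mean()              # plain policy-gradient term
robust = F.kl_div(probs(obs + generator(obs).detach()).log(),
                  probs(obs), reduction="batchmean")
(pg_loss + 0.1 * robust).backward()
opt_pi.step()
```

In training, the two steps above would alternate, so the generator keeps finding style shifts that change the policy's behavior and the policy keeps learning to ignore them.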
Related papers
- A Model-Based Approach for Improving Reinforcement Learning Efficiency
Leveraging Expert Observations [9.240917262195046]
We propose an algorithm that automatically adjusts the weights of each component in the augmented loss function.
Experiments on a variety of continuous control tasks demonstrate that the proposed algorithm outperforms various benchmarks.
arXiv Detail & Related papers (2024-02-29T03:53:02Z)
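The summary does not say how the weights are adjusted; as one common recipe, here is a hedged sketch using learnable uncertainty-style weights (Kendall et al.'s multi-task weighting), purely to illustrate "automatically adjusted" loss components, not the authors' rule:

```python
import torch

# Two hypothetical components of an augmented loss: an RL loss and an
# expert-imitation loss (stand-in scalar values for one batch).
rl_loss = torch.tensor(1.3)
bc_loss = torch.tensor(0.7)

# Learnable log-variances s_i; total = sum(exp(-s_i) * L_i + s_i) lets the
# optimizer trade the components off automatically.
log_vars = torch.zeros(2, requires_grad=True)
opt = torch.optim.Adam([log_vars], lr=1e-3)

total = sum(torch.exp(-s) * L + s
            for s, L in zip(log_vars, (rl_loss, bc_loss)))
opt.zero_grad(); total.backward(); opt.step()
print(torch.exp(-log_vars).detach())   # current per-component weights
```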
- Dropout Strategy in Reinforcement Learning: Limiting the Surrogate Objective Variance in Policy Optimization Methods [0.0]
Policy-based reinforcement learning algorithms are widely used in various fields.
These algorithms introduce importance sampling into policy iteration.
This can lead to high variance in the surrogate objective and indirectly affect the stability and convergence of the algorithm.
arXiv Detail & Related papers (2023-10-31T11:38:26Z)
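The high variance comes from the importance-sampling ratio in the surrogate objective. A small, self-contained sketch (synthetic numbers, not the paper's method) of the ratio, the surrogate, and one naive way of limiting its variance by dropping extreme-ratio samples:

```python
import torch

# Importance-sampling surrogate used by PPO-style methods:
#   L(theta) = E[ (pi_theta(a|s) / pi_old(a|s)) * A(s, a) ]
logp_new = torch.randn(256) * 0.5                       # stand-in log pi_theta(a|s)
logp_old = logp_new + torch.randn(256) * 0.5            # stand-in log pi_old(a|s)
adv = torch.randn(256)                                  # stand-in advantages

ratio = torch.exp(logp_new - logp_old)
surrogate = ratio * adv
print("surrogate variance:", surrogate.var().item())

# An illustrative reading of a "dropout strategy" (not necessarily the
# paper's rule): drop samples whose ratio strays too far from 1.
keep = (ratio - 1.0).abs() < 0.5
print("after dropping outliers:", surrogate[keep].var().item())
```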
- Acceleration in Policy Optimization [50.323182853069184]
We work towards a unifying paradigm for accelerating policy optimization methods in reinforcement learning (RL) by integrating foresight in the policy improvement step via optimistic and adaptive updates.
We define optimism as predictive modelling of the future behavior of a policy, and adaptivity as taking immediate and anticipatory corrective actions to mitigate errors from overshooting predictions or delayed responses to change.
We design an optimistic policy gradient algorithm, adaptive via meta-gradient learning, and empirically highlight several design choices pertaining to acceleration, in an illustrative task.
arXiv Detail & Related papers (2023-06-18T15:50:57Z)
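As a concrete, hedged illustration of an "optimistic" update, here is an extragradient-style lookahead step on a toy objective, where the gradient is evaluated at a predicted future parameter point. This is one standard instance of optimism, not the paper's meta-gradient algorithm:

```python
import torch

theta = torch.zeros(4, requires_grad=True)   # toy policy parameters
lr = 0.1

def objective(p):                            # stand-in for expected return
    return -(p - 2.0).pow(2).sum()

# Lookahead: predict where the parameters are headed, then apply the
# gradient evaluated at that predicted future point.
g = torch.autograd.grad(objective(theta), theta)[0]
lookahead = theta + lr * g                   # predicted future policy
g_ahead = torch.autograd.grad(objective(lookahead), lookahead)[0]
with torch.no_grad():
    theta += lr * g_ahead                    # gradient ascent on return
```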
- Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning [139.53668999720605]
We present a multi-agent PPO algorithm in which the local policy of each agent is updated similarly to vanilla PPO.
We prove that with standard regularity conditions on the Markov game and problem-dependent quantities, our algorithm converges to the globally optimal policy at a sublinear rate.
arXiv Detail & Related papers (2023-05-08T16:20:03Z)
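Per the summary, the local update each agent performs is vanilla PPO; a minimal sketch of the standard clipped surrogate applied per agent, with stand-in data:

```python
import torch

def ppo_clip_loss(logp_new, logp_old, adv, eps=0.2):
    """Vanilla PPO clipped surrogate; in the multi-agent scheme described
    above, each agent applies this to its own local policy."""
    ratio = torch.exp(logp_new - logp_old)
    return -torch.min(ratio * adv,
                      torch.clamp(ratio, 1 - eps, 1 + eps) * adv).mean()

# Hypothetical two-agent update: every agent optimizes only its local policy.
for agent in range(2):
    logp_new = torch.randn(64, requires_grad=True)   # stand-in local log-probs
    logp_old, adv = logp_new.detach(), torch.randn(64)
    ppo_clip_loss(logp_new, logp_old, adv).backward()  # local gradients only
```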
- Adversarial Policy Optimization in Deep Reinforcement Learning [16.999444076456268]
The policy represented by a deep neural network can overfit, which hampers a reinforcement learning agent from learning an effective policy.
Data augmentation can provide a performance boost to RL agents by mitigating the effect of overfitting.
We propose a novel RL algorithm to mitigate the above issue and improve the efficiency of the learned policy.
arXiv Detail & Related papers (2023-04-27T21:01:08Z)
- Robust Policy Optimization in Deep Reinforcement Learning [16.999444076456268]
In continuous action domains, a parameterized action distribution allows easy control of exploration.
In particular, we propose an algorithm called Robust Policy Optimization (RPO), which leverages a perturbed distribution.
We evaluated our methods on various continuous control tasks from DeepMind Control, OpenAI Gym, PyBullet, and Isaac Gym.
arXiv Detail & Related papers (2022-12-14T22:43:56Z)
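One reading of the "perturbed distribution" idea, sketched with stand-in values; the uniform noise on the mean and the perturbation range alpha are our assumptions based on the summary:

```python
import torch
from torch.distributions import Normal

# Perturbed Gaussian action distribution: add uniform noise to the mean so
# the policy keeps exploring instead of collapsing to a point.
mean = torch.zeros(6)                 # stand-in network output, 6-dim action
std = torch.ones(6) * 0.5
alpha = 0.5                           # perturbation range (a tunable choice)

noise = torch.empty_like(mean).uniform_(-alpha, alpha)
dist = Normal(mean + noise, std)      # perturbed distribution
action = dist.sample()
logp = dist.log_prob(action).sum(-1)  # plugs into a PPO-style objective
```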
- Policy Gradient Bayesian Robust Optimization for Imitation Learning [49.881386773269746]
We derive a novel policy gradient-style robust optimization approach, PG-BROIL, to balance expected performance and risk.
Results suggest PG-BROIL can produce a family of behaviors ranging from risk-neutral to risk-averse.
arXiv Detail & Related papers (2021-06-11T16:49:15Z)
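"Balance expected performance and risk" is commonly formalized as a blend of mean performance and conditional value at risk (CVaR) over posterior samples; a hedged sketch of that kind of objective, where the lambda and alpha values and the CVaR formalization are assumptions:

```python
import torch

# Returns of one policy under N reward hypotheses sampled from a posterior.
returns = torch.randn(1000)
lam, alpha = 0.5, 0.95

# Blend expected performance with the mean of the worst (1 - alpha) tail.
var = torch.quantile(returns, 1 - alpha)   # value-at-risk threshold
cvar = returns[returns <= var].mean()      # conditional value at risk
objective = lam * returns.mean() + (1 - lam) * cvar
```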
- Generative Actor-Critic: An Off-policy Algorithm Using the Push-forward Model [24.030426634281643]
In continuous control tasks, widely used policies with Gaussian distributions result in ineffective exploration of environments.
We propose a density-free off-policy algorithm, Generative Actor-Critic, using the push-forward model to increase the expressiveness of policies.
We show that push-forward policies possess desirable features, such as multi-modality, which can markedly improve the efficiency of exploration and the performance of algorithms.
arXiv Detail & Related papers (2021-05-08T16:29:20Z)
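A push-forward policy samples actions by pushing noise through a network, so no closed-form density is needed and multi-modal action distributions come for free; a minimal sketch, where the layer sizes and tanh squashing are illustrative choices:

```python
import torch
import torch.nn as nn

class PushForwardPolicy(nn.Module):
    """Push a simple noise distribution through a network to get actions.
    The policy is sampled, never evaluated, so it is density-free."""
    def __init__(self, obs_dim=8, noise_dim=4, act_dim=2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim + noise_dim, 64),
                                 nn.ReLU(), nn.Linear(64, act_dim))
        self.noise_dim = noise_dim

    def forward(self, obs):
        z = torch.randn(obs.shape[0], self.noise_dim)   # source noise
        return torch.tanh(self.net(torch.cat([obs, z], dim=-1)))

actions = PushForwardPolicy()(torch.randn(32, 8))       # 32 sampled actions
```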
- Iterative Amortized Policy Optimization [147.63129234446197]
Policy networks are a central feature of deep reinforcement learning (RL) algorithms for continuous control.
From the variational inference perspective, policy networks are a form of amortized optimization, optimizing network parameters rather than the policy distributions directly.
We demonstrate that iterative amortized policy optimization yields performance improvements over direct amortization on benchmark continuous control tasks.
arXiv Detail & Related papers (2020-10-20T23:25:42Z)
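A hedged sketch of the contrast: direct amortization emits distribution parameters in a single forward pass, while iterative amortization refines them with inner gradient steps on an objective estimate. The toy objective below stands in for a learned critic:

```python
import torch
from torch.distributions import Normal

def objective(mean, log_std):
    dist = Normal(mean, log_std.exp())
    a = dist.rsample()                 # reparameterized action sample
    q = -(a - 1.0).pow(2).sum()        # stand-in for Q(s, a)
    return q + dist.entropy().sum()    # soft-RL style objective

# Per-state distribution parameters, refined iteratively instead of being
# produced by one feedforward pass.
mean = torch.zeros(2, requires_grad=True)
log_std = torch.zeros(2, requires_grad=True)
inner = torch.optim.SGD([mean, log_std], lr=0.1)
for _ in range(5):                     # the iterative refinement loop
    inner.zero_grad()
    (-objective(mean, log_std)).backward()
    inner.step()
```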
- Optimization-driven Deep Reinforcement Learning for Robust Beamforming in IRS-assisted Wireless Communications [54.610318402371185]
An intelligent reflecting surface (IRS) is a promising technology to assist downlink information transmissions from a multi-antenna access point (AP) to a receiver.
We minimize the AP's transmit power by a joint optimization of the AP's active beamforming and the IRS's passive beamforming.
We propose a deep reinforcement learning (DRL) approach that can adapt the beamforming strategies from past experiences.
arXiv Detail & Related papers (2020-05-25T01:42:55Z)
- Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it to correlated actions, and combine these critic-estimated action values to control the variance of gradient estimation.
These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
arXiv Detail & Related papers (2020-02-10T04:23:09Z)
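With a discrete action space and a critic for Q(s, a), the expectation over actions can be computed exactly instead of sampled, which removes one source of gradient variance. A simplified sketch of that "all-action" idea; this is our reading, not necessarily the paper's exact estimator:

```python
import torch
import torch.nn.functional as F

# Exact policy gradient over a discrete action set:
#   grad E_a[Q(s, a)] = sum_a Q(s, a) * grad pi(a|s)
logits = torch.randn(1, 5, requires_grad=True)   # stand-in policy logits
q_values = torch.randn(1, 5)                     # critic estimates Q(s, a)

pi = F.softmax(logits, dim=-1)
loss = -(pi * q_values).sum()                    # "all-action" surrogate
loss.backward()                                  # no sampling over actions
```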
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.