Related papers: Soft Actor-Critic with Inhibitory Networks for Faster Retraining

URL: http://arxiv.org/abs/2202.02918v2
Date: Tue, 8 Feb 2022 02:38:35 GMT
Title: Soft Actor-Critic with Inhibitory Networks for Faster Retraining
Authors: Jaime S. Ide, Daria Mićović, Michael J. Guarino, Kevin Alcedo, David Rosenbluth, Adrian P. Pope
Abstract summary: Reusing previously trained models is critical in deep reinforcement learning. It is unclear how to acquire new skills when objectives and constraints are in conflict with previously learned skills. We propose a novel approach using inhibitory networks to allow separate and adaptive state value evaluations.
Score: 0.24466725954625884
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Reusing previously trained models is critical in deep reinforcement learning to speed up training of new agents. However, it is unclear how to acquire new skills when objectives and constraints are in conflict with previously learned skills. Moreover, when retraining, there is an intrinsic conflict between exploiting what has already been learned and exploring new skills. In soft actor-critic (SAC) methods, a temperature parameter can be dynamically adjusted to weight the action entropy and balance the explore $\times$ exploit trade-off. However, controlling a single coefficient can be challenging within the context of retraining, even more so when goals are contradictory. In this work, inspired by neuroscience research, we propose a novel approach using inhibitory networks to allow separate and adaptive state value evaluations, as well as distinct automatic entropy tuning. Ultimately, our approach allows for controlling inhibition to handle conflict between exploiting less risky, acquired behaviors and exploring novel ones to overcome more challenging tasks. We validate our method through experiments in OpenAI Gym environments.
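The dynamically adjusted temperature mentioned in the abstract can be illustrated with a minimal sketch of the automatic entropy tuning used in standard SAC. This is not the paper's inhibitory-network method. It assumes the log-alpha variant of the temperature loss that is common in implementations, L = -log_alpha * (mean(log pi(a|s)) + target_entropy), and the function name, batch values, and learning rate are hypothetical.

```python
import math


def update_log_alpha(log_alpha, log_probs, target_entropy, lr=0.01):
    """One gradient-descent step on the assumed temperature loss
    L = -log_alpha * (mean(log_probs) + target_entropy)."""
    mean_log_prob = sum(log_probs) / len(log_probs)
    # dL/d(log_alpha); positive when the estimated policy entropy
    # (-mean_log_prob) is above the target entropy.
    grad = -(mean_log_prob + target_entropy)
    return log_alpha - lr * grad


# Hypothetical batch of log pi(a|s) values; estimated entropy is 0.5.
log_probs = [-0.4, -0.6, -0.5]
# Common heuristic: target entropy = -dim(action space); 1-D actions assumed.
target_entropy = -1.0

log_alpha = 0.0  # alpha = exp(0) = 1
new_log_alpha = update_log_alpha(log_alpha, log_probs, target_entropy)
alpha = math.exp(new_log_alpha)  # temperature that weights the entropy term
```

In this sketch the temperature falls when the policy's entropy is above the target (less exploration pressure is needed) and rises when it is below. The abstract describes a separate, distinct tuning of this kind for the inhibitory network.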

Related papers

Capability-Oriented Training Induced Alignment Risk [101.37328448441208]
We investigate whether language models, when trained with reinforcement learning, will spontaneously learn to exploit flaws to maximize their reward. Our experiments show that models consistently learn to exploit these vulnerabilities, discovering opportunistic strategies that significantly increase their reward at the expense of task correctness or safety. Our findings suggest that future AI safety work must extend beyond content moderation to rigorously auditing and securing the training environments and reward mechanisms themselves.
arXiv Detail & Related papers (2026-02-12T16:13:14Z)
Temporal-Difference Variational Continual Learning [89.32940051152782]
We propose new learning objectives that integrate the regularization effects of multiple previous posterior estimations. Our approach effectively mitigates Catastrophic Forgetting, outperforming strong Variational CL methods.
arXiv Detail & Related papers (2024-10-10T10:58:41Z)
SLIM: Skill Learning with Multiple Critics [8.645929825516818]
Self-supervised skill learning aims to acquire useful behaviors that leverage the underlying dynamics of the environment. Latent variable models, based on mutual information, have been successful in this task but still struggle in the context of robotic manipulation. We introduce SLIM, a multi-critic learning approach for skill discovery with a particular focus on robotic manipulation.
arXiv Detail & Related papers (2024-02-01T18:07:33Z)
Data-Driven Inverse Reinforcement Learning for Expert-Learner Zero-Sum Games [30.720112378448285]
We formulate inverse reinforcement learning as an expert-learner interaction. The optimal performance intent of an expert or target agent is unknown to a learner agent. We develop an off-policy IRL algorithm that does not require knowledge of the expert and learner agent dynamics.
arXiv Detail & Related papers (2023-01-05T10:35:08Z)
Addressing Mistake Severity in Neural Networks with Semantic Knowledge [0.0]
Most robust training techniques aim to improve model accuracy on perturbed inputs. As an alternate form of robustness, we aim to reduce the severity of mistakes made by neural networks in challenging conditions. We leverage current adversarial training methods to generate targeted adversarial attacks during the training process. Results demonstrate that our approach performs better with respect to mistake severity compared to standard and adversarially trained models.
arXiv Detail & Related papers (2022-11-21T22:01:36Z)
Learning Dynamics and Generalization in Reinforcement Learning [59.530058000689884]
We show theoretically that temporal difference learning encourages agents to fit non-smooth components of the value function early in training. We show that neural networks trained using temporal difference algorithms on dense reward tasks exhibit weaker generalization between states than randomly initialized networks and networks trained with policy gradient methods.
arXiv Detail & Related papers (2022-06-05T08:49:16Z)
Rethinking Learning Dynamics in RL using Adversarial Networks [79.56118674435844]
We present a learning mechanism for reinforcement learning of closely related skills parameterized via a skill embedding space. The main contribution of our work is to formulate an adversarial training regime for reinforcement learning with the help of entropy-regularized policy gradient formulation.
arXiv Detail & Related papers (2022-01-27T19:51:09Z)
ROMAX: Certifiably Robust Deep Multiagent Reinforcement Learning via Convex Relaxation [32.091346776897744]
Cyber-physical attacks can challenge the robustness of multiagent reinforcement learning. We propose a minimax MARL approach to infer the worst-case policy update of other agents.
arXiv Detail & Related papers (2021-09-14T16:18:35Z)
PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning. We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
Self-Progressing Robust Training [146.8337017922058]
Current robust training methods such as adversarial training explicitly use an "attack" to generate adversarial examples. We propose a new framework called SPROUT, self-progressing robust training. Our results shed new light on scalable, effective and attack-independent robust training methods.
arXiv Detail & Related papers (2020-12-22T00:45:24Z)
Bridging the Imitation Gap by Adaptive Insubordination [88.35564081175642]
We show that when the teaching agent makes decisions with access to privileged information, this information is marginalized during imitation learning. We propose 'Adaptive Insubordination' (ADVISOR) to address this gap. ADVISOR dynamically weights imitation and reward-based reinforcement learning losses during training, enabling on-the-fly switching between imitation and exploration.
arXiv Detail & Related papers (2020-07-23T17:59:57Z)
Reparameterized Variational Divergence Minimization for Stable Imitation [57.06909373038396]
We study the extent to which variations in the choice of probabilistic divergence may yield more performant ILO algorithms. We contribute a reparameterization trick for adversarial imitation learning to alleviate the challenges of the promising $f$-divergence minimization framework. Empirically, we demonstrate that our design choices allow for ILO algorithms that outperform baseline approaches and more closely match expert performance in low-dimensional continuous-control tasks.
arXiv Detail & Related papers (2020-06-18T19:04:09Z)
Online Constrained Model-based Reinforcement Learning [13.362455603441552]
A key requirement is the ability to handle continuous state and action spaces while remaining within a limited time and resource budget. We propose a model-based approach that combines Gaussian Process regression and Receding Horizon Control. We test our approach on a cart pole swing-up environment and demonstrate the benefits of online learning on an autonomous racing task.
arXiv Detail & Related papers (2020-04-07T15:51:34Z)

This list is automatically generated from the titles and abstracts of the papers on this site.