Agent-Aware Training for Agent-Agnostic Action Advising in Deep
Reinforcement Learning
- URL: http://arxiv.org/abs/2311.16807v1
- Date: Tue, 28 Nov 2023 14:09:43 GMT
- Title: Agent-Aware Training for Agent-Agnostic Action Advising in Deep
Reinforcement Learning
- Authors: Yaoquan Wei, Shunyu Liu, Jie Song, Tongya Zheng, Kaixuan Chen, Yong
Wang, Mingli Song
- Abstract summary: Action advising endeavors to leverage supplementary guidance from expert teachers to alleviate the issue of sampling inefficiency in Deep Reinforcement Learning (DRL).
Previous agent-specific action advising methods are hindered by imperfections in the agent itself, while agent-agnostic approaches exhibit limited adaptability to the learning agent.
We propose a novel framework called Agent-Aware trAining yet Agent-Agnostic Action Advising (A7) to strike a balance between the two.
- Score: 37.70609910232786
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Action advising endeavors to leverage supplementary guidance from expert
teachers to alleviate the issue of sampling inefficiency in Deep Reinforcement
Learning (DRL). Previous agent-specific action advising methods are hindered by
imperfections in the agent itself, while agent-agnostic approaches exhibit
limited adaptability to the learning agent. In this study, we propose a novel
framework called Agent-Aware trAining yet Agent-Agnostic Action Advising (A7)
to strike a balance between the two. The underlying concept of A7 revolves
around utilizing the similarity of state features as an indicator for
soliciting advice. However, unlike prior methodologies, the measurement of
state feature similarity is performed by neither the error-prone learning agent
nor the agent-agnostic advisor. Instead, we employ a proxy model to extract
state features that are both discriminative (adaptive to the agent) and
generally applicable (robust to agent noise). Furthermore, we utilize behavior
cloning to train a model for reusing advice and introduce an intrinsic reward
for the advised samples to incentivize the utilization of expert guidance.
Experiments are conducted on the GridWorld, LunarLander, and six prominent
scenarios from Atari games. The results demonstrate that A7 significantly
accelerates the learning process and surpasses existing methods (both
agent-specific and agent-agnostic) by a substantial margin. Our code will be
made publicly available.
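Taken at face value, the abstract describes three mechanisms: a proxy model whose state-feature similarity decides when to solicit advice, a behaviour-cloning model that reuses past advice, and an intrinsic reward on advised samples. Below is a minimal sketch of that loop; the encoder architecture, the cosine-similarity threshold, the reuse head, and the bonus value are illustrative assumptions, not the authors' released implementation.

```python
# Hedged sketch of the advice-soliciting loop described in the abstract.
# The proxy encoder, similarity threshold, and reward bonus are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProxyEncoder(nn.Module):
    """Hypothetical proxy model that maps states to unit-norm feature vectors."""
    def __init__(self, state_dim: int, feature_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, feature_dim),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.net(state), dim=-1)

def should_ask_teacher(encoder: ProxyEncoder,
                       state: torch.Tensor,
                       advised_features: torch.Tensor,
                       threshold: float = 0.9) -> bool:
    """Solicit advice when the current state is dissimilar from states that
    have already been advised (max cosine similarity below a threshold)."""
    if advised_features.numel() == 0:
        return True  # nothing has been advised yet, so ask
    feat = encoder(state.unsqueeze(0))       # shape (1, d)
    sims = feat @ advised_features.t()       # cosine similarities to advised states
    return sims.max().item() < threshold

class AdviceReuseModel(nn.Module):
    """Behaviour-cloning head trained to imitate previously given advice."""
    def __init__(self, feature_dim: int, num_actions: int):
        super().__init__()
        self.head = nn.Linear(feature_dim, num_actions)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.head(features)           # action logits

def shaped_reward(env_reward: float, was_advised: bool,
                  bonus: float = 0.1) -> float:
    """Add an intrinsic bonus to advised samples to encourage the agent to
    exploit expert guidance (the bonus value is an assumption)."""
    return env_reward + (bonus if was_advised else 0.0)
```

In use, the agent would call `should_ask_teacher` at each step, store the teacher's action together with the proxy features of the advised state, train `AdviceReuseModel` on those pairs, and feed `shaped_reward` to the underlying DRL algorithm.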
Related papers
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process.
We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z) - Agent-as-a-Judge: Evaluate Agents with Agents [61.33974108405561]
We introduce the Agent-as-a-Judge framework, wherein agentic systems are used to evaluate agentic systems.
This is an organic extension of the LLM-as-a-Judge framework, incorporating agentic features that enable intermediate feedback for the entire task-solving process.
We present DevAI, a new benchmark of 55 realistic automated AI development tasks.
arXiv Detail & Related papers (2024-10-14T17:57:02Z) - Self-Supervised Adversarial Imitation Learning [20.248498544165184]
Behavioural cloning teaches an agent how to behave via expert demonstrations.
Recent approaches use self-supervision of fully-observable unlabelled snapshots of the states to decode state pairs into actions.
Previous work uses goal-aware strategies to solve this issue.
We address this limitation by incorporating a discriminator into the original framework (see the sketch after this list).
arXiv Detail & Related papers (2023-04-21T12:12:33Z) - GANterfactual-RL: Understanding Reinforcement Learning Agents'
Strategies through Visual Counterfactual Explanations [0.7874708385247353]
We propose a novel but simple method to generate counterfactual explanations for RL agents.
Our method is fully model-agnostic and we demonstrate that it outperforms the only previous method in several computational metrics.
arXiv Detail & Related papers (2023-02-24T15:29:43Z) - Differential Assessment of Black-Box AI Agents [29.98710357871698]
We propose a novel approach to differentially assess black-box AI agents that have drifted from their previously known models.
We leverage sparse observations of the drifted agent's current behavior and knowledge of its initial model to generate an active querying policy.
Empirical evaluation shows that our approach is much more efficient than re-learning the agent model from scratch.
arXiv Detail & Related papers (2022-03-24T17:48:58Z) - Explaining Reinforcement Learning Policies through Counterfactual
Trajectories [147.7246109100945]
A human developer must validate that an RL agent will perform well at test-time.
Our method conveys how the agent performs under distribution shifts by showing the agent's behavior across a wider trajectory distribution.
In a user study, we demonstrate that our method enables users to score better than baseline methods on one of two agent validation tasks.
arXiv Detail & Related papers (2022-01-29T00:52:37Z) - PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via
Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z) - Agent-Centric Representations for Multi-Agent Reinforcement Learning [12.577354830985012]
We investigate whether object-centric representations are also beneficial in the fully cooperative multi-agent reinforcement learning setting.
Specifically, we study two ways of incorporating an agent-centric inductive bias into our RL algorithm.
We evaluate these approaches on the Google Research Football environment as well as DeepMind Lab 2D.
arXiv Detail & Related papers (2021-04-19T15:43:40Z) - Scalable Multi-Agent Inverse Reinforcement Learning via
Actor-Attention-Critic [54.2180984002807]
Multi-agent adversarial inverse reinforcement learning (MA-AIRL) is a recent approach that applies single-agent AIRL to multi-agent problems.
We propose a multi-agent inverse RL algorithm that is more sample-efficient and scalable than previous works.
arXiv Detail & Related papers (2020-02-24T20:30:45Z)