Agent-Aware Training for Agent-Agnostic Action Advising in Deep
Reinforcement Learning
- URL: http://arxiv.org/abs/2311.16807v1
- Date: Tue, 28 Nov 2023 14:09:43 GMT
- Title: Agent-Aware Training for Agent-Agnostic Action Advising in Deep
Reinforcement Learning
- Authors: Yaoquan Wei, Shunyu Liu, Jie Song, Tongya Zheng, Kaixuan Chen, Yong
Wang, Mingli Song
- Abstract summary: Action advising endeavors to leverage supplementary guidance from expert teachers to alleviate the issue of sampling inefficiency in Deep Reinforcement Learning (DRL).
Previous agent-specific action advising methods are hindered by imperfections in the agent itself, while agent-agnostic approaches exhibit limited adaptability to the learning agent.
We propose a novel framework called Agent-Aware trAining yet Agent-Agnostic Action Advising (A7) to strike a balance between the two.
- Score: 37.70609910232786
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Action advising endeavors to leverage supplementary guidance from expert
teachers to alleviate the issue of sampling inefficiency in Deep Reinforcement
Learning (DRL). Previous agent-specific action advising methods are hindered by
imperfections in the agent itself, while agent-agnostic approaches exhibit
limited adaptability to the learning agent. In this study, we propose a novel
framework called Agent-Aware trAining yet Agent-Agnostic Action Advising (A7)
to strike a balance between the two. The underlying concept of A7 revolves
around utilizing the similarity of state features as an indicator for
soliciting advice. However, unlike prior methodologies, the measurement of
state feature similarity is performed by neither the error-prone learning agent
nor the agent-agnostic advisor. Instead, we employ a proxy model to extract
state features that are both discriminative (adaptive to the agent) and
generally applicable (robust to agent noise). Furthermore, we utilize behavior
cloning to train a model for reusing advice and introduce an intrinsic reward
for the advised samples to incentivize the utilization of expert guidance.
Experiments are conducted on the GridWorld, LunarLander, and six prominent
scenarios from Atari games. The results demonstrate that A7 significantly
accelerates the learning process and surpasses existing methods (both
agent-specific and agent-agnostic) by a substantial margin. Our code will be
made publicly available.
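Taken at face value, the abstract describes three mechanisms: a proxy model whose state-feature similarity decides when to solicit advice, a behaviour-cloning model that reuses past advice, and an intrinsic reward on advised samples. Below is a minimal sketch of that loop; the encoder architecture, the cosine-similarity threshold, the reuse head, and the bonus value are illustrative assumptions, not the authors' released implementation.

```python
# Hedged sketch of the advice-soliciting loop described in the abstract.
# The proxy encoder, similarity threshold, and reward bonus are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProxyEncoder(nn.Module):
    """Hypothetical proxy model that maps states to unit-norm feature vectors."""
    def __init__(self, state_dim: int, feature_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, feature_dim),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.net(state), dim=-1)

def should_ask_teacher(encoder: ProxyEncoder,
                       state: torch.Tensor,
                       advised_features: torch.Tensor,
                       threshold: float = 0.9) -> bool:
    """Solicit advice when the current state is dissimilar from states that
    have already been advised (max cosine similarity below a threshold)."""
    if advised_features.numel() == 0:
        return True  # nothing has been advised yet, so ask
    feat = encoder(state.unsqueeze(0))       # shape (1, d)
    sims = feat @ advised_features.t()       # cosine similarities to advised states
    return sims.max().item() < threshold

class AdviceReuseModel(nn.Module):
    """Behaviour-cloning head trained to imitate previously given advice."""
    def __init__(self, feature_dim: int, num_actions: int):
        super().__init__()
        self.head = nn.Linear(feature_dim, num_actions)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.head(features)           # action logits

def shaped_reward(env_reward: float, was_advised: bool,
                  bonus: float = 0.1) -> float:
    """Add an intrinsic bonus to advised samples to encourage the agent to
    exploit expert guidance (the bonus value is an assumption)."""
    return env_reward + (bonus if was_advised else 0.0)
```

In use, the agent would call `should_ask_teacher` at each step, store the teacher's action together with the proxy features of the advised state, train `AdviceReuseModel` on those pairs, and feed `shaped_reward` to the underlying DRL algorithm.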
Related papers
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process.
We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z) - Agent-as-a-Judge: Evaluate Agents with Agents [61.33974108405561]
We introduce the Agent-as-a-Judge framework, wherein agentic systems are used to evaluate agentic systems.
This is an organic extension of the LLM-as-a-Judge framework, incorporating agentic features that enable intermediate feedback for the entire task-solving process.
We present DevAI, a new benchmark of 55 realistic automated AI development tasks.
arXiv Detail & Related papers (2024-10-14T17:57:02Z) - Self-Supervised Adversarial Imitation Learning [20.248498544165184]
Behavioural cloning teaches an agent how to behave via expert demonstrations.
Recent approaches use self-supervision of fully-observable unlabelled snapshots of the states to decode state pairs into actions.
Previous work uses goal-aware strategies to solve this issue.
We address this limitation by incorporating a discriminator into the original framework (see the sketch after this list).
arXiv Detail & Related papers (2023-04-21T12:12:33Z) - GANterfactual-RL: Understanding Reinforcement Learning Agents'
Strategies through Visual Counterfactual Explanations [0.7874708385247353]
We propose a novel but simple method to generate counterfactual explanations for RL agents.
Our method is fully model-agnostic and we demonstrate that it outperforms the only previous method in several computational metrics.
arXiv Detail & Related papers (2023-02-24T15:29:43Z) - Differential Assessment of Black-Box AI Agents [29.98710357871698]
We propose a novel approach to differentially assess black-box AI agents that have drifted from their previously known models.
We leverage sparse observations of the drifted agent's current behavior and knowledge of its initial model to generate an active querying policy.
Empirical evaluation shows that our approach is much more efficient than re-learning the agent model from scratch.
arXiv Detail & Related papers (2022-03-24T17:48:58Z) - Explaining Reinforcement Learning Policies through Counterfactual
Trajectories [147.7246109100945]
A human developer must validate that an RL agent will perform well at test-time.
Our method conveys how the agent performs under distribution shifts by showing the agent's behavior across a wider trajectory distribution.
In a user study, we demonstrate that our method enables users to score better than baseline methods on one of two agent validation tasks.
arXiv Detail & Related papers (2022-01-29T00:52:37Z) - PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via
Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z) - Agent-Centric Representations for Multi-Agent Reinforcement Learning [12.577354830985012]
We investigate whether object-centric representations are also beneficial in the fully cooperative multi-agent reinforcement learning setting.
Specifically, we study two ways of incorporating an agent-centric inductive bias into our RL algorithm.
We evaluate these approaches on the Google Research Football environment as well as DeepMind Lab 2D.
arXiv Detail & Related papers (2021-04-19T15:43:40Z) - Scalable Multi-Agent Inverse Reinforcement Learning via
Actor-Attention-Critic [54.2180984002807]
Multi-agent adversarial inverse reinforcement learning (MA-AIRL) is a recent approach that applies single-agent AIRL to multi-agent problems.
We propose a multi-agent inverse RL algorithm that is more sample-efficient and scalable than previous works.
arXiv Detail & Related papers (2020-02-24T20:30:45Z)