One-shot Policy Elicitation via Semantic Reward Manipulation
- URL: http://arxiv.org/abs/2101.01860v1
- Date: Wed, 6 Jan 2021 04:11:22 GMT
- Title: One-shot Policy Elicitation via Semantic Reward Manipulation
- Authors: Aaquib Tabrez, Ryan Leonard, Bradley Hayes
- Abstract summary: We present Single-shot Policy Explanation for Augmenting Rewards (SPEAR), a novel sequential optimization algorithm.
We show that SPEAR makes substantial improvements over the state-of-the-art in terms of runtime and addressable problem size.
- Score: 2.668480521943575
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Synchronizing expectations and knowledge about the state of the world is an
essential capability for effective collaboration. For robots to effectively
collaborate with humans and other autonomous agents, it is critical that they
be able to generate intelligible explanations to reconcile differences between
their understanding of the world and that of their collaborators. In this work
we present Single-shot Policy Explanation for Augmenting Rewards (SPEAR), a
novel sequential optimization algorithm that uses semantic explanations derived
from combinations of planning predicates to augment agents' reward functions,
driving their policies to exhibit more optimal behavior. We provide an
experimental validation of our algorithm's policy manipulation capabilities in
two practically grounded applications and conclude with a performance analysis
of SPEAR on domains of increasingly complex state space and predicate counts.
We demonstrate that our method makes substantial improvements over the
state-of-the-art in terms of runtime and addressable problem size, enabling an
agent to leverage its own expertise to communicate actionable information to
improve another's performance.
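SPEAR's central mechanism, augmenting an agent's reward function with a semantic explanation built from planning predicates, can be illustrated with a minimal sketch. Everything below (the predicate names, the toy state, and the flat bonus) is a hypothetical stand-in, not the paper's actual domains or interface; the paper additionally optimizes which combination of predicates to communicate, which this sketch omits.

```python
# Minimal sketch of predicate-based reward augmentation in the spirit of
# SPEAR. All names (predicates, states, bonus) are illustrative placeholders.

# Planning predicates: boolean tests over a symbolic state.
predicates = {
    "holding_tool": lambda s: s["holding"] == "tool",
    "at_station":   lambda s: s["location"] == "station",
}

def make_augmented_reward(base_reward, explanation, bonus=1.0):
    """Wrap a base reward so that states satisfying a conjunction of
    predicates (the 'explanation') receive an additional bonus."""
    def augmented(state, action):
        r = base_reward(state, action)
        if all(predicates[name](state) for name in explanation):
            r += bonus
        return r
    return augmented

# Hypothetical usage: the reward is boosted when the agent holds the tool
# at the station, steering its policy toward that behavior.
base = lambda state, action: -0.1          # small per-step cost
reward = make_augmented_reward(base, ["holding_tool", "at_station"])
s = {"holding": "tool", "location": "station"}
print(reward(s, "noop"))                   # -0.1 + 1.0 = 0.9
```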
Related papers
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which uses step-wise rewards to optimize the agent's reinforcement learning process.
We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z)
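As a rough illustration of the step-wise reward idea in the StepAgent entry above, here is a minimal sketch in which each step is scored by agreement with an expert action, one simple way to realize an implicit per-step reward. The expert policy and scoring rule are invented for illustration, not taken from the paper.

```python
# Hedged sketch of a step-wise reward signal: instead of a single episode
# return, each step is scored, here by agreement with an expert action.
# The expert lookup and scoring rule are illustrative, not StepAgent's API.

def stepwise_rewards(trajectory, expert_policy):
    """Assign a per-step reward of +1 when the agent's action matches the
    expert's choice in the same state, else 0 (an implicit-reward proxy)."""
    return [1.0 if a == expert_policy(s) else 0.0 for s, a in trajectory]

expert = lambda s: "right" if s < 3 else "stop"     # toy expert
traj = [(0, "right"), (1, "left"), (2, "right"), (3, "stop")]
print(stepwise_rewards(traj, expert))               # [1.0, 0.0, 1.0, 1.0]
```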
- Communication Learning in Multi-Agent Systems from Graph Modeling Perspective [62.13508281188895]
We introduce a novel approach wherein we conceptualize the communication architecture among agents as a learnable graph.
We introduce a temporal gating mechanism for each agent, enabling dynamic decisions on whether to receive shared information at a given time.
arXiv Detail & Related papers (2024-11-01T05:56:51Z)
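The temporal gating mechanism described above can be sketched as a learned scalar gate that decides, per step, whether an agent mixes teammates' messages into its own state. The sigmoid score, mixing rule, and dimensions below are assumptions for illustration; the paper learns the communication graph and gates end-to-end.

```python
# Minimal sketch of a per-agent temporal gate: at each step the agent
# decides whether to receive shared information.
import numpy as np

rng = np.random.default_rng(0)

def temporal_gate(own_state, messages, w, threshold=0.5):
    """Mix in teammates' messages only when the gate opens."""
    gate = 1.0 / (1.0 + np.exp(-w @ own_state))    # learned scalar score
    if gate > threshold and messages:
        return own_state + np.mean(messages, axis=0)
    return own_state                               # gate closed: ignore comms

w = rng.normal(size=4)
state = rng.normal(size=4)
msgs = [rng.normal(size=4) for _ in range(3)]
print(temporal_gate(state, msgs, w))
```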
- Reinforcing Language Agents via Policy Optimization with Action Decomposition [36.984163245259936]
This paper proposes decomposing language agent optimization from the action level to the token level.
We then derive the Bellman backup with Action Decomposition (BAD) to integrate credit assignments for both intra-action and inter-action tokens.
Implementing BAD within the PPO algorithm, we introduce Policy Optimization with Action Decomposition (POAD).
arXiv Detail & Related papers (2024-05-23T14:01:44Z)
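A minimal sketch of token-level credit assignment in the spirit of BAD: intra-action tokens receive zero immediate reward and bootstrap from the next token's value, while the final token of each action receives the environment reward. The value numbers and indexing scheme are toy assumptions, not the paper's formulation.

```python
# Hedged sketch of token-level backup targets: rewards arrive only at the
# last token of each action (inter-action); all other tokens (intra-action)
# bootstrap purely from the next token's value.

def token_level_targets(token_values, action_ends, env_rewards, gamma=0.99):
    """token_values: V(t) per token; action_ends: indices of each action's
    final token; env_rewards: one reward per completed action."""
    targets, reward_iter = [], iter(env_rewards)
    for t, v_next in enumerate(token_values[1:] + [0.0]):
        r = next(reward_iter) if t in action_ends else 0.0
        targets.append(r + gamma * v_next)
    return targets

# Two actions of three tokens each; rewards arrive at tokens 2 and 5.
vals = [0.1, 0.2, 0.3, 0.2, 0.4, 0.5]
print(token_level_targets(vals, action_ends={2, 5}, env_rewards=[0.0, 1.0]))
```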
- Learning Multi-Agent Communication from Graph Modeling Perspective [62.13508281188895]
We introduce a novel approach wherein we conceptualize the communication architecture among agents as a learnable graph.
Our proposed approach, CommFormer, efficiently optimizes the communication graph and concurrently refines architectural parameters through gradient descent in an end-to-end manner.
arXiv Detail & Related papers (2024-05-14T12:40:25Z)
- Decentralized and Lifelong-Adaptive Multi-Agent Collaborative Learning [57.652899266553035]
Decentralized and lifelong-adaptive multi-agent collaborative learning aims to enhance collaboration among multiple agents without a central server.
We propose DeLAMA, a decentralized multi-agent lifelong collaborative learning algorithm with dynamic collaboration graphs.
arXiv Detail & Related papers (2024-03-11T09:21:11Z)
- Iterated Reasoning with Mutual Information in Cooperative and Byzantine Decentralized Teaming [0.0]
We show that reformulating an agent's policy to be conditional on the policies of its teammates inherently maximizes a lower bound on Mutual Information (MI) when optimizing under Policy Gradients (PG).
Our approach, InfoPG, outperforms baselines in learning emergent collaborative behaviors and sets the state-of-the-art in decentralized cooperative MARL tasks.
arXiv Detail & Related papers (2022-01-20T22:54:32Z)
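The iterated-reasoning idea above can be caricatured as each agent repeatedly recomputing its action distribution conditioned on its teammates' latest distributions. The averaging update below is a loose stand-in for InfoPG's actual k-level policy conditioning, included only to show the shape of the computation.

```python
# Hedged sketch of k-level iterated reasoning: each agent's action
# distribution is recomputed conditioned on its teammates' current
# distributions for k rounds. The conditioning rule here is a toy average.
import numpy as np

def iterate_policies(logits, k=3):
    """logits: (n_agents, n_actions). Each round, every agent nudges its
    logits toward the mean of the other agents' current logits."""
    logits = logits.copy()
    for _ in range(k):
        mean_others = (logits.sum(0, keepdims=True) - logits) / (len(logits) - 1)
        logits = 0.5 * logits + 0.5 * mean_others
    # softmax to get final, teammate-conditioned action distributions
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

print(iterate_policies(np.array([[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]])))
```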
- Distributed Adaptive Learning Under Communication Constraints [54.22472738551687]
This work examines adaptive distributed learning strategies designed to operate under communication constraints.
We consider a network of agents that must solve an online optimization problem from continual observation of streaming data.
arXiv Detail & Related papers (2021-12-03T19:23:48Z)
- Decentralized Multi-Agent Reinforcement Learning: An Off-Policy Method [6.261762915564555]
We discuss the problem of decentralized multi-agent reinforcement learning (MARL) in this work.
In our setting, the global state, action, and reward are assumed to be fully observable, while each agent keeps its local policy private, so it cannot be shared with others.
Policy evaluation and policy improvement algorithms are designed for discrete and continuous state-action-space Markov Decision Processes (MDPs), respectively.
arXiv Detail & Related papers (2021-10-31T09:08:46Z)
- APS: Active Pretraining with Successor Features [96.24533716878055]
We show that by reinterpreting and combining successor features (Hansen et al.) with nonparametric entropy maximization, the intractable mutual information can be efficiently optimized.
The proposed method, Active Pretraining with Successor Features (APS), explores the environment via nonparametric entropy maximization, and the explored data can be efficiently leveraged to learn behavior.
arXiv Detail & Related papers (2021-08-31T16:30:35Z)
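The nonparametric entropy maximization in APS can be approximated with a particle-based (k-nearest-neighbor) estimator, rewarding states whose embeddings lie far from previously visited ones. The estimator form and constants below are illustrative assumptions rather than the paper's exact objective.

```python
# Hedged sketch of a particle-based (kNN) entropy reward: novel states,
# far from stored embeddings, earn a larger exploration bonus.
import numpy as np

def knn_entropy_reward(embedding, memory, k=3):
    """Intrinsic reward grows with the mean distance to the k nearest
    stored embeddings, a crude proxy for state-distribution entropy."""
    if len(memory) < k:
        return 1.0                                   # encourage early coverage
    dists = np.sort(np.linalg.norm(np.asarray(memory) - embedding, axis=1))
    return float(np.log(1.0 + dists[:k].mean()))

memory = [np.zeros(2), np.ones(2), np.array([0.0, 1.0])]
print(knn_entropy_reward(np.array([5.0, 5.0]), memory))  # novel -> high bonus
print(knn_entropy_reward(np.array([0.1, 0.1]), memory))  # familiar -> low
```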
- Domain-Robust Visual Imitation Learning with Mutual Information Constraints [0.0]
We introduce a new algorithm called Disentangling Generative Adversarial Imitation Learning (DisentanGAIL).
Our algorithm enables autonomous agents to learn directly from high dimensional observations of an expert performing a task.
arXiv Detail & Related papers (2021-03-08T21:18:58Z)
- "I Don't Think So": Disagreement-Based Policy Summaries for Comparing Agents [2.6270468656705765]
We propose a novel method for generating contrastive summaries that highlight the differences between agents' policies.
Our results show that the novel disagreement-based summaries lead to improved user performance compared to summaries generated using HIGHLIGHTS.
arXiv Detail & Related papers (2021-02-05T09:09:00Z)
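The disagreement-based summaries above rest on a simple primitive: finding the states where two policies choose different actions. A minimal sketch, with toy states and policies standing in for real agents:

```python
# Hedged sketch of disagreement mining: collect the states on which two
# policies pick different actions, the raw material for a contrastive
# summary. States and policies here are toy stand-ins.

def disagreement_states(states, policy_a, policy_b):
    """Return the states where the two policies choose different actions."""
    return [s for s in states if policy_a(s) != policy_b(s)]

cautious = lambda s: "brake" if s >= 3 else "cruise"
reckless = lambda s: "cruise"
print(disagreement_states(range(6), cautious, reckless))   # [3, 4, 5]
```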
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.