One-shot Policy Elicitation via Semantic Reward Manipulation
- URL: http://arxiv.org/abs/2101.01860v1
- Date: Wed, 6 Jan 2021 04:11:22 GMT
- Title: One-shot Policy Elicitation via Semantic Reward Manipulation
- Authors: Aaquib Tabrez, Ryan Leonard, Bradley Hayes
- Abstract summary: We present Single-shot Policy Explanation for Augmenting Rewards (SPEAR), a novel sequential optimization algorithm.
We show that SPEAR makes substantial improvements over the state-of-the-art in terms of runtime and addressable problem size.
- Score: 2.668480521943575
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Synchronizing expectations and knowledge about the state of the world is an
essential capability for effective collaboration. For robots to effectively
collaborate with humans and other autonomous agents, it is critical that they
be able to generate intelligible explanations to reconcile differences between
their understanding of the world and that of their collaborators. In this work
we present Single-shot Policy Explanation for Augmenting Rewards (SPEAR), a
novel sequential optimization algorithm that uses semantic explanations derived
from combinations of planning predicates to augment agents' reward functions,
driving their policies to exhibit more optimal behavior. We provide an
experimental validation of our algorithm's policy manipulation capabilities in
two practically grounded applications and conclude with a performance analysis
of SPEAR on domains of increasingly complex state space and predicate counts.
We demonstrate that our method makes substantial improvements over the
state-of-the-art in terms of runtime and addressable problem size, enabling an
agent to leverage its own expertise to communicate actionable information to
improve another's performance.
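SPEAR's central mechanism, augmenting an agent's reward function with a semantic explanation built from planning predicates, can be illustrated with a minimal sketch. Everything below (the predicate names, the toy state, and the flat bonus) is a hypothetical stand-in, not the paper's actual domains or interface; the paper additionally optimizes which combination of predicates to communicate, which this sketch omits.

```python
# Minimal sketch of predicate-based reward augmentation in the spirit of
# SPEAR. All names (predicates, states, bonus) are illustrative placeholders.

# Planning predicates: boolean tests over a symbolic state.
predicates = {
    "holding_tool": lambda s: s["holding"] == "tool",
    "at_station":   lambda s: s["location"] == "station",
}

def make_augmented_reward(base_reward, explanation, bonus=1.0):
    """Wrap a base reward so that states satisfying a conjunction of
    predicates (the 'explanation') receive an additional bonus."""
    def augmented(state, action):
        r = base_reward(state, action)
        if all(predicates[name](state) for name in explanation):
            r += bonus
        return r
    return augmented

# Hypothetical usage: the reward is boosted when the agent holds the tool
# at the station, steering its policy toward that behavior.
base = lambda state, action: -0.1          # small per-step cost
reward = make_augmented_reward(base, ["holding_tool", "at_station"])
s = {"holding": "tool", "location": "station"}
print(reward(s, "noop"))                   # -0.1 + 1.0 = 0.9
```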
Related papers
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which uses step-wise rewards to optimize the agent's reinforcement learning process.
We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z)
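As a rough illustration of the step-wise reward idea in the StepAgent entry above, here is a minimal sketch in which each step is scored by agreement with an expert action, one simple way to realize an implicit per-step reward. The expert policy and scoring rule are invented for illustration, not taken from the paper.

```python
# Hedged sketch of a step-wise reward signal: instead of a single episode
# return, each step is scored, here by agreement with an expert action.
# The expert lookup and scoring rule are illustrative, not StepAgent's API.

def stepwise_rewards(trajectory, expert_policy):
    """Assign a per-step reward of +1 when the agent's action matches the
    expert's choice in the same state, else 0 (an implicit-reward proxy)."""
    return [1.0 if a == expert_policy(s) else 0.0 for s, a in trajectory]

expert = lambda s: "right" if s < 3 else "stop"     # toy expert
traj = [(0, "right"), (1, "left"), (2, "right"), (3, "stop")]
print(stepwise_rewards(traj, expert))               # [1.0, 0.0, 1.0, 1.0]
```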
- Communication Learning in Multi-Agent Systems from Graph Modeling Perspective [62.13508281188895]
We introduce a novel approach wherein we conceptualize the communication architecture among agents as a learnable graph.
We introduce a temporal gating mechanism for each agent, enabling dynamic decisions on whether to receive shared information at a given time.
arXiv Detail & Related papers (2024-11-01T05:56:51Z)
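The temporal gating mechanism described above can be sketched as a learned scalar gate that decides, per step, whether an agent mixes teammates' messages into its own state. The sigmoid score, mixing rule, and dimensions below are assumptions for illustration; the paper learns the communication graph and gates end-to-end.

```python
# Minimal sketch of a per-agent temporal gate: at each step the agent
# decides whether to receive shared information.
import numpy as np

rng = np.random.default_rng(0)

def temporal_gate(own_state, messages, w, threshold=0.5):
    """Mix in teammates' messages only when the gate opens."""
    gate = 1.0 / (1.0 + np.exp(-w @ own_state))    # learned scalar score
    if gate > threshold and messages:
        return own_state + np.mean(messages, axis=0)
    return own_state                               # gate closed: ignore comms

w = rng.normal(size=4)
state = rng.normal(size=4)
msgs = [rng.normal(size=4) for _ in range(3)]
print(temporal_gate(state, msgs, w))
```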
- Reinforcing Language Agents via Policy Optimization with Action Decomposition [36.984163245259936]
This paper proposes decomposing language agent optimization from the action level to the token level.
We then derive the Bellman backup with Action Decomposition (BAD) to integrate credit assignments for both intra-action and inter-action tokens.
Implementing BAD within the PPO algorithm, we introduce Policy Optimization with Action Decomposition (POAD).
arXiv Detail & Related papers (2024-05-23T14:01:44Z)
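A minimal sketch of token-level credit assignment in the spirit of BAD: intra-action tokens receive zero immediate reward and bootstrap from the next token's value, while the final token of each action receives the environment reward. The value numbers and indexing scheme are toy assumptions, not the paper's formulation.

```python
# Hedged sketch of token-level backup targets: rewards arrive only at the
# last token of each action (inter-action); all other tokens (intra-action)
# bootstrap purely from the next token's value.

def token_level_targets(token_values, action_ends, env_rewards, gamma=0.99):
    """token_values: V(t) per token; action_ends: indices of each action's
    final token; env_rewards: one reward per completed action."""
    targets, reward_iter = [], iter(env_rewards)
    for t, v_next in enumerate(token_values[1:] + [0.0]):
        r = next(reward_iter) if t in action_ends else 0.0
        targets.append(r + gamma * v_next)
    return targets

# Two actions of three tokens each; rewards arrive at tokens 2 and 5.
vals = [0.1, 0.2, 0.3, 0.2, 0.4, 0.5]
print(token_level_targets(vals, action_ends={2, 5}, env_rewards=[0.0, 1.0]))
```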
- Learning Multi-Agent Communication from Graph Modeling Perspective [62.13508281188895]
We introduce a novel approach wherein we conceptualize the communication architecture among agents as a learnable graph.
Our proposed approach, CommFormer, efficiently optimizes the communication graph and concurrently refines architectural parameters through gradient descent in an end-to-end manner.
arXiv Detail & Related papers (2024-05-14T12:40:25Z)
- Decentralized and Lifelong-Adaptive Multi-Agent Collaborative Learning [57.652899266553035]
Decentralized and lifelong-adaptive multi-agent collaborative learning aims to enhance collaboration among multiple agents without a central server.
We propose DeLAMA, a decentralized multi-agent lifelong collaborative learning algorithm with dynamic collaboration graphs.
arXiv Detail & Related papers (2024-03-11T09:21:11Z)
- Iterated Reasoning with Mutual Information in Cooperative and Byzantine Decentralized Teaming [0.0]
We show that reformulating an agent's policy to be conditional on the policies of its teammates inherently maximizes a lower bound on Mutual Information (MI) when optimizing under Policy Gradients (PG).
Our approach, InfoPG, outperforms baselines in learning emergent collaborative behaviors and sets the state-of-the-art in decentralized cooperative MARL tasks.
arXiv Detail & Related papers (2022-01-20T22:54:32Z)
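The iterated-reasoning idea above can be caricatured as each agent repeatedly recomputing its action distribution conditioned on its teammates' latest distributions. The averaging update below is a loose stand-in for InfoPG's actual k-level policy conditioning, included only to show the shape of the computation.

```python
# Hedged sketch of k-level iterated reasoning: each agent's action
# distribution is recomputed conditioned on its teammates' current
# distributions for k rounds. The conditioning rule here is a toy average.
import numpy as np

def iterate_policies(logits, k=3):
    """logits: (n_agents, n_actions). Each round, every agent nudges its
    logits toward the mean of the other agents' current logits."""
    logits = logits.copy()
    for _ in range(k):
        mean_others = (logits.sum(0, keepdims=True) - logits) / (len(logits) - 1)
        logits = 0.5 * logits + 0.5 * mean_others
    # softmax to get final, teammate-conditioned action distributions
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

print(iterate_policies(np.array([[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]])))
```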
- Distributed Adaptive Learning Under Communication Constraints [54.22472738551687]
This work examines adaptive distributed learning strategies designed to operate under communication constraints.
We consider a network of agents that must solve an online optimization problem from continual observation of streaming data.
arXiv Detail & Related papers (2021-12-03T19:23:48Z)
- Decentralized Multi-Agent Reinforcement Learning: An Off-Policy Method [6.261762915564555]
We discuss the problem of decentralized multi-agent reinforcement learning (MARL) in this work.
In our setting, the global state, action, and reward are assumed to be fully observable, while each agent keeps its local policy private, so it cannot be shared with others.
Policy evaluation and policy improvement algorithms are designed for discrete and continuous state-action-space Markov Decision Processes (MDPs), respectively.
arXiv Detail & Related papers (2021-10-31T09:08:46Z)
- APS: Active Pretraining with Successor Features [96.24533716878055]
We show that by reinterpreting and combining successor features (Hansen et al.) with nonparametric entropy maximization, the intractable mutual information can be efficiently optimized.
The proposed method, Active Pretraining with Successor Features (APS), explores the environment via nonparametric entropy maximization, and the explored data can be efficiently leveraged to learn behavior.
arXiv Detail & Related papers (2021-08-31T16:30:35Z)
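The nonparametric entropy maximization in APS can be approximated with a particle-based (k-nearest-neighbor) estimator, rewarding states whose embeddings lie far from previously visited ones. The estimator form and constants below are illustrative assumptions rather than the paper's exact objective.

```python
# Hedged sketch of a particle-based (kNN) entropy reward: novel states,
# far from stored embeddings, earn a larger exploration bonus.
import numpy as np

def knn_entropy_reward(embedding, memory, k=3):
    """Intrinsic reward grows with the mean distance to the k nearest
    stored embeddings, a crude proxy for state-distribution entropy."""
    if len(memory) < k:
        return 1.0                                   # encourage early coverage
    dists = np.sort(np.linalg.norm(np.asarray(memory) - embedding, axis=1))
    return float(np.log(1.0 + dists[:k].mean()))

memory = [np.zeros(2), np.ones(2), np.array([0.0, 1.0])]
print(knn_entropy_reward(np.array([5.0, 5.0]), memory))  # novel -> high bonus
print(knn_entropy_reward(np.array([0.1, 0.1]), memory))  # familiar -> low
```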
- Domain-Robust Visual Imitation Learning with Mutual Information Constraints [0.0]
We introduce a new algorithm called Disentangling Generative Adversarial Imitation Learning (DisentanGAIL).
Our algorithm enables autonomous agents to learn directly from high dimensional observations of an expert performing a task.
arXiv Detail & Related papers (2021-03-08T21:18:58Z)
- "I Don't Think So": Disagreement-Based Policy Summaries for Comparing Agents [2.6270468656705765]
We propose a novel method for generating contrastive summaries that highlight the differences between agents' policies.
Our results show that the novel disagreement-based summaries lead to improved user performance compared to summaries generated using HIGHLIGHTS.
arXiv Detail & Related papers (2021-02-05T09:09:00Z)
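The disagreement-based summaries above rest on a simple primitive: finding the states where two policies choose different actions. A minimal sketch, with toy states and policies standing in for real agents:

```python
# Hedged sketch of disagreement mining: collect the states on which two
# policies pick different actions, the raw material for a contrastive
# summary. States and policies here are toy stand-ins.

def disagreement_states(states, policy_a, policy_b):
    """Return the states where the two policies choose different actions."""
    return [s for s in states if policy_a(s) != policy_b(s)]

cautious = lambda s: "brake" if s >= 3 else "cruise"
reckless = lambda s: "cruise"
print(disagreement_states(range(6), cautious, reckless))   # [3, 4, 5]
```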
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.