Related papers: Learning to Steer Markovian Agents under Model Uncertainty

Learning to Steer Markovian Agents under Model Uncertainty

URL: http://arxiv.org/abs/2407.10207v1
Date: Sun, 14 Jul 2024 14:01:38 GMT
Title: Learning to Steer Markovian Agents under Model Uncertainty
Authors: Jiawei Huang, Vinzenz Thoma, Zebang Shen, Heinrich H. Nax, Niao He,
Abstract summary: We introduce a model-based non-episodic Reinforcement Learning (RL) formulation for our steering problem. We focus on learning a emphhistory-dependent steering strategy to handle the inherent model uncertainty about the agents' learning dynamics.
Score: 23.603487812521657
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Designing incentives for an adapting population is a ubiquitous problem in a wide array of economic applications and beyond. In this work, we study how to design additional rewards to steer multi-agent systems towards desired policies \emph{without} prior knowledge of the agents' underlying learning dynamics. We introduce a model-based non-episodic Reinforcement Learning (RL) formulation for our steering problem. Importantly, we focus on learning a \emph{history-dependent} steering strategy to handle the inherent model uncertainty about the agents' learning dynamics. We introduce a novel objective function to encode the desiderata of achieving a good steering outcome with reasonable cost. Theoretically, we identify conditions for the existence of steering strategies to guide agents to the desired policies. Complementing our theoretical contributions, we provide empirical algorithms to approximately solve our objective, which effectively tackles the challenge in learning history-dependent strategies. We demonstrate the efficacy of our algorithms through empirical evaluations.

Related papers

Learning to Lead: Incentivizing Strategic Agents in the Dark [50.93875404941184]
We study an online learning version of the generalized principal-agent model.<n>We develop the first provably sample-efficient algorithm for this challenging setting.<n>We establish a near optimal $tildeO(sqrtT) $ regret bound for learning the principal's optimal policy.
arXiv Detail & Related papers (2025-06-10T04:25:04Z)
Strategy Masking: A Method for Guardrails in Value-based Reinforcement Learning Agents [0.27309692684728604]
We study methods for constructing guardrails for AI agents that use reward functions to learn decision making. We introduce a novel approach, which we call strategy masking, to explicitly learn and then suppress undesirable AI agent behavior.
arXiv Detail & Related papers (2025-01-09T18:43:05Z)
Safe Multi-agent Learning via Trapping Regions [89.24858306636816]
We apply the concept of trapping regions, known from qualitative theory of dynamical systems, to create safety sets in the joint strategy space for decentralized learning. We propose a binary partitioning algorithm for verification that candidate sets form trapping regions in systems with known learning dynamics, and a sampling algorithm for scenarios where learning dynamics are not known.
arXiv Detail & Related papers (2023-02-27T14:47:52Z)
Intrinsic Motivation in Model-based Reinforcement Learning: A Brief Review [77.34726150561087]
This review considers the existing methods for determining intrinsic motivation based on the world model obtained by the agent. The proposed unified framework describes the architecture of agents using a world model and intrinsic motivation to improve learning.
arXiv Detail & Related papers (2023-01-24T15:13:02Z)
Option-Aware Adversarial Inverse Reinforcement Learning for Robotic Control [44.77500987121531]
Hierarchical Imitation Learning (HIL) has been proposed to recover highly-complex behaviors in long-horizon tasks from expert demonstrations. We develop a novel HIL algorithm based on Adversarial Inverse Reinforcement Learning. We also propose a Variational Autoencoder framework for learning with our objectives in an end-to-end fashion.
arXiv Detail & Related papers (2022-10-05T00:28:26Z)
Learning to Find Proofs and Theorems by Learning to Refine Search Strategies [0.9137554315375919]
An AlphaZero-style agent is self-training to refine a high-level expert strategy expressed as a nondeterministic program. An analogous teacher agent is self-training to generate tasks of suitable relevance and difficulty for the learner.
arXiv Detail & Related papers (2022-05-27T20:48:40Z)
Rethinking Learning Dynamics in RL using Adversarial Networks [79.56118674435844]
We present a learning mechanism for reinforcement learning of closely related skills parameterized via a skill embedding space. The main contribution of our work is to formulate an adversarial training regime for reinforcement learning with the help of entropy-regularized policy gradient formulation.
arXiv Detail & Related papers (2022-01-27T19:51:09Z)
Generalized dynamic cognitive hierarchy models for strategic driving behavior [13.415452801139843]
We develop a framework of generalized dynamic cognitive hierarchy for both modelling naturalistic human driving behavior and behavior planning for autonomous vehicles. Based on evaluation on two large naturalistic datasets, we show that automata strategies are well suited for level-0 behavior in a dynamic level-k framework.
arXiv Detail & Related papers (2021-09-20T21:49:52Z)
Deep Reinforcement Learning in a Monetary Model [5.7742249974375985]
We propose using deep reinforcement learning to solve dynamic general equilibrium models. Agents are represented by deep artificial neural networks and learn to solve their dynamic optimisation problem. We find that, contrary to adaptive learning, the artificially intelligent household can solve the model in all policy regimes.
arXiv Detail & Related papers (2021-04-19T14:56:44Z)
Model-based Meta Reinforcement Learning using Graph Structured Surrogate Models [40.08137765886609]
We show that our model, called a graph structured surrogate model (GSSM), outperforms state-of-the-art methods in predicting environment dynamics. Our approach is able to obtain high returns, while allowing fast execution during deployment by avoiding test time policy gradient optimization.
arXiv Detail & Related papers (2021-02-16T17:21:55Z)
Goal-Aware Prediction: Learning to Model What Matters [105.43098326577434]
One of the fundamental challenges in using a learned forward dynamics model is the mismatch between the objective of the learned model and that of the downstream planner or policy. We propose to direct prediction towards task relevant information, enabling the model to be aware of the current task and encouraging it to only model relevant quantities of the state space. We find that our method more effectively models the relevant parts of the scene conditioned on the goal, and as a result outperforms standard task-agnostic dynamics models and model-free reinforcement learning.
arXiv Detail & Related papers (2020-07-14T16:42:59Z)
Meta-Reinforcement Learning Robust to Distributional Shift via Model Identification and Experience Relabeling [126.69933134648541]
We present a meta-reinforcement learning algorithm that is both efficient and extrapolates well when faced with out-of-distribution tasks at test time. Our method is based on a simple insight: we recognize that dynamics models can be adapted efficiently and consistently with off-policy data.
arXiv Detail & Related papers (2020-06-12T13:34:46Z)
Information Theoretic Model Predictive Q-Learning [64.74041985237105]
We present a novel theoretical connection between information theoretic MPC and entropy regularized RL. We develop a Q-learning algorithm that can leverage biased models.
arXiv Detail & Related papers (2019-12-31T00:29:22Z)

This list is automatically generated from the titles and abstracts of the papers in this site.