Conformal Off-Policy Prediction for Multi-Agent Systems
- URL: http://arxiv.org/abs/2403.16871v2
- Date: Sun, 15 Sep 2024 17:03:26 GMT
- Title: Conformal Off-Policy Prediction for Multi-Agent Systems
- Authors: Tom Kuipers, Renukanandan Tumu, Shuo Yang, Milad Kazemi, Rahul Mangharam, Nicola Paoletti
- Abstract summary: Off-Policy Prediction (OPP) is a paramount problem in data-driven analysis of safety-critical systems.
We introduce MA-COPP, the first conformal prediction method to solve OPP problems involving multi-agent systems.
A key contribution of MA-COPP is to avoid enumeration or exhaustive search of the output space of agent trajectories.
- Score: 6.32674891108819
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Off-Policy Prediction (OPP), i.e., predicting the outcomes of a target policy using only data collected under a nominal (behavioural) policy, is a paramount problem in data-driven analysis of safety-critical systems where the deployment of a new policy may be unsafe. To achieve dependable off-policy predictions, recent work on Conformal Off-Policy Prediction (COPP) leverages the conformal prediction framework to derive prediction regions with probabilistic guarantees under the target process. Existing COPP methods can account for the distribution shifts induced by policy switching, but are limited to single-agent systems and scalar outcomes (e.g., rewards). In this work, we introduce MA-COPP, the first conformal prediction method to solve OPP problems involving multi-agent systems, deriving joint prediction regions for all agents' trajectories when one or more ego agents change their policies. Unlike the single-agent scenario, this setting introduces higher complexity as the distribution shifts affect predictions for all agents, not just the ego agents, and the prediction task involves full multi-dimensional trajectories, not just reward values. A key contribution of MA-COPP is to avoid enumeration or exhaustive search of the output space of agent trajectories, which is instead required by existing COPP methods to construct the prediction region. We achieve this by showing that an over-approximation of the true joint prediction region (JPR) can be constructed, without enumeration, from the maximum density ratio of the JPR trajectories. We evaluate the effectiveness of MA-COPP in multi-agent systems from the PettingZoo library and the F1TENTH autonomous racing environment, achieving nominal coverage in higher dimensions and various shift settings.
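The abstract's core ingredient, reweighting conformal calibration scores by density ratios to account for a policy-induced distribution shift, can be sketched as follows. This is a minimal illustration of weighted conformal calibration, not the MA-COPP algorithm itself: the function name and the use of the maximum calibration weight as a conservative stand-in for the unknown test-point weight are assumptions of the sketch.

```python
import numpy as np

def weighted_conformal_quantile(scores, weights, alpha=0.1):
    """Conservative (1 - alpha) conformal quantile under distribution shift.

    scores:  nonconformity scores on calibration data collected under the
             behavioural policy
    weights: density ratios p_target / p_behaviour for each calibration point
    """
    order = np.argsort(scores)
    scores, weights = scores[order], weights[order]
    # Normalise the weights, reserving mass for the unseen test point.
    # Using the max calibration weight here is a conservative simplification.
    total = weights.sum() + weights.max()
    cum = np.cumsum(weights) / total
    # Smallest score whose cumulative (shifted) probability reaches 1 - alpha.
    idx = np.searchsorted(cum, 1.0 - alpha)
    idx = min(idx, len(scores) - 1)
    return scores[idx]
```

With uniform weights (no shift) this reduces to the usual split-conformal quantile; weights that up-weight large scores push the quantile, and hence the prediction region, outward.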
Related papers
- Centralized Adaptive Sampling for Reliable Co-Training of Independent Multi-Agent Policies [3.5253513747455303]
Independent on-policy policy gradient algorithms are widely used for multi-agent reinforcement learning (MARL) in cooperative and no-conflict games. They are known to converge suboptimally when each agent's policy gradient points toward a suboptimal equilibrium. We introduce an adaptive action sampling approach to reduce joint sampling error.
arXiv Detail & Related papers (2025-08-01T20:07:25Z) - Distributionally Robust Performative Prediction [25.580721293862467]
We introduce a novel framework of distributionally robust performative prediction and study a new solution concept termed the distributionally robust performative optimum (DRPO).
We show provable guarantees for DRPO as a robust approximation to the true PO when the nominal distribution map is different from the actual one.
Results demonstrate that DRPO offers potential advantages over the traditional PO approach when the distribution map is misspecified at either the micro or macro level.
arXiv Detail & Related papers (2024-12-05T17:05:49Z) - MAP-Former: Multi-Agent-Pair Gaussian Joint Prediction [6.110153599741102]
There is a gap between the trajectory information produced by a traffic motion prediction module and the risk assessment of trajectories that is actually needed.
Existing prediction models yield joint predictions of agents' future trajectories with uncertainty weights or marginal Gaussian probability density functions (PDFs) for single agents.
This paper introduces a novel approach to motion prediction, focusing on predicting agent-pair covariance matrices in a "scene-centric" manner.
arXiv Detail & Related papers (2024-04-30T06:21:42Z) - Whom to Trust? Elective Learning for Distributed Gaussian Process Regression [3.5208783730894972]
We develop an elective learning algorithm, namely prior-aware elective distributed GP (Pri-GP).
Pri-GP empowers agents with the capability to selectively request predictions from neighboring agents based on their trustworthiness.
We establish a prediction error bound within the Pri-GP framework, ensuring the reliability of predictions.
arXiv Detail & Related papers (2024-02-05T13:52:56Z) - Off-Policy Evaluation for Large Action Spaces via Policy Convolution [60.6953713877886]
The Policy Convolution family of estimators uses latent structure within actions to strategically convolve the logging and target policies.
Experiments on synthetic and benchmark datasets demonstrate remarkable mean squared error (MSE) improvements when using PC.
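For context, the classical baseline that Policy Convolution aims to improve in large action spaces is inverse propensity scoring (IPS), whose importance weights become high-variance as the action space grows. A minimal sketch of that baseline (not the Policy Convolution estimator itself) is:

```python
import numpy as np

def ips_estimate(rewards, logging_probs, target_probs):
    """Standard inverse-propensity-scoring (IPS) off-policy value estimate.

    rewards:       observed rewards for the logged actions
    logging_probs: probability the logging policy assigned to each logged action
    target_probs:  probability the target policy assigns to the same actions
    """
    # Importance weights correct for the mismatch between the two policies;
    # in large action spaces these ratios can explode, inflating variance.
    w = target_probs / logging_probs
    return float(np.mean(w * rewards))
```

Policy Convolution replaces these raw per-action ratios with weights smoothed over similar actions, trading a little bias for much lower variance.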
arXiv Detail & Related papers (2023-10-24T01:00:01Z) - Model-Free Learning and Optimal Policy Design in Multi-Agent MDPs Under Probabilistic Agent Dropout [4.949881799107061]
This work studies a multi-agent Markov decision process (MDP) that can undergo agent dropout and the computation of policies for the post-dropout system.
We first show that the value of the expected post-dropout system can be represented by a single MDP.
More significantly, in a model-free context, it is shown that the robust MDP value can be estimated with samples generated by the pre-dropout system.
arXiv Detail & Related papers (2023-04-24T21:29:41Z) - Collaborative Uncertainty Benefits Multi-Agent Multi-Modal Trajectory Forecasting [61.02295959343446]
This work first proposes a novel concept, collaborative uncertainty (CU), which models the uncertainty resulting from interaction modules.
We build a general CU-aware regression framework with an original permutation-equivariant uncertainty estimator to do both tasks of regression and uncertainty estimation.
We apply the proposed framework to current SOTA multi-agent trajectory forecasting systems as a plugin module.
arXiv Detail & Related papers (2022-07-11T21:17:41Z) - Conformal Prediction Intervals for Markov Decision Process Trajectories [10.68332392039368]
This paper provides conformal prediction intervals over the future behavior of an autonomous system executing a fixed control policy on a Markov Decision Process (MDP).
The method is illustrated on MDPs for invasive species management and StarCraft2 battles.
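The idea of a split-conformal interval over rollout outcomes can be sketched in a few lines. This is a toy illustration only: the Gaussian per-step reward model stands in for a real MDP rollout, and the interval is over scalar returns rather than full trajectories as in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def rollout_return(horizon=20):
    # Toy stand-in for an MDP rollout under a fixed policy:
    # per-step rewards are noisy; the return is their sum.
    return rng.normal(loc=1.0, scale=0.5, size=horizon).sum()

# Calibration set: returns of independent rollouts under the fixed policy.
cal = np.array([rollout_return() for _ in range(500)])
point = cal.mean()                 # point prediction for a future return
scores = np.abs(cal - point)      # nonconformity: absolute residual

# Split-conformal quantile with the finite-sample (n + 1) correction.
alpha = 0.1
n = len(scores)
q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")
interval = (point - q, point + q)
```

Under exchangeability of calibration and test rollouts, a fresh rollout's return falls in `interval` with probability at least 1 - alpha.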
arXiv Detail & Related papers (2022-06-10T03:43:53Z) - Monotonic Improvement Guarantees under Non-stationarity for Decentralized PPO [66.5384483339413]
We present a new monotonic improvement guarantee for optimizing decentralized policies in cooperative Multi-Agent Reinforcement Learning (MARL).
We show that a trust region constraint can be effectively enforced in a principled way by bounding independent ratios based on the number of agents in training.
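The ratio-bounding idea builds on the standard PPO clipped surrogate. The sketch below shows that surrogate with a clip range shrunk as the number of agents grows; scaling the range as `base_eps / num_agents` is an illustrative assumption, not necessarily the exact bound derived in the paper.

```python
import numpy as np

def clipped_surrogate(ratios, advantages, num_agents=1, base_eps=0.2):
    """PPO-style clipped surrogate objective for one agent's updates.

    Illustrative only: tightening the clip range with the number of agents
    (base_eps / num_agents) stands in for the paper's idea of bounding
    independent ratios based on how many agents are training concurrently.
    """
    eps = base_eps / num_agents
    unclipped = ratios * advantages
    clipped = np.clip(ratios, 1.0 - eps, 1.0 + eps) * advantages
    # Take the pessimistic (elementwise minimum) of the two, as in PPO.
    return float(np.minimum(unclipped, clipped).mean())
```

With more agents, each agent's ratio is clipped more aggressively, keeping the joint policy within a trust region despite independent updates.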
arXiv Detail & Related papers (2022-01-31T20:39:48Z) - Test-time Collective Prediction [73.74982509510961]
Multiple parties in machine learning want to jointly make predictions on future test points.
Agents wish to benefit from the collective expertise of the full set of agents, but may not be willing to release their data or model parameters.
We explore a decentralized mechanism to make collective predictions at test time, leveraging each agent's pre-trained model.
arXiv Detail & Related papers (2021-06-22T18:29:58Z) - Dealing with Non-Stationarity in Multi-Agent Reinforcement Learning via Trust Region Decomposition [52.06086375833474]
Non-stationarity is one thorny issue in multi-agent reinforcement learning.
We introduce a $\delta$-stationarity measurement to explicitly model the stationarity of a policy sequence.
We propose a trust region decomposition network based on message passing to estimate the joint policy divergence.
arXiv Detail & Related papers (2021-02-21T14:46:50Z) - Implicit Distributional Reinforcement Learning [61.166030238490634]
The implicit distributional actor-critic (IDAC) is built on two deep generator networks (DGNs) and a semi-implicit actor (SIA) powered by a flexible policy distribution.
We observe IDAC outperforms state-of-the-art algorithms on representative OpenAI Gym environments.
arXiv Detail & Related papers (2020-07-13T02:52:18Z) - Model-based Reinforcement Learning for Decentralized Multiagent Rendezvous [66.6895109554163]
Underlying the human ability to align goals with other agents is their ability to predict the intentions of others and actively update their own plans.
We propose hierarchical predictive planning (HPP), a model-based reinforcement learning method for decentralized multiagent rendezvous.
arXiv Detail & Related papers (2020-03-15T19:49:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.