Related papers: Towards Optimal Performance and Action Consistency Guarantees in Dec-POMDPs with Inconsistent Beliefs and Limited Communication

URL: http://arxiv.org/abs/2512.20778v1
Date: Tue, 23 Dec 2025 21:25:53 GMT
Title: Towards Optimal Performance and Action Consistency Guarantees in Dec-POMDPs with Inconsistent Beliefs and Limited Communication
Authors: Moshe Rafaeli Shimron, Vadim Indelman
Abstract summary: Multi-agent decision-making under uncertainty is fundamental for effective and safe autonomous operation. Most existing approaches assume that all agents have identical beliefs at planning time, implying these beliefs are conditioned on the same data. We introduce a novel decentralized framework for optimal joint action selection that explicitly accounts for belief inconsistencies.
Score: 9.269394037577177
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Multi-agent decision-making under uncertainty is fundamental for effective and safe autonomous operation. In many real-world scenarios, each agent maintains its own belief over the environment and must plan actions accordingly. However, most existing approaches assume that all agents have identical beliefs at planning time, implying these beliefs are conditioned on the same data. Such an assumption is often impractical due to limited communication. In reality, agents frequently operate with inconsistent beliefs, which can lead to poor coordination and suboptimal, potentially unsafe, performance. In this paper, we address this critical challenge by introducing a novel decentralized framework for optimal joint action selection that explicitly accounts for belief inconsistencies. Our approach provides probabilistic guarantees for both action consistency and performance with respect to open-loop multi-agent POMDP (which assumes all data is always communicated), and selectively triggers communication only when needed. Furthermore, we address another key aspect of whether, given a chosen joint action, the agents should share data to improve expected performance in inference. Simulation results show our approach outperforms state-of-the-art algorithms.

Related papers

BEDA: Belief Estimation as Probabilistic Constraints for Performing Strategic Dialogue Acts [69.7477004390211]
We formalize two core acts Adversarial and Alignment, and operationalize them via probabilistic constraints on what an agent may generate. We instantiate this idea in BEDA, a framework that consists of the world set, the belief estimator for belief estimation, and the conditional generator that selects acts and realizes utterances consistent with the inferred beliefs.
arXiv Detail & Related papers (2025-12-31T14:26:55Z)
Belief-Calibrated Multi-Agent Consensus Seeking for Complex NLP Tasks [45.14284473132228]
We provide a theoretical framework for selecting optimal collaborators that maximize consensus stability. Based on the theorems, we propose the Belief-Calibrated Consensus Seeking (BCCS) framework to facilitate stable consensus. Experimental results on the MATH and MMLU benchmark datasets demonstrate that the proposed BCCS framework outperforms the best existing results.
arXiv Detail & Related papers (2025-10-07T17:53:34Z)
A Principled Approach to Randomized Selection under Uncertainty: Applications to Peer Review and Grant Funding [61.86327960322782]
We propose a principled framework for randomized decision-making based on interval estimates of the quality of each item. We introduce MERIT, an optimization-based method that maximizes the worst-case expected number of top candidates selected. We prove that MERIT satisfies desirable axiomatic properties not guaranteed by existing approaches.
arXiv Detail & Related papers (2025-06-23T19:59:30Z)
Collaborative Value Function Estimation Under Model Mismatch: A Federated Temporal Difference Analysis [55.13545823385091]
Federated reinforcement learning (FedRL) enables collaborative learning while preserving data privacy by preventing direct data exchange between agents. In real-world applications, each agent may experience slightly different transition dynamics, leading to inherent model mismatches. We show that even moderate levels of information sharing significantly mitigate environment-specific errors.
arXiv Detail & Related papers (2025-03-21T18:06:28Z)
Criticality and Safety Margins for Reinforcement Learning [53.10194953873209]
We seek to define a criticality framework with both a quantifiable ground truth and a clear significance to users. We introduce true criticality as the expected drop in reward when an agent deviates from its policy for n consecutive random actions. We also introduce the concept of proxy criticality, a low-overhead metric that has a statistically monotonic relationship to true criticality.
arXiv Detail & Related papers (2024-09-26T21:00:45Z)
SAFE-SIM: Safety-Critical Closed-Loop Traffic Simulation with Diffusion-Controllable Adversaries [94.84458417662407]
We introduce SAFE-SIM, a controllable closed-loop safety-critical simulation framework. Our approach yields two distinct advantages: 1) generating realistic long-tail safety-critical scenarios that closely reflect real-world conditions, and 2) providing controllable adversarial behavior for more comprehensive and interactive evaluations. We validate our framework empirically using the nuScenes and nuPlan datasets across multiple planners, demonstrating improvements in both realism and controllability.
arXiv Detail & Related papers (2023-12-31T04:14:43Z)
Achieving Fairness in Multi-Agent Markov Decision Processes Using Reinforcement Learning [30.605881670761853]
We propose a Reinforcement Learning approach to achieve fairness in finite-horizon episodic MDPs. We show that such an approach achieves sub-linear regret in terms of the number of episodes.
arXiv Detail & Related papers (2023-06-01T03:43:53Z)
Safe Explicable Planning [3.3869539907606603]
We propose Safe Explicable Planning (SEP) to support the specification of a safety bound. Our approach generalizes the consideration of multiple objectives stemming from multiple models. We provide formal proofs that validate the desired theoretical properties of these methods.
arXiv Detail & Related papers (2023-04-04T21:49:02Z)
Distributed Adaptive Learning Under Communication Constraints [54.22472738551687]
This work examines adaptive distributed learning strategies designed to operate under communication constraints. We consider a network of agents that must solve an online optimization problem from continual observation of streaming data.
arXiv Detail & Related papers (2021-12-03T19:23:48Z)
DSDF: An approach to handle stochastic agents in collaborative multi-agent reinforcement learning [0.0]
We show how this stochasticity of agents, which could be a result of malfunction or aging of robots, can add to the uncertainty in coordination. Our solution, DSDF, tunes the discounted factor for the agents according to uncertainty and uses the values to update the utility networks of individual agents.
arXiv Detail & Related papers (2021-09-14T12:02:28Z)

This list is automatically generated from the titles and abstracts of the papers on this site.