R2PS: Worst-Case Robust Real-Time Pursuit Strategies under Partial Observability
- URL: http://arxiv.org/abs/2511.17367v1
- Date: Fri, 21 Nov 2025 16:34:00 GMT
- Title: R2PS: Worst-Case Robust Real-Time Pursuit Strategies under Partial Observability
- Authors: Runyu Lu, Ruochuan Shi, Yuanheng Zhu, Dongbin Zhao
- Abstract summary: This paper introduces the first approach to worst-case robust real-time pursuit strategies (R2PS) under partial observability. We first prove that a traditional dynamic programming (DP) algorithm for solving Markov PEGs maintains optimality under the asynchronous moves by the evader. We then propose a belief preservation mechanism about the evader's possible positions, extending the DP pursuit strategies to a partially observable setting.
- Score: 25.176860778665173
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Computing worst-case robust strategies in pursuit-evasion games (PEGs) is time-consuming, especially when real-world factors like partial observability are considered. While important for general security purposes, real-time applicable pursuit strategies for graph-based PEGs are currently missing when the pursuers have only imperfect information about the evader's position. Although state-of-the-art reinforcement learning (RL) methods like Equilibrium Policy Generalization (EPG) and Grasper provide guidelines for learning graph neural network (GNN) policies robust to different game dynamics, they are restricted to the perfect-information setting and do not account for the case where the evader can predict the pursuers' actions. This paper introduces the first approach to worst-case robust real-time pursuit strategies (R2PS) under partial observability. We first prove that a traditional dynamic programming (DP) algorithm for solving Markov PEGs maintains optimality under asynchronous moves by the evader. Then, we propose a belief preservation mechanism over the evader's possible positions, extending the DP pursuit strategies to a partially observable setting. Finally, we embed the belief preservation into the state-of-the-art EPG framework to complete our R2PS learning scheme, which yields a real-time pursuer policy through cross-graph reinforcement learning against the asynchronous-move DP evasion strategies. After reinforcement learning, our policy achieves robust zero-shot generalization to unseen real-world graph structures and consistently outperforms the policy directly trained on the test graphs by the existing game RL approach.
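The belief preservation step at the core of R2PS can be illustrated with a minimal sketch (hypothetical helper names and a generic networkx graph, not the authors' code): the pursuers keep a set of nodes the evader might occupy, collapse it when the evader is observed, propagate it along graph edges to cover one asynchronous evader move, and prune nodes that pursuers currently occupy.

```python
import networkx as nx

def update_belief(graph: nx.Graph, belief: set, observed_node=None,
                  pursuer_nodes: set = frozenset()) -> set:
    """One belief-preservation step over the evader's possible positions.

    A minimal sketch, assuming: (1) the evader moves at most one edge per
    step, (2) the evader is seen only when `observed_node` is not None, and
    (3) the evader is never on a node currently held by a pursuer.
    """
    if observed_node is not None:
        # Perfect observation this step: the belief collapses to one node.
        candidates = {observed_node}
    else:
        candidates = set(belief)

    # Propagate: the evader may stay put or move to any adjacent node.
    propagated = set()
    for node in candidates:
        propagated.add(node)
        propagated.update(graph.neighbors(node))

    # Prune nodes that are ruled out because a pursuer occupies them.
    return propagated - set(pursuer_nodes)


# Usage: track the belief over a small cycle graph with no observation.
g = nx.cycle_graph(6)
belief = {0}                                  # evader last seen at node 0
belief = update_belief(g, belief, observed_node=None, pursuer_nodes={2})
print(sorted(belief))                         # [0, 1, 5]: stayed, or moved to a neighbor of node 0
```

In the paper's scheme, it is this set-valued belief, rather than the evader's true position, that the pursuer policy conditions on, which is what extends the DP pursuit strategies to the partially observable setting.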
Related papers
- Implicit Strategic Optimization: Rethinking Long-Horizon Decision-Making in Adversarial Poker Environments [9.732494293258828]
Implicit Strategic Optimization is a prediction-aware framework for training large language model (LLM) agents for adversarial games. We prove sublinear contextual regret and equilibrium convergence guarantees whose dominant terms scale with the number of context mispredictions. Experiments in 6-player No-Limit Texas Hold'em and competitive Pokemon show consistent improvements in long-term return.
arXiv Detail & Related papers (2026-02-08T16:17:46Z)
- RN-D: Discretized Categorical Actors with Regularized Networks for On-Policy Reinforcement Learning [27.45103393884625]
We revisit policy representation as a first-class design choice for on-policy optimization. We study discretized categorical actors that represent each action dimension with a distribution over bins, yielding a policy objective that resembles a cross-entropy loss. Our results show that simply replacing the standard actor network with our discretized regularized actor yields consistent gains.
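A rough sketch of such a discretized categorical actor (assumed shapes and bin layout; not the paper's implementation): each continuous action dimension gets its own softmax over a fixed set of bins, and sampled bin indices are decoded back to bin centers, so the policy loss takes a cross-entropy-like form over discrete bins.

```python
import torch
import torch.nn as nn

class DiscretizedCategoricalActor(nn.Module):
    """Sketch of a per-dimension categorical policy over action bins.

    Hypothetical shapes: `obs_dim` observations, `act_dim` continuous action
    dimensions, each discretized into `n_bins` bins spanning [-1, 1].
    """
    def __init__(self, obs_dim: int, act_dim: int, n_bins: int = 11):
        super().__init__()
        self.act_dim, self.n_bins = act_dim, n_bins
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(),
                                 nn.Linear(64, act_dim * n_bins))
        # Fixed bin centers shared by all action dimensions.
        self.register_buffer("bins", torch.linspace(-1.0, 1.0, n_bins))

    def forward(self, obs: torch.Tensor):
        logits = self.net(obs).view(-1, self.act_dim, self.n_bins)
        dist = torch.distributions.Categorical(logits=logits)
        idx = dist.sample()                    # (batch, act_dim) bin indices
        action = self.bins[idx]                # decode to continuous values
        log_prob = dist.log_prob(idx).sum(-1)  # sum over action dimensions
        return action, log_prob


actor = DiscretizedCategoricalActor(obs_dim=8, act_dim=2)
a, logp = actor(torch.randn(4, 8))
print(a.shape, logp.shape)                     # torch.Size([4, 2]) torch.Size([4])
```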
arXiv Detail & Related papers (2026-01-30T15:24:34Z)
- Decentralized Learning Strategies for Estimation Error Minimization with Graph Neural Networks [86.99017195607077]
We address real-time sampling and estimation of autoregressive Markovian sources in wireless networks. We propose a graphical reinforcement learning framework for policy optimization. Theoretically, our proposed policies are transferable, allowing a policy trained on one graph to be effectively applied to structurally similar graphs.
arXiv Detail & Related papers (2026-01-19T02:18:45Z)
- Equilibrium Policy Generalization: A Reinforcement Learning Framework for Cross-Graph Zero-Shot Generalization in Pursuit-Evasion Games [38.70408341845241]
The pursuit-evasion game (PEG) is an important class of real-world games in robotics and security. This paper proposes an Equilibrium Policy Generalization framework to learn a generalized policy with robust cross-graph zero-shot performance. Experimental results show that, using equilibrium guidance and a distance feature proposed for cross-graph PEG training, the EPG framework guarantees desirable zero-shot performance.
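The distance feature mentioned here can be approximated with a plain shortest-path computation (an illustrative sketch; the exact feature used for EPG training may differ): annotate every node with its hop distance to the nearest pursuer, giving the policy a graph-structure-aware input that transfers across graphs.

```python
import networkx as nx

def nearest_pursuer_distance(graph: nx.Graph, pursuer_nodes) -> dict:
    """Hop distance from every node to its nearest pursuer (illustrative only)."""
    dist = {node: float("inf") for node in graph.nodes}
    for p in pursuer_nodes:
        for node, d in nx.single_source_shortest_path_length(graph, p).items():
            dist[node] = min(dist[node], d)
    return dist

g = nx.grid_2d_graph(3, 3)                       # toy 3x3 grid graph
features = nearest_pursuer_distance(g, pursuer_nodes=[(0, 0), (2, 2)])
print(features[(1, 1)])                          # 2 hops from either pursuer
```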
arXiv Detail & Related papers (2025-11-02T05:45:27Z)
- Fast and the Furious: Hot Starts in Pursuit-Evasion Games [0.0]
A novel approach that combines game-theoretic control theory with Graph Neural Networks is introduced in this work. By conceptualizing pursuer configurations as strategic arrangements and representing them as graphs, a Graph Characteristic Space is constructed. A Graph Convolutional Network (GCN) is trained to generate strategically effective initial configurations, termed "hot starts".
arXiv Detail & Related papers (2025-10-12T22:46:50Z)
- Group-Relative REINFORCE Is Secretly an Off-Policy Algorithm: Demystifying Some Myths About GRPO and Its Friends [64.71326476563213]
Off-policy reinforcement learning for large language models (LLMs) is attracting growing interest. We present a first-principles derivation for group-relative REINFORCE without assuming a specific training data distribution. This perspective yields two general principles for adapting REINFORCE to off-policy settings.
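A minimal sketch of the group-relative weighting this line of work analyzes (assumed GRPO-style normalization; not the paper's derivation): rewards of several responses sampled for the same prompt are centered against their group mean before they weight the REINFORCE log-likelihood term.

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Center and scale rewards within each group of samples.

    `rewards` has shape (num_prompts, group_size): several responses sampled
    per prompt. Assumed GRPO-style normalization; variants differ in whether
    they divide by the group standard deviation.
    """
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

rewards = torch.tensor([[1.0, 0.0, 0.5, 0.5],
                        [2.0, 2.0, 2.0, 2.0]])   # second group has zero spread
adv = group_relative_advantages(rewards)
print(adv)  # first row centered and scaled; second row is all zeros
```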
arXiv Detail & Related papers (2025-09-29T02:34:54Z)
- Learning Deterministic Policies with Policy Gradients in Constrained Markov Decision Processes [59.27926064817273]
We introduce an exploration-agnostic algorithm, called C-PG, which enjoys global last-iterate convergence guarantees under domination assumptions. We empirically validate both the action-based (C-PGAE) and parameter-based (C-PGPE) variants of C-PG on constrained control tasks.
arXiv Detail & Related papers (2025-06-06T10:29:05Z)
- Reinforcement Learning for Game-Theoretic Resource Allocation on Graphs [9.369330148791201]
Game-theoretic resource allocation on graphs (GRAG) is a problem modeled as a multi-step Colonel Blotto Game (MCBG). We formulate the MCBG as a Markov Decision Process (MDP) and apply Reinforcement Learning (RL) methods, specifically Deep Q-Network (DQN) and Proximal Policy Optimization (PPO). We evaluate RL performance across a variety of graph structures and initial resource distributions, comparing against random, greedy, and learned RL policies.
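As a hedged illustration of casting such a multi-step allocation game as an MDP (a toy environment with a hypothetical fixed opponent, not the paper's MCBG formulation): the state tracks the remaining budget and time step, an action allocates resources across nodes, and the per-step reward counts the nodes won.

```python
import numpy as np

class MultiStepBlottoEnv:
    """Toy multi-step Colonel Blotto environment over a set of nodes.

    Hypothetical simplification: a fixed opponent spreads its budget evenly
    over nodes and steps, and a node is won by whoever allocates more to it.
    """
    def __init__(self, n_nodes: int = 3, budget: float = 9.0, horizon: int = 4):
        self.n_nodes, self.horizon = n_nodes, horizon
        self.budget = self.opp_budget = budget

    def reset(self) -> np.ndarray:
        self.t, self.remaining = 0, self.budget
        return np.array([self.remaining, self.t], dtype=np.float32)

    def step(self, allocation):
        allocation = np.asarray(allocation, dtype=float)
        assert allocation.sum() <= self.remaining, "cannot overspend the budget"
        self.remaining -= allocation.sum()
        # Fixed opponent: equal split of its budget per node and per step.
        opponent = self.opp_budget / (self.horizon * self.n_nodes)
        reward = float((allocation > opponent).sum())    # nodes won this step
        self.t += 1
        done = self.t >= self.horizon or self.remaining <= 0
        obs = np.array([self.remaining, self.t], dtype=np.float32)
        return obs, reward, done

# Usage: concentrate the first move on a single node.
env = MultiStepBlottoEnv()
obs = env.reset()
obs, r, done = env.step([3.0, 0.0, 0.0])
print(r, done)    # 1.0 False
```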
arXiv Detail & Related papers (2025-05-08T21:12:34Z)
- Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning [62.81324245896717]
We introduce an exploration-agnostic algorithm, called C-PG, which exhibits global last-iterate convergence guarantees under (weak) gradient domination assumptions.
We numerically validate our algorithms on constrained control problems, and compare them with state-of-the-art baselines.
arXiv Detail & Related papers (2024-07-15T14:54:57Z)
- Grasper: A Generalist Pursuer for Pursuit-Evasion Problems [36.115954360950134]
Pursuit-evasion games (PEGs) model interactions between a team of pursuers and an evader in graph-based environments.
Recent advancements have demonstrated the effectiveness of the pre-training and fine-tuning paradigm in PSRO.
We introduce Grasper, a GeneRAlist purSuer for Pursuit-Evasion pRoblems, capable of efficiently generating pursuer policies tailored to specific PEGs.
arXiv Detail & Related papers (2024-04-19T04:54:38Z)
- On the Adversarial Robustness of Graph Contrastive Learning Methods [9.675856264585278]
We introduce a comprehensive evaluation protocol tailored to assess the robustness of graph contrastive learning (GCL) models.
We subject these models to adaptive adversarial attacks targeting the graph structure, specifically in the evasion scenario.
With our work, we aim to offer insights into the robustness of GCL methods and hope to open avenues for potential future research directions.
arXiv Detail & Related papers (2023-11-29T17:59:18Z)
- Projective Ranking-based GNN Evasion Attacks [52.85890533994233]
Graph neural networks (GNNs) offer promising learning methods for graph-related tasks.
GNNs are at risk of adversarial attacks.
arXiv Detail & Related papers (2022-02-25T21:52:09Z)
- Policy Smoothing for Provably Robust Reinforcement Learning [109.90239627115336]
We study the provable robustness of reinforcement learning against norm-bounded adversarial perturbations of the inputs.
We generate certificates that guarantee that the total reward obtained by the smoothed policy will not fall below a certain threshold under a norm-bounded adversarial perturbation of the input.
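A hedged sketch of the smoothing idea behind such certificates (a Monte Carlo illustration with hypothetical names; the paper's guarantees are derived analytically): the deployed policy acts on Gaussian-noise-perturbed observations, which makes the chosen action stable under small norm-bounded input perturbations.

```python
import numpy as np

def smoothed_action(policy, obs: np.ndarray, sigma: float = 0.1,
                    n_samples: int = 100, n_actions: int = 4,
                    rng=np.random.default_rng(0)) -> int:
    """Majority-vote action of `policy` under Gaussian observation noise.

    Illustrative randomized-smoothing sketch: sampling many noisy copies of
    the observation and voting stabilizes the chosen action against small
    perturbations of the input (the actual certificates are not obtained by
    this Monte Carlo vote alone).
    """
    votes = np.zeros(n_actions, dtype=int)
    for _ in range(n_samples):
        noisy_obs = obs + rng.normal(0.0, sigma, size=obs.shape)
        votes[policy(noisy_obs)] += 1
    return int(votes.argmax())

# Usage with a toy deterministic policy over 4 discrete actions.
toy_policy = lambda o: int(np.clip(o.sum(), 0, 3))
print(smoothed_action(toy_policy, obs=np.array([0.4, 0.8])))  # 1, stable under small noise
```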
arXiv Detail & Related papers (2021-06-21T21:42:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.