Grasper: A Generalist Pursuer for Pursuit-Evasion Problems
- URL: http://arxiv.org/abs/2404.12626v1
- Date: Fri, 19 Apr 2024 04:54:38 GMT
- Title: Grasper: A Generalist Pursuer for Pursuit-Evasion Problems
- Authors: Pengdeng Li, Shuxin Li, Xinrun Wang, Jakub Cerny, Youzhi Zhang, Stephen McAleer, Hau Chan, Bo An
- Abstract summary: Pursuit-evasion games (PEGs) model interactions between a team of pursuers and an evader in graph-based environments.
Recent advancements have demonstrated the effectiveness of the pre-training and fine-tuning paradigm in PSRO.
We introduce Grasper, a GeneRAlist purSuer for Pursuit-Evasion pRoblems, capable of efficiently generating pursuer policies tailored to specific PEGs.
- Score: 36.115954360950134
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pursuit-evasion games (PEGs) model interactions between a team of pursuers and an evader in graph-based environments such as urban street networks. Recent advancements have demonstrated the effectiveness of the pre-training and fine-tuning paradigm in PSRO to improve scalability in solving large-scale PEGs. However, these methods primarily focus on specific PEGs with fixed initial conditions that may vary substantially in real-world scenarios, which significantly hinders the applicability of the traditional methods. To address this issue, we introduce Grasper, a GeneRAlist purSuer for Pursuit-Evasion pRoblems, capable of efficiently generating pursuer policies tailored to specific PEGs. Our contributions are threefold: First, we present a novel architecture that offers high-quality solutions for diverse PEGs, comprising critical components such as (i) a graph neural network (GNN) to encode PEGs into hidden vectors, and (ii) a hypernetwork to generate pursuer policies based on these hidden vectors. As a second contribution, we develop an efficient three-stage training method involving (i) a pre-pretraining stage for learning robust PEG representations through self-supervised graph learning techniques like GraphMAE, (ii) a pre-training stage utilizing heuristic-guided multi-task pre-training (HMP) where heuristic-derived reference policies (e.g., through Dijkstra's algorithm) regularize pursuer policies, and (iii) a fine-tuning stage that employs PSRO to generate pursuer policies on designated PEGs. Finally, we perform extensive experiments on synthetic and real-world maps, showcasing Grasper's significant superiority over baselines in terms of solution quality and generalizability. We demonstrate that Grasper provides a versatile approach for solving pursuit-evasion problems across a broad range of scenarios, enabling practical deployment in real-world situations.
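The HMP pre-training stage described above regularizes pursuer policies toward heuristic reference policies derived, for example, via Dijkstra's algorithm. As a minimal illustrative sketch (not the authors' implementation; the adjacency encoding, the `reference_action` helper, and the demo graph are all assumptions for illustration), such a shortest-path-guided reference policy could look like:

```python
import heapq

def dijkstra_distances(adj, source):
    """Shortest-path distances from `source` on a weighted graph.
    `adj` maps each node to a list of (neighbor, edge_weight) pairs."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, already settled with a shorter path
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def reference_action(adj, pursuer, evader):
    """Greedy reference move for a pursuer: step to the neighbor (or stay)
    with the smallest shortest-path distance to the evader's node."""
    dist = dijkstra_distances(adj, evader)
    options = [(dist.get(v, float("inf")), v) for v, _ in adj[pursuer]]
    options.append((dist.get(pursuer, float("inf")), pursuer))  # staying put allowed
    return min(options)[1]

# Tiny path graph 0 - 1 - 2 - 3 (unit edge weights): a pursuer at node 0
# chasing an evader at node 3 should step toward node 1.
demo_adj = {
    0: [(1, 1.0)],
    1: [(0, 1.0), (2, 1.0)],
    2: [(1, 1.0), (3, 1.0)],
    3: [(2, 1.0)],
}
```

In the paper's setup, a reference policy of this kind would serve only as a regularization target for the learned pursuer policies during multi-task pre-training, not as the deployed policy itself.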
Related papers
- R2PS: Worst-Case Robust Real-Time Pursuit Strategies under Partial Observability [25.176860778665173]
This paper introduces the first approach to worst-case robust real-time pursuit strategies (R2PS) under partial observability. We first prove that a traditional dynamic programming (DP) algorithm for solving Markov PEGs maintains optimality under the asynchronous moves by the evader. We then propose a belief preservation mechanism about the evader's possible positions, extending the DP pursuit strategies to a partially observable setting.
arXiv Detail & Related papers (2025-11-21T16:34:00Z) - Equilibrium Policy Generalization: A Reinforcement Learning Framework for Cross-Graph Zero-Shot Generalization in Pursuit-Evasion Games [38.70408341845241]
Pursuit-evasion games (PEGs) are an important class of real-world games from the fields of robotics and security. This paper proposes an Equilibrium Policy Generalization framework to learn a generalized policy with robust cross-graph zero-shot performance. Experimental results show that, using equilibrium guidance and a distance feature proposed for cross-graph PEG training, the EPG framework guarantees desirable zero-shot performance.
arXiv Detail & Related papers (2025-11-02T05:45:27Z) - Polychromic Objectives for Reinforcement Learning [63.37185057794815]
Reinforcement learning fine-tuning (RLFT) is a dominant paradigm for improving pretrained policies for downstream tasks. We introduce an objective for policy methods that explicitly enforces the exploration and refinement of diverse generations. We show how proximal policy optimization (PPO) can be adapted to optimize this objective.
arXiv Detail & Related papers (2025-09-29T19:32:11Z) - HGMP: Heterogeneous Graph Multi-Task Prompt Learning [18.703129208282913]
We propose a novel multi-task prompt framework for the heterogeneous graph domain, named HGMP. First, to bridge the gap between the pre-trained model and downstream tasks, we reformulate all downstream tasks into a unified graph-level task format. We design a graph-level contrastive pre-training strategy to better leverage heterogeneous information and enhance performance in multi-task scenarios.
arXiv Detail & Related papers (2025-07-10T04:01:47Z) - GFlowGR: Fine-tuning Generative Recommendation Frameworks with Generative Flow Networks [36.39484385717512]
This paper treats Generative Recommendations (GR) as a multi-step generation task and constructs a GFlowNets-based fine-tuning framework (GFlowGR). The proposed framework integrates collaborative knowledge from traditional recommender systems to create an adaptive trajectory sampler and a comprehensive reward model.
arXiv Detail & Related papers (2025-06-19T08:04:31Z) - Zero-Shot Whole-Body Humanoid Control via Behavioral Foundation Models [71.34520793462069]
Unsupervised reinforcement learning (RL) aims at pre-training agents that can solve a wide range of downstream tasks in complex environments.
We introduce a novel algorithm regularizing unsupervised RL towards imitating trajectories from unlabeled behavior datasets.
We demonstrate the effectiveness of this new approach in a challenging humanoid control problem.
arXiv Detail & Related papers (2025-04-15T10:41:11Z) - Towards Principled Unsupervised Multi-Agent Reinforcement Learning [49.533774397707056]
We present a scalable, decentralized, trust-region policy search algorithm to address the problem in practical settings. We show that optimizing for a specific objective, namely mixture entropy, provides an excellent trade-off between tractability and performance.
arXiv Detail & Related papers (2025-02-12T12:51:36Z) - Effective Tuning Strategies for Generalist Robot Manipulation Policies [45.36380662552082]
Generalist robot manipulation policies (GMPs) have the potential to generalize across a wide range of tasks, devices, and environments.
While fine-tuning offers a practical way to quickly adapt a GMP to novel domains and tasks with limited samples, we observe that the performance of the resulting GMPs differs significantly with respect to the design choices of the fine-tuning strategy.
arXiv Detail & Related papers (2024-10-02T04:00:25Z) - Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning [62.81324245896717]
We introduce an exploration-agnostic algorithm, called C-PG, which exhibits global last-iterate convergence guarantees under (weak) gradient domination assumptions.
We numerically validate our algorithms on constrained control problems, and compare them with state-of-the-art baselines.
arXiv Detail & Related papers (2024-07-15T14:54:57Z) - 1st Place Solution for PSG competition with ECCV'22 SenseHuman Workshop [1.5362025549031049]
Panoptic Scene Graph (PSG) generation aims to generate scene graph representations based on panoptic segmentation instead of rigid bounding boxes.
We propose GRNet, a Global Relation Network in two-stage paradigm, where the pre-extracted local object features and their corresponding masks are fed into a transformer with class embeddings.
We conduct comprehensive experiments on the OpenPSG dataset and achieve state-of-the-art performance on the leaderboard.
arXiv Detail & Related papers (2023-02-06T09:47:46Z) - Scalable PAC-Bayesian Meta-Learning via the PAC-Optimal Hyper-Posterior: From Theory to Practice [54.03076395748459]
A central question in the meta-learning literature is how to regularize to ensure generalization to unseen tasks.
We present a generalization bound for meta-learning, which was first derived by Rothfuss et al.
We provide a theoretical analysis and empirical case study under which conditions and to what extent these guarantees for meta-learning improve upon PAC-Bayesian per-task learning bounds.
arXiv Detail & Related papers (2022-11-14T08:51:04Z) - Universal Prompt Tuning for Graph Neural Networks [10.250964386142819]
We introduce a universal prompt-based tuning method called Graph Prompt Feature (GPF) for pre-trained GNN models under any pre-training strategy.
GPF operates on the input graph's feature space and can theoretically achieve an equivalent effect to any form of prompting function.
Our method significantly outperforms existing specialized prompt-based tuning methods when applied to models utilizing the pre-training strategy they specialize in.
arXiv Detail & Related papers (2022-09-30T05:19:27Z) - Controlling Conditional Language Models with Distributional Policy Gradients [2.9176992922046923]
General-purpose pretrained generative models often fail to meet some of the downstream requirements.
This raises an important question on how to adapt pre-trained generative models to a new task without destroying its capabilities.
Recent work has suggested to solve this problem by representing task-specific requirements through energy-based models.
In this paper, we extend this approach to conditional tasks by proposing Conditional DPG (CDPG).
arXiv Detail & Related papers (2021-12-01T19:24:05Z) - Predicting Deep Neural Network Generalization with Perturbation Response Curves [58.8755389068888]
We propose a new framework for evaluating the generalization capabilities of trained networks.
Specifically, we introduce two new measures for accurately predicting generalization gaps.
We attain better predictive scores than the current state-of-the-art measures on a majority of tasks in the Predicting Generalization in Deep Learning (PGDL) NeurIPS 2020 competition.
arXiv Detail & Related papers (2021-06-09T01:37:36Z) - Semi-On-Policy Training for Sample Efficient Multi-Agent Policy Gradients [51.749831824106046]
We introduce semi-on-policy (SOP) training as an effective and computationally efficient way to address the sample inefficiency of on-policy policy gradient methods.
We show that our methods perform as well or better than state-of-the-art value-based methods on a variety of SMAC tasks.
arXiv Detail & Related papers (2021-04-27T19:37:01Z) - Causal Policy Gradients [6.123324869194195]
Causal policy gradients (CPGs) provide a common framework for analysing key state-of-the-art algorithms.
CPGs are shown to generalise traditional policy gradients, and yield a principled way of incorporating prior knowledge of a problem domain's generative processes.
arXiv Detail & Related papers (2021-02-20T14:51:12Z) - Reinforcement Learning-based Black-Box Evasion Attacks to Link Prediction in Dynamic Graphs [87.5882042724041]
Link prediction in dynamic graphs (LPDG) is an important research problem that has diverse applications.
We study the vulnerability of LPDG methods and propose the first practical black-box evasion attack.
arXiv Detail & Related papers (2020-09-01T01:04:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.