Grasper: A Generalist Pursuer for Pursuit-Evasion Problems
- URL: http://arxiv.org/abs/2404.12626v1
- Date: Fri, 19 Apr 2024 04:54:38 GMT
- Title: Grasper: A Generalist Pursuer for Pursuit-Evasion Problems
- Authors: Pengdeng Li, Shuxin Li, Xinrun Wang, Jakub Cerny, Youzhi Zhang, Stephen McAleer, Hau Chan, Bo An
- Abstract summary: Pursuit-evasion games (PEGs) model interactions between a team of pursuers and an evader in graph-based environments.
Recent advancements have demonstrated the effectiveness of the pre-training and fine-tuning paradigm in PSRO.
We introduce Grasper, a GeneRAlist purSuer for Pursuit-Evasion pRoblems, capable of efficiently generating pursuer policies tailored to specific PEGs.
- Score: 36.115954360950134
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pursuit-evasion games (PEGs) model interactions between a team of pursuers and an evader in graph-based environments such as urban street networks. Recent advancements have demonstrated the effectiveness of the pre-training and fine-tuning paradigm in PSRO to improve scalability in solving large-scale PEGs. However, these methods primarily focus on specific PEGs with fixed initial conditions that may vary substantially in real-world scenarios, which significantly hinders the applicability of the traditional methods. To address this issue, we introduce Grasper, a GeneRAlist purSuer for Pursuit-Evasion pRoblems, capable of efficiently generating pursuer policies tailored to specific PEGs. Our contributions are threefold: First, we present a novel architecture that offers high-quality solutions for diverse PEGs, comprising critical components such as (i) a graph neural network (GNN) to encode PEGs into hidden vectors, and (ii) a hypernetwork to generate pursuer policies based on these hidden vectors. As a second contribution, we develop an efficient three-stage training method involving (i) a pre-pretraining stage for learning robust PEG representations through self-supervised graph learning techniques like GraphMAE, (ii) a pre-training stage utilizing heuristic-guided multi-task pre-training (HMP) where heuristic-derived reference policies (e.g., through Dijkstra's algorithm) regularize pursuer policies, and (iii) a fine-tuning stage that employs PSRO to generate pursuer policies on designated PEGs. Finally, we perform extensive experiments on synthetic and real-world maps, showcasing Grasper's significant superiority over baselines in terms of solution quality and generalizability. We demonstrate that Grasper provides a versatile approach for solving pursuit-evasion problems across a broad range of scenarios, enabling practical deployment in real-world situations.
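The HMP pre-training stage described above regularizes pursuer policies toward heuristic reference policies derived, for example, via Dijkstra's algorithm. As a minimal illustrative sketch (not the authors' implementation; the adjacency encoding, the `reference_action` helper, and the demo graph are all assumptions for illustration), such a shortest-path-guided reference policy could look like:

```python
import heapq

def dijkstra_distances(adj, source):
    """Shortest-path distances from `source` on a weighted graph.
    `adj` maps each node to a list of (neighbor, edge_weight) pairs."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, already settled with a shorter path
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def reference_action(adj, pursuer, evader):
    """Greedy reference move for a pursuer: step to the neighbor (or stay)
    with the smallest shortest-path distance to the evader's node."""
    dist = dijkstra_distances(adj, evader)
    options = [(dist.get(v, float("inf")), v) for v, _ in adj[pursuer]]
    options.append((dist.get(pursuer, float("inf")), pursuer))  # staying put allowed
    return min(options)[1]

# Tiny path graph 0 - 1 - 2 - 3 (unit edge weights): a pursuer at node 0
# chasing an evader at node 3 should step toward node 1.
demo_adj = {
    0: [(1, 1.0)],
    1: [(0, 1.0), (2, 1.0)],
    2: [(1, 1.0), (3, 1.0)],
    3: [(2, 1.0)],
}
```

In the paper's setup, a reference policy of this kind would serve only as a regularization target for the learned pursuer policies during multi-task pre-training, not as the deployed policy itself.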
Related papers
- R2PS: Worst-Case Robust Real-Time Pursuit Strategies under Partial Observability [25.176860778665173]
This paper introduces the first approach to worst-case robust real-time pursuit strategies (R2PS) under partial observability. We first prove that a traditional dynamic programming (DP) algorithm for solving Markov PEGs maintains optimality under the asynchronous moves by the evader. We then propose a belief preservation mechanism about the evader's possible positions, extending the DP pursuit strategies to a partially observable setting.
arXiv Detail & Related papers (2025-11-21T16:34:00Z) - Equilibrium Policy Generalization: A Reinforcement Learning Framework for Cross-Graph Zero-Shot Generalization in Pursuit-Evasion Games [38.70408341845241]
Pursuit-evasion games (PEGs) are an important class of real-world games from the fields of robotics and security. This paper proposes an Equilibrium Policy Generalization framework to learn a generalized policy with robust cross-graph zero-shot performance. Experimental results show that, using equilibrium guidance and a distance feature proposed for cross-graph PEG training, the EPG framework guarantees desirable zero-shot performance.
arXiv Detail & Related papers (2025-11-02T05:45:27Z) - Polychromic Objectives for Reinforcement Learning [63.37185057794815]
Reinforcement learning fine-tuning (RLFT) is a dominant paradigm for improving pretrained policies for downstream tasks. We introduce an objective for policy methods that explicitly enforces the exploration and refinement of diverse generations. We show how proximal policy optimization (PPO) can be adapted to optimize this objective.
arXiv Detail & Related papers (2025-09-29T19:32:11Z) - HGMP: Heterogeneous Graph Multi-Task Prompt Learning [18.703129208282913]
We propose a novel multi-task prompt framework for the heterogeneous graph domain, named HGMP. First, to bridge the gap between the pre-trained model and downstream tasks, we reformulate all downstream tasks into a unified graph-level task format. We design a graph-level contrastive pre-training strategy to better leverage heterogeneous information and enhance performance in multi-task scenarios.
arXiv Detail & Related papers (2025-07-10T04:01:47Z) - GFlowGR: Fine-tuning Generative Recommendation Frameworks with Generative Flow Networks [36.39484385717512]
This paper treats Generative Recommendations (GR) as a multi-step generation task and constructs a GFlowNets-based fine-tuning framework (GFlowGR). The proposed framework integrates collaborative knowledge from traditional recommender systems to create an adaptive trajectory sampler and a comprehensive reward model.
arXiv Detail & Related papers (2025-06-19T08:04:31Z) - Zero-Shot Whole-Body Humanoid Control via Behavioral Foundation Models [71.34520793462069]
Unsupervised reinforcement learning (RL) aims at pre-training agents that can solve a wide range of downstream tasks in complex environments.
We introduce a novel algorithm regularizing unsupervised RL towards imitating trajectories from unlabeled behavior datasets.
We demonstrate the effectiveness of this new approach in a challenging humanoid control problem.
arXiv Detail & Related papers (2025-04-15T10:41:11Z) - Towards Principled Unsupervised Multi-Agent Reinforcement Learning [49.533774397707056]
We present a scalable, decentralized, trust-region policy search algorithm to address the problem in practical settings. We show that optimizing for a specific objective, namely mixture entropy, provides an excellent trade-off between tractability and performance.
arXiv Detail & Related papers (2025-02-12T12:51:36Z) - Effective Tuning Strategies for Generalist Robot Manipulation Policies [45.36380662552082]
Generalist robot manipulation policies (GMPs) have the potential to generalize across a wide range of tasks, devices, and environments.
While fine-tuning offers a practical way to quickly adapt a GMP to novel domains and tasks with limited samples, we observe that the performance of the resulting GMPs differs significantly with respect to the design choices of the fine-tuning strategy.
arXiv Detail & Related papers (2024-10-02T04:00:25Z) - Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning [62.81324245896717]
We introduce an exploration-agnostic algorithm, called C-PG, which exhibits global last-iterate convergence guarantees under (weak) gradient domination assumptions.
We numerically validate our algorithms on constrained control problems, and compare them with state-of-the-art baselines.
arXiv Detail & Related papers (2024-07-15T14:54:57Z) - 1st Place Solution for PSG competition with ECCV'22 SenseHuman Workshop [1.5362025549031049]
Panoptic Scene Graph (PSG) generation aims to generate scene graph representations based on panoptic segmentation instead of rigid bounding boxes.
We propose GRNet, a Global Relation Network in two-stage paradigm, where the pre-extracted local object features and their corresponding masks are fed into a transformer with class embeddings.
We conduct comprehensive experiments on the OpenPSG dataset and achieve state-of-the-art performance on the leaderboard.
arXiv Detail & Related papers (2023-02-06T09:47:46Z) - Scalable PAC-Bayesian Meta-Learning via the PAC-Optimal Hyper-Posterior: From Theory to Practice [54.03076395748459]
A central question in the meta-learning literature is how to regularize to ensure generalization to unseen tasks.
We present a generalization bound for meta-learning, which was first derived by Rothfuss et al.
We provide a theoretical analysis and empirical case study under which conditions and to what extent these guarantees for meta-learning improve upon PAC-Bayesian per-task learning bounds.
arXiv Detail & Related papers (2022-11-14T08:51:04Z) - Universal Prompt Tuning for Graph Neural Networks [10.250964386142819]
We introduce a universal prompt-based tuning method called Graph Prompt Feature (GPF) for pre-trained GNN models under any pre-training strategy.
GPF operates on the input graph's feature space and can theoretically achieve an equivalent effect to any form of prompting function.
Our method significantly outperforms existing specialized prompt-based tuning methods when applied to models utilizing the pre-training strategy they specialize in.
arXiv Detail & Related papers (2022-09-30T05:19:27Z) - Controlling Conditional Language Models with Distributional Policy Gradients [2.9176992922046923]
General-purpose pretrained generative models often fail to meet some of the downstream requirements.
This raises an important question on how to adapt pre-trained generative models to a new task without destroying its capabilities.
Recent work has suggested to solve this problem by representing task-specific requirements through energy-based models.
In this paper, we extend this approach to conditional tasks by proposing Conditional DPG (CDPG).
arXiv Detail & Related papers (2021-12-01T19:24:05Z) - Predicting Deep Neural Network Generalization with Perturbation Response Curves [58.8755389068888]
We propose a new framework for evaluating the generalization capabilities of trained networks.
Specifically, we introduce two new measures for accurately predicting generalization gaps.
We attain better predictive scores than the current state-of-the-art measures on a majority of tasks in the Predicting Generalization in Deep Learning (PGDL) NeurIPS 2020 competition.
arXiv Detail & Related papers (2021-06-09T01:37:36Z) - Semi-On-Policy Training for Sample Efficient Multi-Agent Policy Gradients [51.749831824106046]
We introduce semi-on-policy (SOP) training as an effective and computationally efficient way to address the sample inefficiency of on-policy policy gradient methods.
We show that our methods perform as well or better than state-of-the-art value-based methods on a variety of SMAC tasks.
arXiv Detail & Related papers (2021-04-27T19:37:01Z) - Causal Policy Gradients [6.123324869194195]
Causal policy gradients (CPGs) provide a common framework for analysing key state-of-the-art algorithms.
CPGs are shown to generalise traditional policy gradients, and yield a principled way of incorporating prior knowledge of a problem domain's generative processes.
arXiv Detail & Related papers (2021-02-20T14:51:12Z) - Reinforcement Learning-based Black-Box Evasion Attacks to Link Prediction in Dynamic Graphs [87.5882042724041]
Link prediction in dynamic graphs (LPDG) is an important research problem that has diverse applications.
We study the vulnerability of LPDG methods and propose the first practical black-box evasion attack.
arXiv Detail & Related papers (2020-09-01T01:04:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.