Value of Information-based Deceptive Path Planning Under Adversarial Interventions
- URL: http://arxiv.org/abs/2503.24284v1
- Date: Mon, 31 Mar 2025 16:31:29 GMT
- Title: Value of Information-based Deceptive Path Planning Under Adversarial Interventions
- Authors: Wesley A. Suttle, Jesse Milzman, Mustafa O. Karabag, Brian M. Sadler, Ufuk Topcu
- Abstract summary: We propose a novel Markov decision process (MDP)-based model for the deceptive path planning problem under adversarial interventions. Using the value of information (VoI) objectives we propose, path planning agents deceive the adversarial observer into choosing suboptimal interventions.
- Score: 26.543790095871433
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing methods for deceptive path planning (DPP) address the problem of designing paths that conceal their true goal from a passive, external observer. Such methods do not apply to problems where the observer has the ability to perform adversarial interventions to impede the path planning agent. In this paper, we propose a novel Markov decision process (MDP)-based model for the DPP problem under adversarial interventions and develop new value of information (VoI) objectives to guide the design of DPP policies. Using the VoI objectives we propose, path planning agents deceive the adversarial observer into choosing suboptimal interventions by selecting trajectories that are of low informational value to the observer. Leveraging connections to the linear programming theory for MDPs, we derive computationally efficient solution methods for synthesizing policies for performing DPP under adversarial interventions. In our experiments, we illustrate the effectiveness of the proposed solution method in achieving deceptiveness under adversarial interventions and demonstrate the superior performance of our approach to both existing DPP methods and conservative path planning approaches on illustrative gridworld problems.
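For readers unfamiliar with the "linear programming theory for MDPs" mentioned in the abstract, the block below sketches the standard occupancy-measure linear program for a discounted MDP. It is textbook background, not the paper's VoI formulation, which the abstract does not spell out. The symbols x (discounted state-action occupancy), r (reward), P (transition kernel), \mu_0 (initial state distribution), and \gamma (discount factor) are notation assumed here for illustration.

```latex
\begin{aligned}
\max_{x \ge 0} \quad & \sum_{s,a} x(s,a)\, r(s,a) \\
\text{s.t.} \quad & \sum_{a} x(s',a) \;-\; \gamma \sum_{s,a} P(s' \mid s,a)\, x(s,a) \;=\; \mu_0(s') \qquad \forall s'
\end{aligned}
```

Any feasible x induces a policy \pi(a \mid s) = x(s,a) / \sum_{a'} x(s,a') wherever the denominator is positive, so objectives that are linear in x can be optimized over policies by solving a linear program.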
Related papers
- Interaction-Grounded Learning for Contextual Markov Decision Processes with Personalized Feedback [59.287761696290865]
We propose a computationally efficient algorithm that achieves a sublinear regret guarantee for contextual episodic Markov Decision Processes (MDPs) with personalized feedback. We demonstrate the effectiveness of our method in learning personalized objectives from multi-turn interactions through experiments on both a synthetic episodic MDP and a real-world user booking dataset.
arXiv Detail & Related papers (2026-02-09T06:29:54Z) - Towards a Goal-Centric Assessment of Requirements Engineering Methods for Privacy by Design [13.815715903288622]
Implementing privacy by design (PbD) according to the General Data Protection Regulation is met with a growing number of requirements engineering (RE) approaches. We suggest a goal-centric approach for assessing PbD methods.
arXiv Detail & Related papers (2026-01-22T16:22:23Z) - Latent Diffusion Planning for Imitation Learning [78.56207566743154]
Latent Diffusion Planning (LDP) is a modular approach consisting of a planner and inverse dynamics model.
By separating planning from action prediction, LDP can benefit from the denser supervision signals of suboptimal and action-free data.
On simulated visual robotic manipulation tasks, LDP outperforms state-of-the-art imitation learning approaches.
arXiv Detail & Related papers (2025-04-23T17:53:34Z) - Planning as In-Painting: A Diffusion-Based Embodied Task Planning Framework for Environments under Uncertainty [56.30846158280031]
Task planning for embodied AI has been one of the most challenging problems.
We propose a task-agnostic method named 'planning as in-painting'.
The proposed framework achieves promising performance in various embodied AI tasks.
arXiv Detail & Related papers (2023-12-02T10:07:17Z) - PDPP: Projected Diffusion for Procedure Planning in Instructional Videos [18.984980596601513]
We study the problem of procedure planning in instructional videos, which aims to make a plan (i.e. a sequence of actions) given the current visual observation and the desired goal. Previous works cast this as a sequence modeling problem and leverage either intermediate visual observations or language instructions as supervision. To avoid intermediate supervision annotation and error accumulation caused by planning autoregressively, we propose a diffusion-based framework.
arXiv Detail & Related papers (2023-03-26T10:50:16Z) - An Auction-based Coordination Strategy for Task-Constrained Multi-Agent Stochastic Planning with Submodular Rewards [7.419725234099728]
Existing task coordination algorithms either ignore the process or suffer from computational intensity.
We propose a decentralized auction-based coordination strategy using a newly formulated score function.
For the implementation on large-scale applications, an approximate variant of the proposed method, namely Deep Auction, is also suggested.
arXiv Detail & Related papers (2022-12-30T10:25:25Z) - Reinforcement Learning with a Terminator [80.34572413850186]
We learn the parameters of the TerMDP and leverage the structure of the estimation problem to provide state-wise confidence bounds.
We use these to construct a provably-efficient algorithm, which accounts for termination, and bound its regret.
arXiv Detail & Related papers (2022-05-30T18:40:28Z) - Sequential Information Design: Markov Persuasion Process and Its Efficient Reinforcement Learning [156.5667417159582]
This paper proposes a novel model of sequential information design, namely the Markov persuasion processes (MPPs).
Planning in MPPs faces the unique challenge of finding a signaling policy that is simultaneously persuasive to the myopic receivers and induces the optimal long-term cumulative utilities of the sender.
We design a provably efficient no-regret learning algorithm, the Optimism-Pessimism Principle for Persuasion Process (OP4), which features a novel combination of both optimism and pessimism principles.
arXiv Detail & Related papers (2022-02-22T05:41:43Z) - Proximal Reinforcement Learning: Efficient Off-Policy Evaluation in Partially Observed Markov Decision Processes [65.91730154730905]
In applications of offline reinforcement learning to observational data, such as in healthcare or education, a general concern is that observed actions might be affected by unobserved factors.
Here we tackle this by considering off-policy evaluation in a partially observed Markov decision process (POMDP).
We extend the framework of proximal causal inference to our POMDP setting, providing a variety of settings where identification is made possible.
arXiv Detail & Related papers (2021-10-28T17:46:14Z) - Deceptive Decision-Making Under Uncertainty [25.197098169762356]
We study the design of autonomous agents that are capable of deceiving outside observers about their intentions while carrying out tasks.
By modeling the agent's behavior as a Markov decision process, we consider a setting where the agent aims to reach one of multiple potential goals.
We propose a novel approach to model observer predictions based on the principle of maximum entropy and to efficiently generate deceptive strategies.
arXiv Detail & Related papers (2021-09-14T14:56:23Z) - E-PDDL: A Standardized Way of Defining Epistemic Planning Problems [11.381221864778976]
Epistemic Planning (EP) refers to an automated planning setting where the agent reasons in the space of knowledge states.
We propose a unified way of specifying EP problems: the Epistemic Planning Domain Definition Language, E-PDDL.
We show that E-PDDL can be supported by leading multi-agent epistemic planning (MEP) planners and provide corresponding code that translates EP problems specified in E-PDDL into (M)EP problems that can be handled by several planners.
arXiv Detail & Related papers (2021-07-19T10:20:20Z) - Variance-Aware Off-Policy Evaluation with Linear Function Approximation [85.75516599931632]
We study the off-policy evaluation problem in reinforcement learning with linear function approximation.
We propose an algorithm, VA-OPE, which uses the estimated variance of the value function to reweight the Bellman residual in Fitted Q-Iteration.
arXiv Detail & Related papers (2021-06-22T17:58:46Z) - An Offline Risk-aware Policy Selection Method for Bayesian Markov Decision Processes [0.0]
Exploitation vs Caution (EvC) is a paradigm that elegantly incorporates model uncertainty abiding by the Bayesian formalism.
We validate EvC with state-of-the-art approaches in different discrete, yet simple, environments offering a fair variety of MDP classes.
In the tested scenarios EvC manages to select robust policies and hence stands out as a useful tool for practitioners.
arXiv Detail & Related papers (2021-05-27T20:12:20Z) - Adaptive Informative Path Planning with Multimodal Sensing [36.16721115973077]
The problem studied is Adaptive Informative Path Planning with Multimodal Sensing, abbreviated AIPPMS (MS for Multimodal Sensing).
We frame AIPPMS as a Partially Observable Markov Decision Process (POMDP) and solve it with online planning.
We evaluate our method on two domains: a simulated search-and-rescue scenario and a challenging extension to the classic RockSample problem.
arXiv Detail & Related papers (2020-03-21T20:28:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.