TAB-Fields: A Maximum Entropy Framework for Mission-Aware Adversarial Planning
- URL: http://arxiv.org/abs/2412.02570v1
- Date: Tue, 03 Dec 2024 16:55:27 GMT
- Title: TAB-Fields: A Maximum Entropy Framework for Mission-Aware Adversarial Planning
- Authors: Gokul Puthumanaillam, Jae Hyuk Song, Nurzhan Yesmagambet, Shinkyu Park, Melkior Ornik
- Abstract summary: We develop a representation that captures adversary state distributions over time by computing the most unbiased probability distribution consistent with known constraints.
We integrate TAB-Fields with standard planning algorithms by introducing TAB-conditioned POMCP, an adaptation of Partially Observable Monte Carlo Planning.
We demonstrate that our approach achieves superior performance compared to baselines that either assume specific adversary policies or neglect mission constraints altogether.
- Score: 2.4903631775244213
- Abstract: Autonomous agents operating in adversarial scenarios face a fundamental challenge: while they may know their adversaries' high-level objectives, such as reaching specific destinations within time constraints, the exact policies these adversaries will employ remain unknown. Traditional approaches address this challenge by treating the adversary's state as a partially observable element, leading to a formulation as a Partially Observable Markov Decision Process (POMDP). However, the induced belief-space dynamics in a POMDP require knowledge of the system's transition dynamics, which, in this case, depend on the adversary's unknown policy. Our key observation is that while an adversary's exact policy is unknown, their behavior is necessarily constrained by their mission objectives and the physical environment, allowing us to characterize the space of possible behaviors without assuming specific policies. In this paper, we develop Task-Aware Behavior Fields (TAB-Fields), a representation that captures adversary state distributions over time by computing the most unbiased probability distribution consistent with known constraints. We construct TAB-Fields by solving a constrained optimization problem that minimizes additional assumptions about adversary behavior beyond mission and environmental requirements. We integrate TAB-Fields with standard planning algorithms by introducing TAB-conditioned POMCP, an adaptation of Partially Observable Monte Carlo Planning. Through experiments in simulation with underwater robots and hardware implementations with ground robots, we demonstrate that our approach achieves superior performance compared to baselines that either assume specific adversary policies or neglect mission constraints altogether. Evaluation videos and code are available at https://tab-fields.github.io.
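The core construction described in the abstract, computing the most unbiased distribution over adversary states consistent with mission and environmental requirements, is a maximum-entropy program. The sketch below states such a program in generic notation; the constraint functions g_k, the horizon T, and the reachability condition are illustrative placeholders rather than the paper's exact formulation.
```latex
% Generic maximum-entropy sketch of a TAB-Field (notation is illustrative)
\begin{aligned}
\max_{p_1,\dots,p_T}\quad & \sum_{t=1}^{T} H(p_t)
   \;=\; -\sum_{t=1}^{T}\sum_{s\in\mathcal{S}} p_t(s)\,\log p_t(s) \\
\text{s.t.}\quad
 & \sum_{s\in\mathcal{S}} p_t(s) = 1,\qquad p_t(s)\ge 0
   && \text{(valid distributions)}\\
 & p_{t+1}(s') = 0 \ \text{ unless } \ \exists\, s,a:\; p_t(s) > 0,\; P(s'\mid s,a) > 0
   && \text{(reachability under the environment dynamics, no policy assumed)}\\
 & \mathbb{E}_{p_t}\!\left[g_k(s,t)\right] \le 0,\quad k=1,\dots,m
   && \text{(mission constraints, e.g. reach the destination by the deadline)}
\end{aligned}
```
The distributions p_t stand in for the policy-dependent adversary state distribution that a standard POMDP formulation would require; per the abstract, TAB-conditioned POMCP plans against these distributions instead of an assumed adversary policy.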
Related papers
- GUIDEd Agents: Enhancing Navigation Policies through Task-Specific Uncertainty Abstraction in Localization-Limited Environments [1.614803913005309]
We present a planning method for integrating task-specific uncertainty requirements directly into navigation policies.
We propose Generalized Uncertainty Integration for Decision-Making and Execution (GUIDE), a policy conditioning framework that incorporates these uncertainty requirements into robot decision-making.
We show how integrating GUIDE into reinforcement learning frameworks allows the agent to learn navigation policies that effectively balance task completion and uncertainty management without explicit reward engineering.
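As a rough illustration of what conditioning a navigation policy on a task-specific uncertainty requirement can look like, the toy sketch below appends an uncertainty-tolerance scalar to the observation before evaluating the policy. All names, shapes, and the linear policy head are hypothetical and are not taken from the GUIDE paper.
```python
import numpy as np

def conditioned_policy(obs, uncertainty_tolerance, weights):
    """Toy policy conditioned on a task-specific uncertainty requirement.

    The tolerance is appended to the observation so the same network can trade
    off task progress against localization uncertainty. Everything here is
    illustrative; GUIDE's actual conditioning is defined in the paper.
    """
    x = np.concatenate([obs, [uncertainty_tolerance]])
    logits = weights @ x                              # linear head for brevity
    return np.exp(logits) / np.exp(logits).sum()      # action distribution

# Example: same observation, two different uncertainty requirements
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 6))        # 4 actions, 5-dim observation + tolerance
obs = rng.normal(size=5)
print(conditioned_policy(obs, 0.1, W))   # strict localization requirement
print(conditioned_policy(obs, 1.0, W))   # loose requirement
```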
arXiv Detail & Related papers (2024-10-19T18:46:17Z)
- Towards Interpretable Foundation Models of Robot Behavior: A Task Specific Policy Generation Approach [1.7205106391379026]
Foundation models are a promising path toward general-purpose and user-friendly robots.
However, the lack of modularity between tasks means that when model weights are updated, the behavior in other, unrelated tasks may be affected.
We present an alternative approach to the design of robot foundation models, which generates stand-alone, task-specific policies.
arXiv Detail & Related papers (2024-07-10T21:55:44Z)
- Learning Logic Specifications for Policy Guidance in POMDPs: an Inductive Logic Programming Approach [57.788675205519986]
We learn from high-quality traces of POMDP executions generated by any solver.
We exploit data- and time-efficient Inductive Logic Programming (ILP) to generate interpretable belief-based policy specifications.
We show that learned specifications expressed in Answer Set Programming (ASP) yield performance superior to neural networks and similar to that of optimal handcrafted task-specific heuristics, within lower computational time.
arXiv Detail & Related papers (2024-02-29T15:36:01Z)
- Decision Making in Non-Stationary Environments with Policy-Augmented Search [9.000981144624507]
We introduce Policy-Augmented Monte Carlo Tree Search (PA-MCTS).
It combines action-value estimates from an out-of-date policy with an online search using an up-to-date model of the environment.
We prove theoretical results showing conditions under which PA-MCTS selects the one-step optimal action and also bound the error accrued while following PA-MCTS as a policy.
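A minimal sketch of the combination step described above, blending action values from an out-of-date policy with values from online search, is shown below. The blend weight alpha and the function names are illustrative, not the paper's notation or its exact decision rule.
```python
def pa_mcts_blend(stale_q, mcts_q, alpha=0.5):
    """Blend action values from an out-of-date policy with online search values.

    stale_q and mcts_q map actions to value estimates; alpha is an illustrative
    blend weight, not the paper's notation.
    """
    actions = stale_q.keys() & mcts_q.keys()
    combined = {a: alpha * mcts_q[a] + (1 - alpha) * stale_q[a] for a in actions}
    return max(combined, key=combined.get)

# Example: the environment has shifted, so the search disagrees with the old policy.
stale_q = {"left": 1.0, "right": 0.2}   # values learned before the change
mcts_q = {"left": 0.1, "right": 0.9}    # values from search with the current model
print(pa_mcts_blend(stale_q, mcts_q, alpha=0.7))  # -> "right"
```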
arXiv Detail & Related papers (2024-01-06T11:51:50Z)
- Learning Logic Specifications for Soft Policy Guidance in POMCP [71.69251176275638]
Partially Observable Monte Carlo Planning (POMCP) is an efficient solver for Partially Observable Markov Decision Processes (POMDPs).
However, POMCP suffers from a sparse reward function, namely, rewards are achieved only when the final goal is reached.
In this paper, we use inductive logic programming to learn logic specifications from traces of POMCP executions.
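One simple way to turn learned specifications into soft guidance inside POMCP is to bias action selection toward actions that satisfy the learned rules rather than forbidding the others. The sketch below is illustrative only; satisfies_spec is a hypothetical stand-in for evaluating a learned logic rule against the current belief, not the paper's implementation.
```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    # Per-action visit counts and value estimates at a POMCP search node
    counts: dict = field(default_factory=dict)
    values: dict = field(default_factory=dict)
    belief: object = None

    @property
    def total(self):
        return sum(self.counts.values())

def guided_ucb(node, actions, satisfies_spec, c=1.0, bonus=0.5):
    """UCB1 with a soft bonus for actions consistent with a learned specification.

    satisfies_spec(belief, action) -> bool is a hypothetical placeholder for
    evaluating a learned rule; the bonus guides the search softly instead of
    pruning actions outright.
    """
    def score(a):
        n = node.counts.get(a, 0)
        q = node.values.get(a, 0.0)
        explore = c * math.sqrt(math.log(node.total + 1) / (n + 1))
        return q + explore + (bonus if satisfies_spec(node.belief, a) else 0.0)
    return max(actions, key=score)

# Example: a spec that prefers "listen" while the belief is still uncertain
node = Node(counts={"listen": 3, "open-left": 1},
            values={"listen": 0.2, "open-left": 0.1},
            belief={"uncertain": True})
spec = lambda belief, a: belief["uncertain"] and a == "listen"
print(guided_ucb(node, ["listen", "open-left", "open-right"], spec))
```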
arXiv Detail & Related papers (2023-03-16T09:37:10Z)
- Learning Robust Policy against Disturbance in Transition Dynamics via State-Conservative Policy Optimization [63.75188254377202]
Deep reinforcement learning algorithms can perform poorly in real-world tasks due to the discrepancy between source and target environments.
We propose State-Conservative Policy Optimization (SCPO), a novel model-free actor-critic algorithm that learns robust policies without modeling the disturbance in advance.
Experiments in several robot control tasks demonstrate that SCPO learns robust policies against the disturbance in transition dynamics.
arXiv Detail & Related papers (2021-12-20T13:13:05Z)
- Rule-based Shielding for Partially Observable Monte-Carlo Planning [78.05638156687343]
We propose two contributions to Partially Observable Monte-Carlo Planning (POMCP).
The first is a method for identifying unexpected actions selected by POMCP with respect to expert prior knowledge of the task.
The second is a shielding approach that prevents POMCP from selecting unexpected actions.
We evaluate our approach on Tiger, a standard benchmark for POMDPs, and a real-world problem related to velocity regulation in mobile robot navigation.
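A shield of this kind can be thought of as a filter applied to POMCP's chosen action: if the action violates the expert rules for the current belief, it is replaced before execution. The sketch below is a generic illustration with hypothetical rule and fallback functions, not the paper's implementation.
```python
def shielded_action(belief, proposed_action, rules, fallback):
    """Block actions that violate expert rules; otherwise pass them through.

    rules: predicates rule(belief, action) -> True when the action is acceptable.
    fallback(belief) supplies a safe action when the proposal is rejected.
    Both are placeholders for the paper's rules, which are derived from expert
    prior knowledge of the task.
    """
    if all(rule(belief, proposed_action) for rule in rules):
        return proposed_action   # action is consistent with expert knowledge
    return fallback(belief)      # unexpected action: replace with a safe one

# Example on a Tiger-like task: never open a door while the belief is too uncertain
rules = [lambda b, a: a == "listen" or max(b.values()) > 0.9]
fallback = lambda b: "listen"
belief = {"tiger-left": 0.6, "tiger-right": 0.4}
print(shielded_action(belief, "open-right", rules, fallback))  # -> "listen"
```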
arXiv Detail & Related papers (2021-04-28T14:23:38Z)
- Modular Deep Reinforcement Learning for Continuous Motion Planning with Temporal Logic [59.94347858883343]
This paper investigates the motion planning of autonomous dynamical systems modeled by Markov decision processes (MDPs).
The novelty is to design an embedded product MDP (EP-MDP) between the limit-deterministic generalized Büchi automaton (LDGBA), which encodes the temporal logic task, and the MDP.
The proposed LDGBA-based reward shaping and discounting schemes for model-free reinforcement learning (RL) depend only on the EP-MDP states.
arXiv Detail & Related papers (2021-02-24T01:11:25Z)
- Privacy-Constrained Policies via Mutual Information Regularized Policy Gradients [54.98496284653234]
We consider the task of training a policy that maximizes reward while minimizing disclosure of certain sensitive state variables through the actions.
We solve this problem by introducing a regularizer based on the mutual information between the sensitive state and the actions.
We develop a model-based estimator for optimization of privacy-constrained policies.
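In generic notation, the trade-off described in this summary can be written as a regularized objective; the symbols below (lambda, S^sens) are illustrative rather than the paper's notation.
```latex
% Reward-vs-leakage objective (illustrative notation)
\max_{\pi}\;\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{T}\gamma^{t}\, r(s_t,a_t)\right]
\;-\; \lambda\, I\!\left(S^{\mathrm{sens}};\, A\right)
```
Here I(S^sens; A) is the mutual information between the sensitive state variables and the actions, and lambda >= 0 sets how strongly disclosure is penalized; the model-based estimator mentioned above is what makes this regularizer usable during policy optimization.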
arXiv Detail & Related papers (2020-12-30T03:22:35Z)
- Verifiable Planning in Expected Reward Multichain MDPs [20.456052208569115]
We explore the steady-state planning problem of deriving a decision-making policy for an agent.
We prove that optimal solutions to the proposed programs yield stationary policies with rigorous guarantees of behavior.
arXiv Detail & Related papers (2020-12-03T18:54:24Z)