MAGIC: Learning Macro-Actions for Online POMDP Planning
- URL: http://arxiv.org/abs/2011.03813v4
- Date: Thu, 1 Jul 2021 06:04:09 GMT
- Title: MAGIC: Learning Macro-Actions for Online POMDP Planning
- Authors: Yiyuan Lee, Panpan Cai, David Hsu
- Abstract summary: MAGIC learns a macro-action generator end-to-end, using an online planner's performance as the feedback.
We evaluate MAGIC on several long-horizon planning tasks both in simulation and on a real robot.
- Score: 14.156697390568617
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The partially observable Markov decision process (POMDP) is a principled
general framework for robot decision making under uncertainty, but POMDP
planning suffers from high computational complexity when long-term planning is
required. While temporally extended macro-actions help to cut down the
effective planning horizon and significantly improve computational efficiency,
how do we acquire good macro-actions? This paper proposes Macro-Action
Generator-Critic (MAGIC), which performs offline learning of macro-actions
optimized for online POMDP planning. Specifically, MAGIC learns a macro-action
generator end-to-end, using an online planner's performance as the feedback.
During online planning, the generator produces situation-aware macro-actions on
the fly, conditioned on the robot's belief and the environment context. We
evaluated MAGIC on several long-horizon planning tasks both in simulation and
on a real robot. The experimental results show that the learned macro-actions
offer significant benefits in online planning performance, compared with
primitive actions and handcrafted macro-actions.
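To make the generator-critic idea concrete, here is a minimal sketch, NOT the authors' implementation. All names are illustrative: the belief is summarized by a fixed feature vector, a macro-action is a short sequence of scalar controls, and the planner's performance is a black-box scalar return. A simple (1+1) evolution strategy stands in for MAGIC's end-to-end gradient-based training.

```python
import numpy as np

rng = np.random.default_rng(0)

BELIEF_DIM = 4   # size of the belief feature vector (illustrative)
MACRO_LEN = 3    # primitive steps per macro-action (illustrative)

def generate_macro(weights, belief):
    """Linear 'generator': belief features -> a macro-action (control sequence)."""
    return weights @ belief

def planner_return(macro, target=1.0):
    """Stand-in for the online planner's performance feedback: the return
    peaks when the macro's cumulative control hits a target displacement."""
    return -abs(float(np.sum(macro)) - target)

def train_generator(belief, steps=300):
    """Hill-climb the generator weights on the planner's return."""
    weights = np.zeros((MACRO_LEN, BELIEF_DIM))
    best = planner_return(generate_macro(weights, belief))
    for _ in range(steps):
        candidate = weights + rng.normal(0.0, 0.1, size=weights.shape)
        ret = planner_return(generate_macro(candidate, belief))
        if ret > best:  # keep the perturbation only if planning improves
            weights, best = candidate, ret
    return weights, best

belief = np.array([1.0, 0.5, -0.2, 0.1])   # toy belief features
weights, best = train_generator(belief)
macro = generate_macro(weights, belief)
```

The key property the sketch preserves is that the generator is trained only through the planner's performance signal, never through supervision on what a good macro-action looks like.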
Related papers
- Macro-from-Micro Planning for High-Quality and Parallelized Autoregressive Long Video Generation [50.42977813298953]
Current autoregressive diffusion models excel at video generation but are generally limited to short temporal durations. We propose a planning-then-populating framework centered on Macro-from-Micro Planning (MMPL) for long video generation. MMPL sketches a global storyline for the entire video through two hierarchical stages: Micro Planning and Macro Planning.
arXiv Detail & Related papers (2025-08-05T11:21:54Z)
- Learning Symbolic Persistent Macro-Actions for POMDP Solving Over Time [52.03682298194168]
This paper proposes an integration of temporal logical reasoning and Partially Observable Markov Decision Processes (POMDPs). Our method leverages a fragment of Linear Temporal Logic (LTL) based on Event Calculus (EC) to generate persistent (i.e., constant) macro-actions. These macro-actions guide Monte Carlo Tree Search (MCTS)-based POMDP solvers over a time horizon.
arXiv Detail & Related papers (2025-05-06T16:08:55Z) - REMAC: Self-Reflective and Self-Evolving Multi-Agent Collaboration for Long-Horizon Robot Manipulation [57.628771707989166]
We propose an adaptive multi-agent planning framework, termed REMAC, that enables efficient, scene-agnostic multi-robot long-horizon task planning and execution.
REMAC incorporates two key modules: a self-reflection module that performs pre-condition and post-condition checks in the loop to evaluate progress and refine plans, and a self-evolvement module that dynamically adapts plans based on scene-specific reasoning.
arXiv Detail & Related papers (2025-03-28T03:51:40Z)
- Scalable Decision-Making in Stochastic Environments through Learned Temporal Abstraction [7.918703013303246]
We present Latent Macro Action Planner (L-MAP), which addresses the challenge of learning to make decisions in high-dimensional continuous action spaces.
L-MAP learns a set of temporally extended macro-actions through a state-conditional Vector Quantized Variational Autoencoder (VQ-VAE).
In offline RL settings, including continuous control tasks, L-MAP efficiently searches over discrete latent actions to yield high expected returns.
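The core mechanism, quantizing continuous action sequences against a learned codebook so that planning can search over a small discrete set of codes, can be sketched as follows. This is an illustration only, not L-MAP's code: a k-means-style fit with farthest-point initialization stands in for training the VQ-VAE codebook.

```python
import numpy as np

rng = np.random.default_rng(1)

def quantize(seq, codebook):
    """Index of the nearest codebook entry: the discrete macro-action code."""
    return int(np.argmin(np.linalg.norm(codebook - seq, axis=1)))

def fit_codebook(sequences, k, iters=20):
    """Fit k prototype action sequences with k-means-style updates."""
    # Farthest-point initialization keeps the prototypes spread out.
    codebook = [sequences[0]]
    while len(codebook) < k:
        dists = np.min([np.linalg.norm(sequences - c, axis=1) for c in codebook],
                       axis=0)
        codebook.append(sequences[int(np.argmax(dists))])
    codebook = np.array(codebook)
    for _ in range(iters):
        codes = np.array([quantize(s, codebook) for s in sequences])
        for j in range(k):
            members = sequences[codes == j]
            if len(members):
                codebook[j] = members.mean(axis=0)  # move prototype to centroid
    return codebook

# Toy data: length-2 action sequences drawn from two well-separated clusters.
sequences = np.concatenate([
    rng.normal([0.0, 0.0], 0.05, size=(20, 2)),
    rng.normal([1.0, 1.0], 0.05, size=(20, 2)),
])
codebook = fit_codebook(sequences, k=2)
```

After fitting, a planner only has to branch over the k integer codes, which is what makes the search tractable in high-dimensional continuous action spaces.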
arXiv Detail & Related papers (2025-02-28T16:02:23Z)
- DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution [114.61347672265076]
Development of MLLMs for real-world robots is challenging due to the typically limited computation and memory capacities available on robotic platforms.
We propose a Dynamic Early-Exit Framework for Robotic Vision-Language-Action Model (DeeR) that automatically adjusts the size of the activated MLLM.
DeeR demonstrates significant reductions in computational costs of LLM by 5.2-6.5x and GPU memory of LLM by 2-6x without compromising performance.
arXiv Detail & Related papers (2024-11-04T18:26:08Z)
- Learning Logic Specifications for Policy Guidance in POMDPs: an Inductive Logic Programming Approach [57.788675205519986]
We learn high-quality traces from POMDP executions generated by any solver.
We exploit data- and time-efficient Inductive Logic Programming (ILP) to generate interpretable belief-based policy specifications.
We show that learned specifications expressed in Answer Set Programming (ASP) yield performance superior to neural networks and similar to optimal handcrafted task-specific heuristics, within lower computational time.
arXiv Detail & Related papers (2024-02-29T15:36:01Z)
- Large Language Models are Learnable Planners for Long-Term Recommendation [59.167795967630305]
Planning for both immediate and long-term benefits becomes increasingly important in recommendation.
Existing methods apply Reinforcement Learning to learn planning capacity by maximizing cumulative reward for long-term recommendation.
We propose to leverage the remarkable planning capabilities over sparse data of Large Language Models for long-term recommendation.
arXiv Detail & Related papers (2024-02-29T13:49:56Z)
- EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning [84.6451394629312]
We introduce EgoPlan-Bench, a benchmark to evaluate the planning abilities of MLLMs in real-world scenarios.
We show that EgoPlan-Bench poses significant challenges, highlighting a substantial scope for improvement in MLLMs to achieve human-level task planning.
We also present EgoPlan-IT, a specialized instruction-tuning dataset that effectively enhances model performance on EgoPlan-Bench.
arXiv Detail & Related papers (2023-12-11T03:35:58Z)
- AdaPlanner: Adaptive Planning from Feedback with Language Models [56.367020818139665]
Large language models (LLMs) have recently demonstrated the potential in acting as autonomous agents for sequential decision-making tasks.
We propose a closed-loop approach, AdaPlanner, which allows the LLM agent to refine its self-generated plan adaptively in response to environmental feedback.
To mitigate hallucination, we develop a code-style LLM prompt structure that facilitates plan generation across a variety of tasks, environments, and agent capabilities.
arXiv Detail & Related papers (2023-05-26T05:52:27Z)
- Sequential Information Design: Markov Persuasion Process and Its Efficient Reinforcement Learning [156.5667417159582]
This paper proposes a novel model of sequential information design, namely the Markov persuasion processes (MPPs).
Planning in MPPs faces the unique challenge in finding a signaling policy that is simultaneously persuasive to the myopic receivers and inducing the optimal long-term cumulative utilities of the sender.
We design a provably efficient no-regret learning algorithm, the Optimism-Pessimism Principle for Persuasion Process (OP4), which features a novel combination of both optimism and pessimism principles.
arXiv Detail & Related papers (2022-02-22T05:41:43Z)
- The Value of Planning for Infinite-Horizon Model Predictive Control [0.0]
We show how the intermediate data structures used by modern planners can be interpreted as an approximate value function.
We show that this value function can be used by MPC directly, resulting in more efficient and resilient behavior at runtime.
arXiv Detail & Related papers (2021-04-07T02:21:55Z)
- Knowledge-Based Hierarchical POMDPs for Task Planning [0.34998703934432684]
The main goal in task planning is to build a sequence of actions that takes an agent from an initial state to a goal state.
In robotics, this is particularly difficult because actions usually have several possible results, and sensors are prone to produce measurements with error.
We present a scheme to encode knowledge about the robot and its environment, that promotes the modularity and reuse of information.
arXiv Detail & Related papers (2021-03-19T05:45:05Z)
- Efficient Planning in Large MDPs with Weak Linear Function Approximation [4.56877715768796]
Large-scale Markov decision processes (MDPs) require planning algorithms whose complexity is independent of the number of states of the MDP.
We consider the planning problem in MDPs using linear value function approximation with only weak requirements.
arXiv Detail & Related papers (2020-07-13T04:40:41Z)
- Efficient Black-Box Planning Using Macro-Actions with Focused Effects [35.688161278362735]
Heuristics can make search more efficient, but informative goal-aware heuristics are hard to obtain for black-box planning.
We show how to overcome this limitation by discovering macro-actions that make the goal-count heuristic more accurate.
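A toy example of this idea, with illustrative names rather than the paper's code: the goal-count heuristic counts unsatisfied goal variables, and a macro-action with a "focused effect" changes only the variables it is meant to change, so each macro-step moves the goal count predictably and even greedy search makes steady progress.

```python
def goal_count(state, goal):
    """Number of goal variables not yet at their target value."""
    return sum(1 for var, val in goal.items() if state.get(var) != val)

def apply_macro(state, macro):
    """A macro-action is modeled as a mapping: variable -> new value."""
    nxt = dict(state)
    nxt.update(macro)
    return nxt

def greedy_plan(state, goal, macros):
    """Repeatedly apply the macro that most reduces the goal count."""
    plan = []
    while goal_count(state, goal) > 0:
        best = min(macros, key=lambda m: goal_count(apply_macro(state, m), goal))
        if goal_count(apply_macro(state, best), goal) >= goal_count(state, goal):
            break  # no macro helps; a real planner would fall back to search
        state = apply_macro(state, best)
        plan.append(best)
    return plan, state

state = {"a": 0, "b": 0, "c": 0}
goal = {"a": 1, "b": 1, "c": 1}
# Each focused macro flips exactly one variable toward the goal.
macros = [{"a": 1}, {"b": 1}, {"c": 1}]
plan, final = greedy_plan(state, goal, macros)
```

If a macro instead had sprawling side effects on unrelated variables, the goal count could rise or stall after applying it, which is exactly why focused effects keep the heuristic informative.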
arXiv Detail & Related papers (2020-04-28T02:13:12Z)
- Macro-Action-Based Deep Multi-Agent Reinforcement Learning [17.73081797556005]
This paper proposes two Deep Q-Network (DQN) based methods for learning decentralized and centralized macro-action-value functions.
Evaluations on benchmark problems and a larger domain demonstrate the advantage of learning with macro-actions over primitive-actions.
arXiv Detail & Related papers (2020-04-18T15:46:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.