Toward Discovering Options that Achieve Faster Planning
- URL: http://arxiv.org/abs/2205.12515v1
- Date: Wed, 25 May 2022 06:10:10 GMT
- Title: Toward Discovering Options that Achieve Faster Planning
- Authors: Yi Wan, Richard S. Sutton
- Abstract summary: We propose a new objective for option discovery that emphasizes the computational advantage of using options in planning.
Our new algorithm achieves a high objective value, which is close to the value achieved by a set of human-designed options.
- Score: 15.874687616157056
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a new objective for option discovery that emphasizes the
computational advantage of using options in planning. For a given set of
episodic tasks and a given number of options, the objective prefers options
that can be used to achieve a high return by composing few options. By
composing few options, fast planning can be achieved. When faced with new tasks
similar to the given ones, the discovered options are also expected to
accelerate planning. Our objective extends the objective proposed by Harb et
al. (2018) for the single-task setting to the multi-task setting. A closer look
at Harb et al.'s objective shows that the best options discovered given one
task are not likely to be useful for future unseen tasks and that the
multi-task setting is indeed necessary for this purpose. In the same paper,
Harb et al. also proposed an algorithm to optimize their objective, and the
algorithm can be naturally extended to the multi-task setting. We empirically
show that in the four-room domain the extension does not achieve a high
objective value and propose a new algorithm that better optimizes the proposed
objective. In the same four-room domain, we show that 1) a higher objective
value is typically associated with options with which fewer planning iterations
are needed to achieve near-optimal performance, 2) our new algorithm achieves a
high objective value, which is close to the value achieved by a set of
human-designed options, 3) the best number of planning iterations given the
discovered options is much smaller and matches that obtained with human-designed
options, and 4) the options produced by our algorithm also make intuitive sense
because they move to and terminate at cells near hallways connecting two
neighbor rooms.
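The abstract's core measurement, counting how many planning iterations are needed before the value function stops improving, with and without temporally extended options, can be illustrated with a toy sketch. This is not the paper's algorithm or its four-room domain: the corridor environment, the single hand-designed 5-step option, and the reward model below are all illustrative assumptions.

```python
# Toy illustration: value iteration converges in fewer sweeps when a
# multi-step option is available, because each backup propagates value
# over the option's full duration. The corridor domain and the option
# are hypothetical, not taken from the paper.

N = 20          # corridor cells 0..N-1; the goal sits at cell N-1
GAMMA = 0.95
OPTION_LEN = 5  # assumed hand-designed option: run 5 cells to the right

def moves(use_options):
    """Available (displacement, duration) pairs in every state."""
    out = [(-1, 1), (+1, 1)]                   # primitive left / right steps
    if use_options:
        out.append((OPTION_LEN, OPTION_LEN))   # temporally extended option
    return out

def planning_sweeps(use_options, tol=1e-9):
    """Run value-iteration sweeps until the value function stops changing."""
    V = [0.0] * N
    for sweep in range(1, 1000):
        V_new = [0.0] * N
        for s in range(N - 1):                 # goal state is terminal
            best = 0.0
            for d, k in moves(use_options):
                s2 = min(max(s + d, 0), N - 1)
                r = 1.0 if s2 == N - 1 else 0.0
                # reward discounted to its arrival time, then bootstrap
                best = max(best, GAMMA ** (k - 1) * r + GAMMA ** k * V[s2])
            V_new[s] = best
        if max(abs(a - b) for a, b in zip(V_new, V)) < tol:
            return sweep
        V = V_new
    raise RuntimeError("value iteration did not converge")
```

Comparing `planning_sweeps(False)` with `planning_sweeps(True)` shows the option set converging in a fraction of the sweeps, mirroring observation 1) above: a useful option set cuts the number of planning iterations needed to reach near-optimal values.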
Related papers
- Multi-Fidelity Bayesian Optimization With Across-Task Transferable Max-Value Entropy Search [36.14499894307206]
This paper introduces a novel information-theoretic acquisition function that balances the need to acquire information about the current task with the goal of collecting information transferable to future tasks.
Experimental results across synthetic and real-world examples reveal that the proposed provident acquisition strategy can significantly improve optimization efficiency once a sufficient number of tasks has been processed.
arXiv Detail & Related papers (2024-03-14T17:00:01Z) - Experiment Planning with Function Approximation [49.50254688629728]
We study the problem of experiment planning with function approximation in contextual bandit problems.
We propose two experiment planning strategies compatible with function approximation.
We show that a uniform sampler achieves competitive optimality rates in the setting where the number of actions is small.
arXiv Detail & Related papers (2024-01-10T14:40:23Z) - Optimal Cost-Preference Trade-off Planning with Multiple Temporal Tasks [3.655021726150368]
We introduce a novel notion of preference that provides a generalized framework to express preferences over individual tasks as well as their relations.
We perform an optimal trade-off (Pareto) analysis between behaviors that adhere to the user's preference and the ones that are resource optimal.
arXiv Detail & Related papers (2023-06-22T21:56:49Z) - Adaptive Multi-Goal Exploration [118.40427257364729]
We show how AdaGoal can be used to tackle the objective of learning an $\epsilon$-optimal goal-conditioned policy.
AdaGoal is anchored in the high-level algorithmic structure of existing methods for goal-conditioned deep reinforcement learning.
arXiv Detail & Related papers (2021-11-23T17:59:50Z) - Optimal To-Do List Gamification for Long Term Planning [0.6882042556551609]
We extend the previous version of our optimal gamification method with added services for helping people decide which tasks should and should not be done when there is not enough time to do everything.
We test the accuracy of the incentivised to-do list by comparing the performance of the strategy with the points computed exactly using Value Iteration for a variety of case studies.
To demonstrate its functionality, we released an API that makes it easy to deploy our method in Web and app services.
arXiv Detail & Related papers (2021-09-14T08:06:01Z) - Visual scoping operations for physical assembly [0.0]
We propose visual scoping, a strategy that interleaves planning and acting by alternately defining a spatial region as the next subgoal and selecting actions to achieve it.
We find that visual scoping achieves comparable task performance to the subgoal planner while requiring only a fraction of the total computational cost.
arXiv Detail & Related papers (2021-06-10T10:50:35Z) - Exploring Relational Context for Multi-Task Dense Prediction [76.86090370115]
We consider a multi-task environment for dense prediction tasks, represented by a common backbone and independent task-specific heads.
We explore various attention-based contexts, such as global and local, in the multi-task setting.
We propose an Adaptive Task-Relational Context module, which samples the pool of all available contexts for each task pair.
arXiv Detail & Related papers (2021-04-28T16:45:56Z) - Exploration in two-stage recommender systems [79.50534282841618]
Two-stage recommender systems are widely adopted in industry due to their scalability and maintainability.
A key challenge of this setup is that optimal performance of each stage in isolation does not imply optimal global performance.
We propose a method of synchronising the exploration strategies between the ranker and the nominators.
arXiv Detail & Related papers (2020-09-01T16:52:51Z) - Long-Horizon Visual Planning with Goal-Conditioned Hierarchical Predictors [124.30562402952319]
The ability to predict and plan into the future is fundamental for agents acting in the world.
Current learning approaches for visual prediction and planning fail on long-horizon tasks.
We propose a framework for visual prediction and planning that is able to overcome both of these limitations.
arXiv Detail & Related papers (2020-06-23T17:58:56Z) - Automatic Curriculum Learning through Value Disagreement [95.19299356298876]
Continually solving new, unsolved tasks is the key to learning diverse behaviors.
In the multi-task domain, where an agent needs to reach multiple goals, the choice of training goals can largely affect sample efficiency.
We propose setting up an automatic curriculum for goals that the agent needs to solve.
We evaluate our method across 13 multi-goal robotic tasks and 5 navigation tasks, and demonstrate performance gains over current state-of-the-art methods.
arXiv Detail & Related papers (2020-06-17T03:58:25Z) - Practical Bayesian Optimization of Objectives with Conditioning Variables [1.0497128347190048]
We consider the more general case where a user is faced with multiple problems that each need to be optimized conditional on a state variable.
Similarity across objectives boosts optimization of each objective in two ways.
We propose a framework for conditional optimization: ConBO.
arXiv Detail & Related papers (2020-02-23T22:06:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences arising from its use.