Toward Discovering Options that Achieve Faster Planning
- URL: http://arxiv.org/abs/2205.12515v1
- Date: Wed, 25 May 2022 06:10:10 GMT
- Title: Toward Discovering Options that Achieve Faster Planning
- Authors: Yi Wan, Richard S. Sutton
- Abstract summary: We propose a new objective for option discovery that emphasizes the computational advantage of using options in planning.
Our new algorithm achieves a high objective value, which is close to the value achieved by a set of human-designed options.
- Score: 15.874687616157056
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a new objective for option discovery that emphasizes the
computational advantage of using options in planning. For a given set of
episodic tasks and a given number of options, the objective prefers options
that can be used to achieve a high return by composing few options. By
composing few options, fast planning can be achieved. When faced with new tasks
similar to the given ones, the discovered options are also expected to
accelerate planning. Our objective extends the objective proposed by Harb et
al. (2018) for the single-task setting to the multi-task setting. A closer look
at Harb et al.'s objective shows that the best options discovered given one
task are not likely to be useful for future unseen tasks and that the
multi-task setting is indeed necessary for this purpose. In the same paper,
Harb et al. also proposed an algorithm to optimize their objective, and the
algorithm can be naturally extended to the multi-task setting. We empirically
show that in the four-room domain the extension does not achieve a high
objective value and propose a new algorithm that better optimizes the proposed
objective. In the same four-room domain, we show that 1) a higher objective
value is typically associated with options with which fewer planning iterations
are needed to achieve near-optimal performance, 2) our new algorithm achieves a
high objective value, which is close to the value achieved by a set of
human-designed options, 3) the best number of planning iterations given the
discovered options is much smaller and matches that obtained with human-designed
options, and 4) the options produced by our algorithm also make intuitive sense
because they move to and terminate at cells near hallways connecting two
neighbor rooms.
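The abstract's core measurement, counting how many planning iterations are needed before the value function stops improving, with and without temporally extended options, can be illustrated with a toy sketch. This is not the paper's algorithm or its four-room domain: the corridor environment, the single hand-designed 5-step option, and the reward model below are all illustrative assumptions.

```python
# Toy illustration: value iteration converges in fewer sweeps when a
# multi-step option is available, because each backup propagates value
# over the option's full duration. The corridor domain and the option
# are hypothetical, not taken from the paper.

N = 20          # corridor cells 0..N-1; the goal sits at cell N-1
GAMMA = 0.95
OPTION_LEN = 5  # assumed hand-designed option: run 5 cells to the right

def moves(use_options):
    """Available (displacement, duration) pairs in every state."""
    out = [(-1, 1), (+1, 1)]                   # primitive left / right steps
    if use_options:
        out.append((OPTION_LEN, OPTION_LEN))   # temporally extended option
    return out

def planning_sweeps(use_options, tol=1e-9):
    """Run value-iteration sweeps until the value function stops changing."""
    V = [0.0] * N
    for sweep in range(1, 1000):
        V_new = [0.0] * N
        for s in range(N - 1):                 # goal state is terminal
            best = 0.0
            for d, k in moves(use_options):
                s2 = min(max(s + d, 0), N - 1)
                r = 1.0 if s2 == N - 1 else 0.0
                # reward discounted to its arrival time, then bootstrap
                best = max(best, GAMMA ** (k - 1) * r + GAMMA ** k * V[s2])
            V_new[s] = best
        if max(abs(a - b) for a, b in zip(V_new, V)) < tol:
            return sweep
        V = V_new
    raise RuntimeError("value iteration did not converge")
```

Comparing `planning_sweeps(False)` with `planning_sweeps(True)` shows the option set converging in a fraction of the sweeps, mirroring observation 1) above: a useful option set cuts the number of planning iterations needed to reach near-optimal values.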
Related papers
- Multi-Fidelity Bayesian Optimization With Across-Task Transferable Max-Value Entropy Search [36.14499894307206]
This paper introduces a novel information-theoretic acquisition function that balances the need to acquire information about the current task with the goal of collecting information transferable to future tasks.
Experimental results across synthetic and real-world examples reveal that the proposed provident acquisition strategy can significantly improve optimization efficiency once a sufficient number of tasks has been processed.
arXiv Detail & Related papers (2024-03-14T17:00:01Z) - Experiment Planning with Function Approximation [49.50254688629728]
We study the problem of experiment planning with function approximation in contextual bandit problems.
We propose two experiment planning strategies compatible with function approximation.
We show that a uniform sampler achieves competitive optimality rates in the setting where the number of actions is small.
arXiv Detail & Related papers (2024-01-10T14:40:23Z) - Optimal Cost-Preference Trade-off Planning with Multiple Temporal Tasks [3.655021726150368]
We introduce a novel notion of preference that provides a generalized framework to express preferences over individual tasks as well as their relations.
We perform an optimal trade-off (Pareto) analysis between behaviors that adhere to the user's preference and the ones that are resource optimal.
arXiv Detail & Related papers (2023-06-22T21:56:49Z) - Adaptive Multi-Goal Exploration [118.40427257364729]
We show how AdaGoal can be used to tackle the objective of learning an $\epsilon$-optimal goal-conditioned policy.
AdaGoal is anchored in the high-level algorithmic structure of existing methods for goal-conditioned deep reinforcement learning.
arXiv Detail & Related papers (2021-11-23T17:59:50Z) - Optimal To-Do List Gamification for Long Term Planning [0.6882042556551609]
We extend the previous version of our optimal gamification method with added services for helping people decide which tasks should and should not be done when there is not enough time to do everything.
We test the accuracy of the incentivised to-do list by comparing the performance of the strategy with the points computed exactly using Value Iteration for a variety of case studies.
To demonstrate its functionality, we released an API that makes it easy to deploy our method in Web and app services.
arXiv Detail & Related papers (2021-09-14T08:06:01Z) - Visual scoping operations for physical assembly [0.0]
We propose visual scoping, a strategy that interleaves planning and acting by alternately defining a spatial region as the next subgoal and selecting actions to achieve it.
We find that visual scoping achieves comparable task performance to the subgoal planner while requiring only a fraction of the total computational cost.
arXiv Detail & Related papers (2021-06-10T10:50:35Z) - Exploring Relational Context for Multi-Task Dense Prediction [76.86090370115]
We consider a multi-task environment for dense prediction tasks, represented by a common backbone and independent task-specific heads.
We explore various attention-based contexts, such as global and local, in the multi-task setting.
We propose an Adaptive Task-Relational Context module, which samples the pool of all available contexts for each task pair.
arXiv Detail & Related papers (2021-04-28T16:45:56Z) - Exploration in two-stage recommender systems [79.50534282841618]
Two-stage recommender systems are widely adopted in industry due to their scalability and maintainability.
A key challenge of this setup is that optimal performance of each stage in isolation does not imply optimal global performance.
We propose a method of synchronising the exploration strategies between the ranker and the nominators.
arXiv Detail & Related papers (2020-09-01T16:52:51Z) - Long-Horizon Visual Planning with Goal-Conditioned Hierarchical Predictors [124.30562402952319]
The ability to predict and plan into the future is fundamental for agents acting in the world.
Current learning approaches for visual prediction and planning fail on long-horizon tasks.
We propose a framework for visual prediction and planning that is able to overcome both of these limitations.
arXiv Detail & Related papers (2020-06-23T17:58:56Z) - Automatic Curriculum Learning through Value Disagreement [95.19299356298876]
Continually solving new, unsolved tasks is the key to learning diverse behaviors.
In the multi-task domain, where an agent needs to reach multiple goals, the choice of training goals can largely affect sample efficiency.
We propose setting up an automatic curriculum for goals that the agent needs to solve.
We evaluate our method across 13 multi-goal robotic tasks and 5 navigation tasks, and demonstrate performance gains over current state-of-the-art methods.
arXiv Detail & Related papers (2020-06-17T03:58:25Z) - Practical Bayesian Optimization of Objectives with Conditioning Variables [1.0497128347190048]
We consider the more general case where a user is faced with multiple problems that each need to be optimized conditional on a state variable.
Similarity across objectives boosts optimization of each objective in two ways.
We propose a framework for conditional optimization: ConBO.
arXiv Detail & Related papers (2020-02-23T22:06:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences arising from its use.