Solving Hard AI Planning Instances Using Curriculum-Driven Deep
Reinforcement Learning
- URL: http://arxiv.org/abs/2006.02689v1
- Date: Thu, 4 Jun 2020 08:13:12 GMT
- Title: Solving Hard AI Planning Instances Using Curriculum-Driven Deep
Reinforcement Learning
- Authors: Dieqiao Feng, Carla P. Gomes, and Bart Selman
- Abstract summary: Sokoban is a PSPACE-complete planning task and represents one of the hardest domains for current AI planners.
Our approach based on deep reinforcement learning augmented with a curriculum-driven method is the first one to solve hard instances within one day of training.
- Score: 31.92282114603962
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite significant progress in general AI planning, certain domains remain
out of reach of current AI planning systems. Sokoban is a PSPACE-complete
planning task and represents one of the hardest domains for current AI
planners. Even domain-specific specialized search methods fail quickly due to
the exponential search complexity on hard instances. Our approach based on deep
reinforcement learning augmented with a curriculum-driven method is the first
one to solve hard instances within one day of training while other modern
solvers cannot solve these instances within any reasonable time limit. In
contrast to prior efforts, which use carefully handcrafted pruning techniques,
our approach automatically uncovers domain structure. Our results reveal that
deep RL provides a promising framework for solving previously unsolved AI
planning problems, provided a proper training curriculum can be devised.
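The abstract describes curriculum-driven training for a single hard instance but gives no implementation details. As a hypothetical illustration (all names and the toy success model below are invented, not from the paper), a minimal sketch of a curriculum loop, in which a policy trains on progressively harder subproblems and only advances once a stage's success rate clears a threshold, might look like:

```python
import random

def make_instance(num_boxes, rng):
    """Hypothetical stand-in for a Sokoban subproblem with `num_boxes` boxes."""
    return {"boxes": num_boxes, "seed": rng.random()}

def attempt_solve(policy_skill, instance, rng):
    """Toy success model: the policy's skill must outweigh instance difficulty."""
    return policy_skill + rng.random() > instance["boxes"] * 0.5

def curriculum_train(max_boxes=4, episodes_per_stage=200, threshold=0.5, seed=0):
    rng = random.Random(seed)
    skill = 0.0  # stand-in for the learned policy's competence
    reached = 1
    for num_boxes in range(1, max_boxes + 1):  # easy -> hard
        reached = num_boxes
        successes = 0
        for _ in range(episodes_per_stage):
            inst = make_instance(num_boxes, rng)
            if attempt_solve(skill, inst, rng):
                successes += 1
                skill += 0.01  # "policy improvement" on each solved episode
        if successes / episodes_per_stage < threshold:
            break  # stage too hard; a real curriculum would insert easier stages
    return skill, reached
```

A real implementation would replace `attempt_solve` with RL rollouts on Sokoban subinstances; the loop only illustrates the advance-on-success structure.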
Related papers
- Contractual Reinforcement Learning: Pulling Arms with Invisible Hands [68.77645200579181]
We propose a theoretical framework for aligning the economic interests of different stakeholders in online learning problems through contract design.
For the planning problem, we design an efficient dynamic programming algorithm to determine the optimal contracts against a far-sighted agent.
For the learning problem, we introduce a generic design of no-regret learning algorithms to untangle the challenges, from the robust design of contracts to the balance of exploration and exploitation.
arXiv Detail & Related papers (2024-07-01T16:53:00Z)
- Machine Learning Insides OptVerse AI Solver: Design Principles and Applications [74.67495900436728]
We present a comprehensive study on the integration of machine learning (ML) techniques into Huawei Cloud's OptVerse AI solver.
We showcase our methods for generating complex SAT and MILP instances using generative models that mirror the multifaceted structures of real-world problems.
We detail the incorporation of state-of-the-art parameter tuning algorithms which markedly elevate solver performance.
arXiv Detail & Related papers (2024-01-11T15:02:15Z)
- Learn to Follow: Decentralized Lifelong Multi-agent Pathfinding via Planning and Learning [46.354187895184154]
The Multi-agent Pathfinding (MAPF) problem asks for a set of conflict-free paths for a set of agents confined to a graph.
In this work, we investigate the decentralized MAPF setting, where no central controller possesses all the information on the agents' locations and goals.
We focus on the practically important lifelong variant of MAPF, which involves continuously assigning new goals to the agents upon arrival at the previous ones.
arXiv Detail & Related papers (2023-10-02T13:51:32Z)
- Divide & Conquer Imitation Learning [75.31752559017978]
Imitation Learning can be a powerful approach to bootstrap the learning process.
We present a novel algorithm designed to imitate complex robotic tasks from the states of an expert trajectory.
We show that our method imitates a non-holonomic navigation task and scales to a complex simulated robotic manipulation task with very high sample efficiency.
arXiv Detail & Related papers (2022-04-15T09:56:50Z)
- Heuristic Search Planning with Deep Neural Networks using Imitation, Attention and Curriculum Learning [1.0323063834827413]
This paper presents a network model that learns a heuristic capable of relating distant parts of the state space via optimal plan imitation.
To counter the method's limitation regarding the creation of problems of increasing difficulty, we demonstrate the use of curriculum learning, where newly solved problem instances are added to the training set.
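The curriculum idea above, growing the training set with newly solved instances, can be sketched as a bootstrap loop. This is a hypothetical illustration (the function names and the toy solve model are assumptions, not from the paper):

```python
import random

def try_solve(model_strength, difficulty, rng):
    """Toy stand-in for a learned planner: easier problems succeed more often."""
    return rng.random() < model_strength / (model_strength + difficulty)

def bootstrap_curriculum(problems, rounds=5, seed=0):
    """Each round, add newly solved instances to the training set and 'retrain'."""
    rng = random.Random(seed)
    training_set, unsolved = [], list(problems)
    strength = 1.0  # stand-in for the learned heuristic's quality
    for _ in range(rounds):
        newly_solved = [p for p in unsolved if try_solve(strength, p, rng)]
        training_set.extend(newly_solved)        # curriculum: keep solved instances
        unsolved = [p for p in unsolved if p not in newly_solved]
        strength += 0.5 * len(newly_solved)      # "retrain" on the larger set
    return training_set, unsolved
```

Here `problems` is a list of distinct difficulty scores; in the paper's setting the entries would be planning instances and `try_solve` a search guided by the learned heuristic.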
arXiv Detail & Related papers (2021-12-03T14:01:16Z)
- C-Planning: An Automatic Curriculum for Learning Goal-Reaching Tasks [133.40619754674066]
Goal-conditioned reinforcement learning can solve tasks in a wide range of domains, including navigation and manipulation.
We propose to solve distant goal-reaching tasks by using search at training time to automatically generate intermediate states.
The E-step corresponds to planning an optimal sequence of waypoints using graph search, while the M-step learns a goal-conditioned policy to reach those waypoints.
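The E-step/M-step split described here can be sketched in miniature: a graph search produces waypoints, and a goal-conditioned policy (here replaced by a trivial stand-in) is asked to reach each waypoint in turn. All names are hypothetical, and BFS substitutes for whatever search the paper actually uses:

```python
from collections import deque

def plan_waypoints(graph, start, goal):
    """E-step stand-in: BFS over the state graph yields a waypoint sequence."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # reconstruct start -> goal path
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None  # goal unreachable

def follow_waypoints(waypoints, step_policy):
    """M-step stand-in: a goal-conditioned policy reaches each waypoint in turn."""
    state = waypoints[0]
    for target in waypoints[1:]:
        state = step_policy(state, target)
    return state
```

In the actual method the policy is learned by RL rather than given; the sketch only shows how planning over waypoints decomposes a distant goal into short-horizon subgoals.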
arXiv Detail & Related papers (2021-10-22T22:05:31Z)
- A Novel Automated Curriculum Strategy to Solve Hard Sokoban Planning Instances [30.32386551923329]
We present a curriculum-driven learning approach that is designed to solve a single hard instance.
We show how the smoothness of the task hardness impacts the final learning results.
Our approach can uncover plans that are far out of reach for any previous state-of-the-art Sokoban solver.
arXiv Detail & Related papers (2021-10-03T00:44:50Z)
- Fixed Priority Global Scheduling from a Deep Learning Perspective [0.2578242050187029]
We first present how to apply Deep Learning to real-time task scheduling through our preliminary work on fixed priority global scheduling (FPGS) problems.
We then briefly discuss possible generalizations of Deep Learning adoption for several realistic and complicated FPGS scenarios.
arXiv Detail & Related papers (2020-12-05T10:52:33Z)
- Learning to Stop While Learning to Predict [85.7136203122784]
Many algorithm-inspired deep models are restricted to a "fixed depth" for all inputs.
Similar to algorithms, the optimal depth of a deep architecture may be different for different input instances.
In this paper, we tackle this varying depth problem using a steerable architecture.
We show that the learned deep model, along with the stopping policy, improves performance on a diverse set of tasks.
arXiv Detail & Related papers (2020-06-09T07:22:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.