A Novel Automated Curriculum Strategy to Solve Hard Sokoban Planning Instances
- URL: http://arxiv.org/abs/2110.00898v1
- Date: Sun, 3 Oct 2021 00:44:50 GMT
- Title: A Novel Automated Curriculum Strategy to Solve Hard Sokoban Planning Instances
- Authors: Dieqiao Feng, Carla P. Gomes, Bart Selman
- Abstract summary: We present a curriculum-driven learning approach that is designed to solve a single hard instance.
We show how the smoothness of the task hardness impacts the final learning results.
Our approach can uncover plans that are far out of reach for any previous state-of-the-art Sokoban solver.
- Score: 30.32386551923329
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, we have witnessed tremendous progress in deep reinforcement
learning (RL) for tasks such as Go, Chess, video games, and robot control.
Nevertheless, other combinatorial domains, such as AI planning, still pose
considerable challenges for RL approaches. The key difficulty in those domains
is that a positive reward signal becomes exponentially rare as the
minimal solution length increases. So, an RL approach loses its training
signal. There has been promising recent progress by using a curriculum-driven
learning approach that is designed to solve a single hard instance. We present
a novel automated curriculum approach that dynamically selects from a
pool of unlabeled training instances of varying task complexity, guided by our
"difficulty quantum momentum" strategy. We show how the smoothness of the
task hardness impacts the final learning results. In particular, as the size of
the instance pool increases, the "hardness gap" decreases, which facilitates
a smoother automated curriculum-based learning process. Our automated
curriculum approach dramatically improves upon the previous approaches. We show
our results on Sokoban, which is a traditional PSPACE-complete planning problem
and presents a great challenge even for specialized solvers. Our RL agent can
solve hard instances that are far out of reach for any previous
state-of-the-art Sokoban solver. In particular, our approach can uncover plans
that require hundreds of steps, while the best previous search methods would
take many years of computing time to solve such instances. In addition, we show
that we can further boost the RL performance with an intricate coupling of our
automated curriculum approach with a curiosity-driven search strategy and a
graph neural net representation.
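The abstract does not fully specify the "difficulty quantum momentum" mechanism, but the core idea of dynamically selecting instances near the agent's hardness frontier can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the class name, the use of a per-instance solve-rate estimate, the momentum smoothing, and the target value are not the authors' exact algorithm.

```python
class CurriculumSelector:
    """Hypothetical sketch of momentum-smoothed curriculum selection.

    Assumptions (not from the paper): each instance carries a running
    solve-rate estimate, and the curriculum repeatedly picks the
    instance whose estimated solve rate is closest to a target value,
    i.e. hard enough to provide learning signal but still solvable.
    """

    def __init__(self, instance_ids, momentum=0.9, target_solve_rate=0.5):
        self.solve_rate = {i: 0.0 for i in instance_ids}
        self.momentum = momentum  # EMA factor smoothing hardness estimates
        self.target = target_solve_rate

    def select(self):
        # Pick the instance at the agent's current "hardness frontier".
        return min(self.solve_rate,
                   key=lambda i: abs(self.solve_rate[i] - self.target))

    def update(self, instance_id, solved):
        # Exponential moving average of per-instance success.
        r = self.solve_rate[instance_id]
        self.solve_rate[instance_id] = (
            self.momentum * r + (1.0 - self.momentum) * float(solved)
        )
```

With a large instance pool, neighboring solve-rate estimates lie close together, which is one way to picture the shrinking "hardness gap" the abstract describes.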
Related papers
- Action-Quantized Offline Reinforcement Learning for Robotic Skill
Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe for converting static behavior datasets into policies that can outperform the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
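The entry above does not detail its adaptive quantization scheme; as a hedged stand-in, plain k-means clustering over the dataset's actions gives one simple data-driven discretization. The function name, the `n_bins` parameter, and the clustering choice are all assumptions, not the paper's method.

```python
import numpy as np
from sklearn.cluster import KMeans

def quantize_actions(actions, n_bins=16, seed=0):
    """Toy data-driven action quantization (illustrative only).

    Cluster the continuous actions observed in an offline dataset and
    map each action to its nearest centroid index, yielding a discrete
    action space sized to the data rather than a fixed uniform grid.
    """
    km = KMeans(n_clusters=n_bins, n_init=10, random_state=seed).fit(actions)
    return km.labels_, km.cluster_centers_

# Usage: discretize a (N, action_dim) array of dataset actions.
actions = np.random.uniform(-1.0, 1.0, size=(1000, 4))
labels, centers = quantize_actions(actions)
```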
- Curriculum Learning in Job Shop Scheduling using Reinforcement Learning [0.3867363075280544]
Deep Reinforcement Learning (DRL) dynamically adjusts an agent's planning strategy in response to difficult instances.
We further improve DRL as the underlying method by actively incorporating the variability of difficulty within the same problem size into the design of the learning process.
arXiv Detail & Related papers (2023-05-17T13:15:27Z)
- Leveraging Sequentiality in Reinforcement Learning from a Single Demonstration [68.94506047556412]
We propose to leverage a sequential bias to learn control policies for complex robotic tasks using a single demonstration.
We show that DCIL-II can solve challenging simulated tasks, such as humanoid locomotion and stand-up, with unprecedented sample efficiency.
arXiv Detail & Related papers (2022-11-09T10:28:40Z)
- Learning to Optimize Permutation Flow Shop Scheduling via Graph-based Imitation Learning [70.65666982566655]
Permutation flow shop scheduling (PFSS) is widely used in manufacturing systems.
We propose to train the model via expert-driven imitation learning, which accelerates convergence more stably and accurately.
Our model's network parameters are reduced to only 37% of the state-of-the-art baseline's, and the average solution gap of our model towards the expert solutions decreases from 6.8% to 1.3%.
arXiv Detail & Related papers (2022-10-31T09:46:26Z)
- C-Planning: An Automatic Curriculum for Learning Goal-Reaching Tasks [133.40619754674066]
Goal-conditioned reinforcement learning can solve tasks in a wide range of domains, including navigation and manipulation.
We propose to solve the distant goal-reaching task by using search at training time to automatically generate a curriculum of intermediate states.
The E-step corresponds to planning an optimal sequence of waypoints using graph search, while the M-step aims to learn a goal-conditioned policy to reach those waypoints.
arXiv Detail & Related papers (2021-10-22T22:05:31Z)
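The E-step of the entry above can be pictured with a minimal graph-search sketch: nodes are previously visited states, edges connect nearby states, and the shortest path supplies waypoints for the policy. The distance threshold and Euclidean metric are assumptions; the paper's actual graph construction may differ.

```python
import networkx as nx
import numpy as np

def plan_waypoints(states, start_idx, goal_idx, radius=1.0):
    """Toy E-step: graph search over visited states (illustrative only).

    Connect states within `radius` of each other, then return the
    intermediate states along the shortest path as waypoints that a
    goal-conditioned policy (the M-step) would be trained to reach.
    """
    g = nx.Graph()
    g.add_nodes_from(range(len(states)))
    for i in range(len(states)):
        for j in range(i + 1, len(states)):
            d = np.linalg.norm(states[i] - states[j])
            if d <= radius:
                g.add_edge(i, j, weight=d)
    path = nx.shortest_path(g, start_idx, goal_idx, weight="weight")
    return [states[i] for i in path[1:-1]]  # waypoints between start and goal
```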
- An actor-critic algorithm with policy gradients to solve the job shop scheduling problem using deep double recurrent agents [1.3812010983144802]
We propose a deep reinforcement learning methodology for the job shop scheduling problem (JSSP).
The aim is to build a greedy-like heuristic able to learn on some distribution of JSSP instances that differ in the number of jobs and machines.
As expected, the model can generalize, to some extent, to larger problems or instances originated by a different distribution from the one used in training.
arXiv Detail & Related papers (2021-10-18T07:55:39Z)
- Ranking Cost: Building An Efficient and Scalable Circuit Routing Planner with Evolution-Based Optimization [49.207538634692916]
We propose a new algorithm for circuit routing, named Ranking Cost, to form an efficient and trainable router.
In our method, we introduce a new set of variables called cost maps, which help the A* router find proper paths.
Our algorithm is trained in an end-to-end manner and does not use any artificial data or human demonstration.
arXiv Detail & Related papers (2021-10-08T07:22:45Z)
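To make the cost-map idea in the entry above concrete, here is a toy A* router on a grid whose per-cell step cost is a base cost plus a cost-map penalty. In the paper the cost maps are optimized by an evolution strategy; here the map is simply an input, and the grid layout and Manhattan heuristic are assumptions.

```python
import heapq

def astar_with_cost_map(grid_shape, cost_map, start, goal):
    """Toy A* guided by an auxiliary cost map (illustrative only).

    `cost_map` is a 2-D array (or mapping from cell to float) of extra
    per-cell penalties steering the router; each step costs a base of
    1.0 plus that penalty. Returns the path as a list of cells.
    """
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan heuristic
    rows, cols = grid_shape
    frontier = [(h(start), 0.0, start, [start])]
    seen = set()
    while frontier:
        _, g, pos, path = heapq.heappop(frontier)
        if pos == goal:
            return path
        if pos in seen:
            continue
        seen.add(pos)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (pos[0] + dr, pos[1] + dc)
            if 0 <= nxt[0] < rows and 0 <= nxt[1] < cols and nxt not in seen:
                ng = g + 1.0 + cost_map[nxt]  # base step cost + learned penalty
                heapq.heappush(frontier, (ng + h(nxt), ng, nxt, path + [nxt]))
    return None  # goal unreachable
```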
- MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning [65.52675802289775]
We show that an uncertainty-aware classifier can solve challenging reinforcement learning problems.
We propose a novel method for computing the normalized maximum likelihood (NML) distribution.
We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions.
arXiv Detail & Related papers (2021-07-15T08:19:57Z)
- Self-Imitation Learning by Planning [3.996275177789895]
Imitation learning (IL) enables robots to acquire skills quickly by transferring expert knowledge.
In long-horizon motion planning tasks, a challenging problem in deploying IL and RL methods is how to generate and collect massive, broadly distributed data.
We propose self-imitation learning by planning (SILP), where demonstration data are collected automatically by planning on the visited states from the current policy.
SILP is inspired by the observation that successfully visited states in the early reinforcement learning stage are collision-free nodes in the graph-search based motion planner.
arXiv Detail & Related papers (2021-03-25T13:28:38Z)
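A minimal sketch of the SILP-style data collection described in the entry above, under assumptions: `is_collision_free` and `plan` (a motion planner returning a state sequence) are hypothetical stand-ins for components the abstract only names, not the authors' implementation.

```python
def collect_self_demos(rollout_states, is_collision_free, plan):
    """Toy SILP-style demonstration harvesting (illustrative only).

    Collision-free states visited by the current policy serve as graph
    nodes; a motion planner connects consecutive nodes, and the stitched
    segments form a demonstration trajectory for imitation learning.
    """
    nodes = [s for s in rollout_states if is_collision_free(s)]
    demo = []
    for a, b in zip(nodes, nodes[1:]):
        segment = plan(a, b)  # planner returns a state sequence, or None
        if segment:
            demo.extend(segment)
    return demo
```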
- Solving Hard AI Planning Instances Using Curriculum-Driven Deep Reinforcement Learning [31.92282114603962]
Sokoban is a PSPACE-complete planning task and represents one of the hardest domains for current AI planners.
Our approach, based on deep reinforcement learning augmented with a curriculum-driven method, is the first to solve hard instances within one day of training.
arXiv Detail & Related papers (2020-06-04T08:13:12Z)
- Online Constrained Model-based Reinforcement Learning [13.362455603441552]
A key requirement is the ability to handle continuous state and action spaces while remaining within a limited time and resource budget.
We propose a model-based approach that combines Gaussian Process regression and Receding Horizon Control.
We test our approach on a cart pole swing-up environment and demonstrate the benefits of online learning on an autonomous racing task.
arXiv Detail & Related papers (2020-04-07T15:51:34Z)
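A hedged sketch of the combination described in the entry above: a Gaussian Process dynamics model inside a random-shooting receding-horizon loop. The quadratic placeholder cost, the shooting strategy, and all parameters are assumptions rather than the paper's exact controller.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def mpc_action(gp: GaussianProcessRegressor, state, horizon=10, n_samples=64):
    """Toy receding-horizon control with a GP dynamics model.

    Assumes `gp` has been fitted to map (state, action) -> next state.
    Sample random action sequences, roll them out through the GP mean,
    score each with a placeholder quadratic state cost, and execute only
    the first action of the best sequence before replanning.
    """
    best_cost, best_first = np.inf, 0.0
    for _ in range(n_samples):
        actions = np.random.uniform(-1.0, 1.0, size=horizon)
        s, cost = np.asarray(state, dtype=float), 0.0
        for a in actions:
            x = np.concatenate([s, [a]]).reshape(1, -1)
            s = gp.predict(x)[0]       # GP mean prediction of next state
            cost += np.sum(s ** 2)     # placeholder quadratic state cost
        if cost < best_cost:
            best_cost, best_first = cost, actions[0]
    return best_first
```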
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.