A Novel Automated Curriculum Strategy to Solve Hard Sokoban Planning Instances
- URL: http://arxiv.org/abs/2110.00898v1
- Date: Sun, 3 Oct 2021 00:44:50 GMT
- Title: A Novel Automated Curriculum Strategy to Solve Hard Sokoban Planning Instances
- Authors: Dieqiao Feng, Carla P. Gomes, Bart Selman
- Abstract summary: We present a curriculum-driven learning approach that is designed to solve a single hard instance.
We show how the smoothness of the task hardness impacts the final learning results.
Our approach can uncover plans that are far out of reach for any previous state-of-the-art Sokoban solver.
- Score: 30.32386551923329
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, we have witnessed tremendous progress in deep reinforcement
learning (RL) for tasks such as Go, Chess, video games, and robot control.
Nevertheless, other combinatorial domains, such as AI planning, still pose
considerable challenges for RL approaches. The key difficulty in those domains
is that a positive reward signal becomes exponentially rare as the
minimal solution length increases. So, an RL approach loses its training
signal. There has been promising recent progress by using a curriculum-driven
learning approach that is designed to solve a single hard instance. We present
a novel automated curriculum approach that dynamically selects from a
pool of unlabeled training instances of varying task complexity, guided by our
"difficulty quantum momentum" strategy. We show how the smoothness of the
task hardness impacts the final learning results. In particular, as the size of
the instance pool increases, the "hardness gap" decreases, which facilitates
a smoother automated curriculum-based learning process. Our automated
curriculum approach dramatically improves upon the previous approaches. We show
our results on Sokoban, which is a traditional PSPACE-complete planning problem
and presents a great challenge even for specialized solvers. Our RL agent can
solve hard instances that are far out of reach for any previous
state-of-the-art Sokoban solver. In particular, our approach can uncover plans
that require hundreds of steps, while the best previous search methods would
take many years of computing time to solve such instances. In addition, we show
that we can further boost the RL performance with an intricate coupling of our
automated curriculum approach with a curiosity-driven search strategy and a
graph neural net representation.
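The abstract does not fully specify the "difficulty quantum momentum" mechanism, but the core idea of dynamically selecting instances near the agent's hardness frontier can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the class name, the use of a per-instance solve-rate estimate, the momentum smoothing, and the target value are not the authors' exact algorithm.

```python
class CurriculumSelector:
    """Hypothetical sketch of momentum-smoothed curriculum selection.

    Assumptions (not from the paper): each instance carries a running
    solve-rate estimate, and the curriculum repeatedly picks the
    instance whose estimated solve rate is closest to a target value,
    i.e. hard enough to provide learning signal but still solvable.
    """

    def __init__(self, instance_ids, momentum=0.9, target_solve_rate=0.5):
        self.solve_rate = {i: 0.0 for i in instance_ids}
        self.momentum = momentum  # EMA factor smoothing hardness estimates
        self.target = target_solve_rate

    def select(self):
        # Pick the instance at the agent's current "hardness frontier".
        return min(self.solve_rate,
                   key=lambda i: abs(self.solve_rate[i] - self.target))

    def update(self, instance_id, solved):
        # Exponential moving average of per-instance success.
        r = self.solve_rate[instance_id]
        self.solve_rate[instance_id] = (
            self.momentum * r + (1.0 - self.momentum) * float(solved)
        )
```

With a large instance pool, neighboring solve-rate estimates lie close together, which is one way to picture the shrinking "hardness gap" the abstract describes.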
Related papers
- Action-Quantized Offline Reinforcement Learning for Robotic Skill
Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe for converting static behavior datasets into policies that can outperform the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
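The entry above does not detail its adaptive quantization scheme; as a hedged stand-in, plain k-means clustering over the dataset's actions gives one simple data-driven discretization. The function name, the `n_bins` parameter, and the clustering choice are all assumptions, not the paper's method.

```python
import numpy as np
from sklearn.cluster import KMeans

def quantize_actions(actions, n_bins=16, seed=0):
    """Toy data-driven action quantization (illustrative only).

    Cluster the continuous actions observed in an offline dataset and
    map each action to its nearest centroid index, yielding a discrete
    action space sized to the data rather than a fixed uniform grid.
    """
    km = KMeans(n_clusters=n_bins, n_init=10, random_state=seed).fit(actions)
    return km.labels_, km.cluster_centers_

# Usage: discretize a (N, action_dim) array of dataset actions.
actions = np.random.uniform(-1.0, 1.0, size=(1000, 4))
labels, centers = quantize_actions(actions)
```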
- Curriculum Learning in Job Shop Scheduling using Reinforcement Learning [0.3867363075280544]
Deep Reinforcement Learning (DRL) dynamically adjusts an agent's planning strategy in response to difficult instances.
We further improve DRL as the underlying method by actively incorporating the variability of difficulty within the same problem size into the design of the learning process.
arXiv Detail & Related papers (2023-05-17T13:15:27Z)
- Leveraging Sequentiality in Reinforcement Learning from a Single Demonstration [68.94506047556412]
We propose to leverage a sequential bias to learn control policies for complex robotic tasks using a single demonstration.
We show that DCIL-II can solve challenging simulated tasks, such as humanoid locomotion and stand-up, with unprecedented sample efficiency.
arXiv Detail & Related papers (2022-11-09T10:28:40Z)
- Learning to Optimize Permutation Flow Shop Scheduling via Graph-based Imitation Learning [70.65666982566655]
Permutation flow shop scheduling (PFSS) is widely used in manufacturing systems.
We propose to train the model via expert-driven imitation learning, which accelerates convergence more stably and accurately.
Our model's network parameters are reduced to only 37% of the state-of-the-art baseline's, and the average solution gap of our model towards the expert solutions decreases from 6.8% to 1.3%.
arXiv Detail & Related papers (2022-10-31T09:46:26Z)
- C-Planning: An Automatic Curriculum for Learning Goal-Reaching Tasks [133.40619754674066]
Goal-conditioned reinforcement learning can solve tasks in a wide range of domains, including navigation and manipulation.
We propose to solve the distant goal-reaching task by using search at training time to automatically generate a curriculum of intermediate states.
The E-step corresponds to planning an optimal sequence of waypoints using graph search, while the M-step aims to learn a goal-conditioned policy to reach those waypoints.
arXiv Detail & Related papers (2021-10-22T22:05:31Z)
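The E-step of the entry above can be pictured with a minimal graph-search sketch: nodes are previously visited states, edges connect nearby states, and the shortest path supplies waypoints for the policy. The distance threshold and Euclidean metric are assumptions; the paper's actual graph construction may differ.

```python
import networkx as nx
import numpy as np

def plan_waypoints(states, start_idx, goal_idx, radius=1.0):
    """Toy E-step: graph search over visited states (illustrative only).

    Connect states within `radius` of each other, then return the
    intermediate states along the shortest path as waypoints that a
    goal-conditioned policy (the M-step) would be trained to reach.
    """
    g = nx.Graph()
    g.add_nodes_from(range(len(states)))
    for i in range(len(states)):
        for j in range(i + 1, len(states)):
            d = np.linalg.norm(states[i] - states[j])
            if d <= radius:
                g.add_edge(i, j, weight=d)
    path = nx.shortest_path(g, start_idx, goal_idx, weight="weight")
    return [states[i] for i in path[1:-1]]  # waypoints between start and goal
```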
- An actor-critic algorithm with policy gradients to solve the job shop scheduling problem using deep double recurrent agents [1.3812010983144802]
We propose a deep reinforcement learning methodology for the job shop scheduling problem (JSSP).
The aim is to build a greedy-like heuristic able to learn on some distribution of JSSP instances that differ in the number of jobs and machines.
As expected, the model can generalize, to some extent, to larger problems or instances originated by a different distribution from the one used in training.
arXiv Detail & Related papers (2021-10-18T07:55:39Z)
- Ranking Cost: Building An Efficient and Scalable Circuit Routing Planner with Evolution-Based Optimization [49.207538634692916]
We propose a new algorithm for circuit routing, named Ranking Cost, to form an efficient and trainable router.
In our method, we introduce a new set of variables called cost maps, which help the A* router find proper paths.
Our algorithm is trained in an end-to-end manner and does not use any artificial data or human demonstration.
arXiv Detail & Related papers (2021-10-08T07:22:45Z)
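To make the cost-map idea in the entry above concrete, here is a toy A* router on a grid whose per-cell step cost is a base cost plus a cost-map penalty. In the paper the cost maps are optimized by an evolution strategy; here the map is simply an input, and the grid layout and Manhattan heuristic are assumptions.

```python
import heapq

def astar_with_cost_map(grid_shape, cost_map, start, goal):
    """Toy A* guided by an auxiliary cost map (illustrative only).

    `cost_map` is a 2-D array (or mapping from cell to float) of extra
    per-cell penalties steering the router; each step costs a base of
    1.0 plus that penalty. Returns the path as a list of cells.
    """
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan heuristic
    rows, cols = grid_shape
    frontier = [(h(start), 0.0, start, [start])]
    seen = set()
    while frontier:
        _, g, pos, path = heapq.heappop(frontier)
        if pos == goal:
            return path
        if pos in seen:
            continue
        seen.add(pos)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (pos[0] + dr, pos[1] + dc)
            if 0 <= nxt[0] < rows and 0 <= nxt[1] < cols and nxt not in seen:
                ng = g + 1.0 + cost_map[nxt]  # base step cost + learned penalty
                heapq.heappush(frontier, (ng + h(nxt), ng, nxt, path + [nxt]))
    return None  # goal unreachable
```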
- MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning [65.52675802289775]
We show that an uncertainty-aware classifier can solve challenging reinforcement learning problems.
We propose a novel method for computing the normalized maximum likelihood (NML) distribution.
We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions.
arXiv Detail & Related papers (2021-07-15T08:19:57Z)
- Self-Imitation Learning by Planning [3.996275177789895]
Imitation learning (IL) enables robots to acquire skills quickly by transferring expert knowledge.
In long-horizon motion planning tasks, a challenging problem in deploying IL and RL methods is how to generate and collect massive, broadly distributed data.
We propose self-imitation learning by planning (SILP), where demonstration data are collected automatically by planning on the visited states from the current policy.
SILP is inspired by the observation that successfully visited states in the early reinforcement learning stage are collision-free nodes in the graph-search based motion planner.
arXiv Detail & Related papers (2021-03-25T13:28:38Z)
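A minimal sketch of the SILP-style data collection described in the entry above, under assumptions: `is_collision_free` and `plan` (a motion planner returning a state sequence) are hypothetical stand-ins for components the abstract only names, not the authors' implementation.

```python
def collect_self_demos(rollout_states, is_collision_free, plan):
    """Toy SILP-style demonstration harvesting (illustrative only).

    Collision-free states visited by the current policy serve as graph
    nodes; a motion planner connects consecutive nodes, and the stitched
    segments form a demonstration trajectory for imitation learning.
    """
    nodes = [s for s in rollout_states if is_collision_free(s)]
    demo = []
    for a, b in zip(nodes, nodes[1:]):
        segment = plan(a, b)  # planner returns a state sequence, or None
        if segment:
            demo.extend(segment)
    return demo
```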
- Solving Hard AI Planning Instances Using Curriculum-Driven Deep Reinforcement Learning [31.92282114603962]
Sokoban is a PSPACE-complete planning task and represents one of the hardest domains for current AI planners.
Our approach, based on deep reinforcement learning augmented with a curriculum-driven method, is the first to solve hard instances within one day of training.
arXiv Detail & Related papers (2020-06-04T08:13:12Z)
- Online Constrained Model-based Reinforcement Learning [13.362455603441552]
A key requirement is the ability to handle continuous state and action spaces while remaining within a limited time and resource budget.
We propose a model-based approach that combines Gaussian Process regression and Receding Horizon Control.
We test our approach on a cart pole swing-up environment and demonstrate the benefits of online learning on an autonomous racing task.
arXiv Detail & Related papers (2020-04-07T15:51:34Z)
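A hedged sketch of the combination described in the entry above: a Gaussian Process dynamics model inside a random-shooting receding-horizon loop. The quadratic placeholder cost, the shooting strategy, and all parameters are assumptions rather than the paper's exact controller.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def mpc_action(gp: GaussianProcessRegressor, state, horizon=10, n_samples=64):
    """Toy receding-horizon control with a GP dynamics model.

    Assumes `gp` has been fitted to map (state, action) -> next state.
    Sample random action sequences, roll them out through the GP mean,
    score each with a placeholder quadratic state cost, and execute only
    the first action of the best sequence before replanning.
    """
    best_cost, best_first = np.inf, 0.0
    for _ in range(n_samples):
        actions = np.random.uniform(-1.0, 1.0, size=horizon)
        s, cost = np.asarray(state, dtype=float), 0.0
        for a in actions:
            x = np.concatenate([s, [a]]).reshape(1, -1)
            s = gp.predict(x)[0]       # GP mean prediction of next state
            cost += np.sum(s ** 2)     # placeholder quadratic state cost
        if cost < best_cost:
            best_cost, best_first = cost, actions[0]
    return best_first
```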
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.