DinerDash Gym: A Benchmark for Policy Learning in High-Dimensional Action Space
- URL: http://arxiv.org/abs/2007.06207v1
- Date: Mon, 13 Jul 2020 06:22:55 GMT
- Title: DinerDash Gym: A Benchmark for Policy Learning in High-Dimensional Action Space
- Authors: Siwei Chen, Xiao Ma, David Hsu
- Abstract summary: We propose a new benchmark task called Diner Dash for evaluating performance on a complex task with a high-dimensional action space.
We also introduce Decomposed Policy Graph Modelling (DPGM), an algorithm that combines graph modelling and deep learning to allow explicit embedding of domain knowledge.
- Score: 30.035087527984345
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Assessing the progress of policy learning algorithms on hierarchical tasks with high-dimensional action spaces has been arduous due to the lack of a commonly accepted benchmark. In this work, we propose a new lightweight benchmark task called Diner Dash for evaluating performance on a complex task with a high-dimensional action space. In contrast to traditional Atari games, which have only a flat goal structure and very few actions, the proposed benchmark has a hierarchical task structure and an action space of size 57, and can therefore facilitate the development of policy learning for complex tasks. On top of that, we introduce Decomposed Policy Graph Modelling (DPGM), an algorithm that combines graph modelling and deep learning to allow explicit embedding of domain knowledge, and which achieves significant improvement compared to the baseline. In our experiments, we show the effectiveness of domain knowledge injection via a specially designed imitation algorithm, along with results for other popular algorithms.
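As a concrete illustration of the interface such a benchmark exposes, here is a minimal sketch of driving a Gym-style environment with a flat Discrete(57) action space using a random policy. The environment id "DinerDash-v0" and the classic four-tuple step API are assumptions for illustration, not details confirmed by the abstract.

```python
# Minimal interaction loop for a Gym-style benchmark with 57 discrete
# actions. The id "DinerDash-v0" is hypothetical; consult the benchmark's
# repository for the actual registration name and observation format.
import gym

env = gym.make("DinerDash-v0")          # hypothetical environment id
assert env.action_space.n == 57         # the 57 primitive actions

obs = env.reset()
done, episode_return = False, 0.0
while not done:
    action = env.action_space.sample()  # random policy as a placeholder
    obs, reward, done, info = env.step(action)
    episode_return += reward
print("episode return:", episode_return)
```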
Related papers
- Clustering-based Domain-Incremental Learning [4.835091081509403]
A key challenge in continual learning is the so-called "catastrophic forgetting" problem.
We propose an online clustering-based approach on a dynamically updated finite pool of samples or gradients.
We demonstrate the effectiveness of the proposed strategy and its promising performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-09-21T13:49:05Z) - AI planning in the imagination: High-level planning on learned abstract
search spaces [68.75684174531962]
We propose a new method, called PiZero, that gives an agent the ability to plan in an abstract search space that the agent learns during training.
We evaluate our method on multiple domains, including the traveling salesman problem, Sokoban, 2048, the facility location problem, and Pacman.
arXiv Detail & Related papers (2023-08-16T22:47:16Z) - Hierarchical Imitation Learning with Vector Quantized Models [77.67190661002691]
We propose to use reinforcement learning to identify subgoals in expert trajectories.
We build a vector-quantized generative model for the identified subgoals to perform subgoal-level planning.
In experiments, the algorithm excels at solving complex, long-horizon decision-making problems, outperforming the state of the art.
arXiv Detail & Related papers (2023-01-30T15:04:39Z) - Option-Aware Adversarial Inverse Reinforcement Learning for Robotic
Control [44.77500987121531]
Hierarchical Imitation Learning (HIL) has been proposed to recover highly complex behaviors in long-horizon tasks from expert demonstrations.
We develop a novel HIL algorithm based on Adversarial Inverse Reinforcement Learning.
We also propose a Variational Autoencoder framework for learning with our objectives in an end-to-end fashion.
arXiv Detail & Related papers (2022-10-05T00:28:26Z) - Fast Inference and Transfer of Compositional Task Structures for
Few-shot Task Generalization [101.72755769194677]
We formulate the problem as few-shot reinforcement learning, where a task is characterized by a subtask graph.
Our multi-task subtask graph inferencer (MTSGI) first infers the common high-level task structure in terms of the subtask graph from the training tasks.
Our experimental results on 2D grid-world and complex web navigation domains show that the proposed method can learn and leverage the common underlying structure of the tasks for faster adaptation to unseen tasks.
arXiv Detail & Related papers (2022-05-25T10:44:25Z) - Benchmarking Deep Reinforcement Learning Algorithms for Vision-based
Robotics [11.225021326001778]
This paper presents a benchmarking study of some of the state-of-the-art reinforcement learning algorithms used for solving two vision-based robotics problems.
The algorithms are compared in two PyBullet simulation environments, KukaDiverseObjectEnv and RacecarZEDGymEnv.
arXiv Detail & Related papers (2022-01-11T22:45:25Z) - UDA-COPE: Unsupervised Domain Adaptation for Category-level Object Pose
Estimation [84.16372642822495]
We propose an unsupervised domain adaptation (UDA) method for category-level object pose estimation, called UDA-COPE.
Inspired by the recent multi-modal UDA techniques, the proposed method exploits a teacher-student self-supervised learning scheme to train a pose estimation network without using target domain labels.
arXiv Detail & Related papers (2021-11-24T16:00:48Z) - Compositional Reinforcement Learning from Logical Specifications [21.193231846438895]
Recent approaches automatically generate a reward function from a given specification and use a suitable reinforcement learning algorithm to learn a policy.
We develop a compositional learning approach, called DiRL, that interleaves high-level planning and reinforcement learning.
Our approach then uses reinforcement learning to learn a neural network policy for each edge (sub-task), together with a Dijkstra-style planning algorithm that computes a high-level plan in the graph (a generic sketch of this planning step appears after this list).
arXiv Detail & Related papers (2021-06-25T22:54:28Z) - Continuous Control for Searching and Planning with a Learned Model [5.196149362684628]
Decision-making agents with planning capabilities have achieved huge success in challenging domains such as Chess, Shogi, and Go.
Researchers proposed the MuZero algorithm, which can learn a dynamics model through interactions with the environment.
We show the proposed algorithm outperforms the soft actor-critic (SAC) algorithm, a state-of-the-art model-free deep reinforcement learning algorithm.
arXiv Detail & Related papers (2020-06-12T19:10:41Z) - Zeroth-Order Supervised Policy Improvement [94.0748002906652]
Policy gradient (PG) algorithms have been widely used in reinforcement learning (RL).
We propose Zeroth-Order Supervised Policy Improvement (ZOSPI).
ZOSPI exploits the estimated value function $Q$ globally while preserving the local exploitation of PG methods (a schematic of this update is sketched after this list).
arXiv Detail & Related papers (2020-06-11T16:49:23Z) - Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it to correlated actions, and combine the critic's action-value estimates to control the variance of the gradient estimation (an illustrative all-action loss is sketched after this list).
These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
arXiv Detail & Related papers (2020-02-10T04:23:09Z)