Obstacle Tower Without Human Demonstrations: How Far a Deep Feed-Forward Network Goes with Reinforcement Learning
- URL: http://arxiv.org/abs/2004.00567v2
- Date: Mon, 20 Jul 2020 15:07:52 GMT
- Title: Obstacle Tower Without Human Demonstrations: How Far a Deep Feed-Forward Network Goes with Reinforcement Learning
- Authors: Marco Pleines, Jenia Jitsev, Mike Preuss, and Frank Zimmer
- Abstract summary: The Obstacle Tower Challenge is the task of mastering a procedurally generated chain of levels that become progressively harder to complete.
We present an approach that performed competitively (placed 7th) but starts completely from scratch by means of Deep Reinforcement Learning.
- Score: 1.699937048243873
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Obstacle Tower Challenge is the task of mastering a procedurally
generated chain of levels that become progressively harder to complete. Whereas
the top-performing entries of last year's competition used human demonstrations
or reward shaping to learn how to cope with the challenge, we present an
approach that performed competitively (placed 7th) but starts completely from
scratch by means of Deep Reinforcement Learning with a relatively simple
feed-forward deep network structure. In particular, we examine the
generalization performance of this approach across different seeds and the
various visual themes that became available after the competition, and
investigate where the agent fails and why. Note that our approach does not
employ short-term memory such as recurrent hidden states. With this work, we
hope to contribute to a better understanding of what is possible with a
relatively simple, flexible solution that can be applied to learning in
environments featuring complex 3D visual input where the abstract task
structure itself is still fairly simple.
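To make the memory-free setup more concrete, below is a minimal PyTorch sketch of what such an agent's network could look like: a purely feed-forward convolutional actor-critic that maps a single observation to action logits and a value estimate, with no hidden state carried between time steps. The class name, layer sizes, 84x84 input resolution, and the flattened action count of 54 are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class FeedForwardActorCritic(nn.Module):
    """Hedged sketch of a memory-free visual policy (all sizes are assumptions)."""

    def __init__(self, num_actions: int, in_channels: int = 3):
        super().__init__()
        # Nature-DQN-style convolutional encoder for 84x84 frames.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
        )
        self.policy_head = nn.Linear(512, num_actions)  # action logits
        self.value_head = nn.Linear(512, 1)             # state-value estimate

    def forward(self, obs: torch.Tensor):
        # obs: (batch, channels, 84, 84), scaled to [0, 1]; a single frame only,
        # so the agent cannot remember anything it is not currently seeing.
        features = self.encoder(obs)
        return self.policy_head(features), self.value_head(features)

# Usage: sample one action for one frame, as an on-policy algorithm such as PPO would.
net = FeedForwardActorCritic(num_actions=54)  # 54 flattened actions is an assumption
logits, value = net(torch.rand(1, 3, 84, 84))
action = torch.distributions.Categorical(logits=logits).sample()
```

Because nothing is carried over between frames, any information the agent needs later (for example, whether a key is already held) has to be visible in the current observation, which is one plausible place where a memory-free agent can fail.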
Related papers
- You Only Live Once: Single-Life Reinforcement Learning [124.1738675154651]
In many real-world situations, the goal might not be to learn a policy that can do the task repeatedly, but simply to perform a new task successfully once in a single trial.
We formalize this problem setting, where an agent must complete a task within a single episode without interventions.
We propose an algorithm, $Q$-weighted adversarial learning (QWALE), which employs a distribution matching strategy.
arXiv Detail & Related papers (2022-10-17T09:00:11Z)
- Learning from Guided Play: A Scheduled Hierarchical Approach for Improving Exploration in Adversarial Imitation Learning [7.51557557629519]
We present Learning from Guided Play (LfGP), a framework in which we leverage expert demonstrations of multiple auxiliary tasks in addition to a main task.
This affords many benefits: learning efficiency is improved for main tasks with challenging bottleneck transitions, expert data becomes reusable between tasks, and transfer learning through the reuse of learned auxiliary task models becomes possible.
arXiv Detail & Related papers (2021-12-16T14:58:08Z)
- A Novel Automated Curriculum Strategy to Solve Hard Sokoban Planning Instances [30.32386551923329]
We present a curriculum-driven learning approach that is designed to solve a single hard instance.
We show how the smoothness of the task hardness impacts the final learning results.
Our approach can uncover plans that are far out of reach for any previous state-of-the-art Sokoban solver.
arXiv Detail & Related papers (2021-10-03T00:44:50Z)
- Progressive Stage-wise Learning for Unsupervised Feature Representation Enhancement [83.49553735348577]
We propose the Progressive Stage-wise Learning (PSL) framework for unsupervised learning.
Our experiments show that PSL consistently improves results for the leading unsupervised learning methods.
arXiv Detail & Related papers (2021-06-10T07:33:19Z)
- Thinking Deeply with Recurrence: Generalizing from Easy to Hard Sequential Reasoning Problems [51.132938969015825]
We observe that recurrent networks have the uncanny ability to closely emulate the behavior of non-recurrent deep models.
We show that recurrent networks that are trained to solve simple mazes with few recurrent steps can indeed solve much more complex problems simply by performing additional recurrences during inference.
arXiv Detail & Related papers (2021-02-22T14:09:20Z)
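For readers curious how "performing additional recurrences during inference" can look in practice, here is a hedged sketch of the general idea from the Thinking Deeply with Recurrence entry above (the channel count, maze-style per-pixel output, iteration numbers, and the IterativeReasoner name are illustrative assumptions, not the cited paper's actual model): a single weight-tied block is applied a configurable number of times, so a network trained with few iterations can be run with more iterations on harder inputs.

```python
import torch
import torch.nn as nn

class IterativeReasoner(nn.Module):
    """Hedged sketch: a weight-tied block that can be iterated longer at test time."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.embed = nn.Conv2d(3, channels, kernel_size=3, padding=1)
        # One shared block reused at every iteration (recurrence in depth).
        self.block = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.readout = nn.Conv2d(channels, 2, kernel_size=1)  # e.g. per-pixel on/off-path logits

    def forward(self, x: torch.Tensor, iterations: int = 10):
        h = self.embed(x)
        for _ in range(iterations):   # more iterations = more "thinking"
            h = self.block(h) + h     # residual update with shared weights
        return self.readout(h)

# Train with few iterations on small inputs, then evaluate with more on larger ones
# (the numbers below are purely illustrative).
net = IterativeReasoner()
easy = net(torch.rand(1, 3, 16, 16), iterations=10)
hard = net(torch.rand(1, 3, 64, 64), iterations=50)
```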
- Bridging the Imitation Gap by Adaptive Insubordination [88.35564081175642]
We show that when the teaching agent makes decisions with access to privileged information, this information is marginalized during imitation learning.
We propose 'Adaptive Insubordination' (ADVISOR) to address this gap.
ADVISOR dynamically weights imitation and reward-based reinforcement learning losses during training, enabling on-the-fly switching between imitation and exploration.
arXiv Detail & Related papers (2020-07-23T17:59:57Z)
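As a hedged illustration of the loss weighting described in the ADVISOR entry above (a sketch of the general idea, not the paper's exact objective), the snippet below interpolates between an imitation term and a policy-gradient term with a weight w; the tensor shapes and the constant w=0.5 are assumptions for the toy example.

```python
import torch
import torch.nn.functional as F

def mixed_loss(logits, teacher_actions, log_probs, advantages, w):
    # Imitation term: cross-entropy of the policy's logits against the teacher's action.
    imitation = F.cross_entropy(logits, teacher_actions, reduction="none")
    # RL term: a simple policy-gradient surrogate on the agent's own actions.
    rl = -(log_probs * advantages)
    # w in [0, 1] shifts emphasis between imitation (w -> 1) and exploration-driven RL (w -> 0).
    return (w * imitation + (1.0 - w) * rl).mean()

# Toy usage with made-up tensors; in ADVISOR the weight is chosen adaptively per state,
# whereas here it is a constant for illustration.
logits = torch.randn(4, 6)                    # 4 samples, 6 discrete actions
teacher_actions = torch.randint(0, 6, (4,))   # teacher's chosen actions
taken_actions = torch.randint(0, 6, (4,))     # agent's own actions
log_probs = torch.log_softmax(logits, dim=-1).gather(1, taken_actions.unsqueeze(1)).squeeze(1)
advantages = torch.randn(4)
loss = mixed_loss(logits, teacher_actions, log_probs, advantages, w=0.5)
```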
- Complex Skill Acquisition Through Simple Skill Imitation Learning [0.0]
We propose a new algorithm that trains neural network policies on simple, easy-to-learn skills.
We focus on the case in which the complex task comprises a concurrent (and possibly sequential) combination of the simpler subtasks.
Our algorithm consistently outperforms a state-of-the-art baseline in training speed and overall performance.
arXiv Detail & Related papers (2020-07-20T17:06:26Z)
- Planning to Explore via Self-Supervised World Models [120.31359262226758]
Plan2Explore is a self-supervised reinforcement learning agent.
We present a new approach to self-supervised exploration and fast adaptation to new tasks.
Without any training supervision or task-specific interaction, Plan2Explore outperforms prior self-supervised exploration methods.
arXiv Detail & Related papers (2020-05-12T17:59:45Z)
- Learning Neural-Symbolic Descriptive Planning Models via Cube-Space Priors: The Voyage Home (to STRIPS) [13.141761152863868]
We show that our neuro-symbolic architecture is trained end-to-end to produce a succinct and effective discrete state transition model from images alone.
Our target representation is already in a form that off-the-shelf solvers can consume, and opens the door to the rich array of modern search capabilities.
arXiv Detail & Related papers (2020-04-27T15:01:54Z)
- Weakly-Supervised Reinforcement Learning for Controllable Behavior [126.04932929741538]
Reinforcement learning (RL) is a powerful framework for learning to take actions to solve tasks.
In many settings, an agent must winnow down the inconceivably large space of all possible tasks to the single task that it is currently being asked to solve.
We introduce a framework for using weak supervision to automatically disentangle this semantically meaningful subspace of tasks from the enormous space of nonsensical "chaff" tasks.
arXiv Detail & Related papers (2020-04-06T17:50:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.