Alchemy: A structured task distribution for meta-reinforcement learning
- URL: http://arxiv.org/abs/2102.02926v1
- Date: Thu, 4 Feb 2021 23:40:44 GMT
- Title: Alchemy: A structured task distribution for meta-reinforcement learning
- Authors: Jane X. Wang, Michael King, Nicolas Porcel, Zeb Kurth-Nelson, Tina
Zhu, Charlie Deck, Peter Choy, Mary Cassin, Malcolm Reynolds, Francis Song,
Gavin Buttimore, David P. Reichert, Neil Rabinowitz, Loic Matthey, Demis
Hassabis, Alexander Lerchner, Matthew Botvinick
- Abstract summary: We introduce a new benchmark for meta-RL research, which combines structural richness with structural transparency.
Alchemy is a 3D video game, which involves a latent causal structure that is resampled procedurally from episode to episode.
We evaluate a pair of powerful RL agents on Alchemy and present an in-depth analysis of one of these agents.
- Score: 52.75769317355963
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There has been rapidly growing interest in meta-learning as a method for
increasing the flexibility and sample efficiency of reinforcement learning. One
problem in this area of research, however, has been a scarcity of adequate
benchmark tasks. In general, the structure underlying past benchmarks has
either been too simple to be inherently interesting, or too ill-defined to
support principled analysis. In the present work, we introduce a new benchmark
for meta-RL research, which combines structural richness with structural
transparency. Alchemy is a 3D video game, implemented in Unity, which involves
a latent causal structure that is resampled procedurally from episode to
episode, affording structure learning, online inference, hypothesis testing and
action sequencing based on abstract domain knowledge. We evaluate a pair of
powerful RL agents on Alchemy and present an in-depth analysis of one of these
agents. Results clearly indicate a frank and specific failure of meta-learning,
providing validation for Alchemy as a challenging benchmark for meta-RL.
Concurrent with this report, we are releasing Alchemy as a public resource,
together with a suite of analysis tools and sample agent trajectories.
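The core idea in the abstract, a latent structure that is resampled every episode and must be inferred online, can be illustrated with a small toy sketch. This is not the Alchemy environment or its API: `ToyLatentTask`, `explore_then_exploit`, and all parameters are hypothetical names introduced here, and the hidden structure is assumed to be a simple permutation mapping actions to rewards.

```python
import random


class ToyLatentTask:
    """Hypothetical toy task (NOT Alchemy): a hidden action-to-reward
    mapping is resampled at the start of every episode."""

    def __init__(self, n_actions=4, trials_per_episode=10, seed=None):
        self.n_actions = n_actions
        self.trials_per_episode = trials_per_episode
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        # Resample the latent structure: a random permutation of rewards.
        self.latent = list(range(self.n_actions))
        self.rng.shuffle(self.latent)
        self.t = 0

    def step(self, action):
        # The agent only sees the reward, never the latent mapping itself.
        reward = self.latent[action]
        self.t += 1
        done = self.t >= self.trials_per_episode
        return reward, done


def explore_then_exploit(task):
    """Hand-written within-episode strategy: try each action once,
    then repeat the best action seen so far."""
    task.reset()
    observed = {}
    total = 0
    done = False
    while not done:
        untried = [a for a in range(task.n_actions) if a not in observed]
        action = untried[0] if untried else max(observed, key=observed.get)
        reward, done = task.step(action)
        observed[action] = reward
        total += reward
    return total


if __name__ == "__main__":
    task = ToyLatentTask(seed=0)
    print(explore_then_exploit(task))
```

In a meta-RL setting the exploration strategy would be learned across many such episodes rather than written by hand; the hand-coded policy here only shows what successful within-episode inference looks like in this toy case.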
Related papers
- One-step Structure Prediction and Screening for Protein-Ligand Complexes using Multi-Task Geometric Deep Learning [6.605588716386855]
We show that structure prediction and screening can be accurately tackled with a single model, namely LigPose, based on multi-task geometric deep learning.
LigPose represents the ligand and the protein pair as a graph, with the learning of binding strength and atomic interactions as auxiliary tasks.
Experiments show LigPose achieved state-of-the-art performance on major tasks in drug research.
arXiv Detail & Related papers (2024-08-21T05:53:50Z) - LLM Inference Unveiled: Survey and Roofline Model Insights [62.92811060490876]
Large Language Model (LLM) inference is rapidly evolving, presenting a unique blend of opportunities and challenges.
Our survey stands out from traditional literature reviews by not only summarizing the current state of research but also by introducing a framework based on roofline model.
This framework identifies the bottlenecks when deploying LLMs on hardware devices and provides a clear understanding of practical problems.
arXiv Detail & Related papers (2024-02-26T07:33:05Z) - Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning [50.47568731994238]
A key method for creating Artificial Intelligence (AI) agents is Reinforcement Learning (RL).
This paper presents a general framework model for integrating and learning structured reasoning into AI agents' policies.
arXiv Detail & Related papers (2023-12-22T17:57:57Z) - MERMAIDE: Learning to Align Learners using Model-Based Meta-Learning [62.065503126104126]
We study how a principal can efficiently and effectively intervene on the rewards of a previously unseen learning agent in order to induce desirable outcomes.
This is relevant to many real-world settings like auctions or taxation, where the principal may not know the learning behavior nor the rewards of real people.
We introduce MERMAIDE, a model-based meta-learning framework to train a principal that can quickly adapt to out-of-distribution agents.
arXiv Detail & Related papers (2023-04-10T15:44:50Z) - Generalization Properties of Retrieval-based Models [50.35325326050263]
Retrieval-based machine learning methods have enjoyed success on a wide range of problems.
Despite growing literature showcasing the promise of these models, the theoretical underpinning for such models remains underexplored.
We present a formal treatment of retrieval-based models to characterize their generalization ability.
arXiv Detail & Related papers (2022-10-06T00:33:01Z) - A model-based approach to meta-Reinforcement Learning: Transformers and tree search [1.1602089225841632]
We show the relevance of model-based approaches with online planning to perform exploration and exploitation successfully in meta-RL.
We show the efficiency of the Transformer architecture to learn complex dynamics that arise from latent spaces present in meta-RL problems.
arXiv Detail & Related papers (2022-08-24T13:30:26Z) - Provable Hierarchy-Based Meta-Reinforcement Learning [50.17896588738377]
We analyze HRL in the meta-RL setting, where a learner learns latent hierarchical structure during meta-training for use in a downstream task.
We provide "diversity conditions" which, together with a tractable optimism-based algorithm, guarantee sample-efficient recovery of this natural hierarchy.
Our bounds incorporate common notions in HRL literature such as temporal and state/action abstractions, suggesting that our setting and analysis capture important features of HRL in practice.
arXiv Detail & Related papers (2021-10-18T17:56:02Z) - Comparative Code Structure Analysis using Deep Learning for Performance Prediction [18.226950022938954]
This paper aims to assess the feasibility of using purely static information (e.g., abstract syntax tree or AST) of applications to predict performance change based on the change in code structure.
Our evaluations of several deep embedding learning methods demonstrate that tree-based Long Short-Term Memory (LSTM) models can leverage the hierarchical structure of source code to discover latent representations and achieve up to 84% (individual problem) and 73% (combined dataset with multiple problems) accuracy in predicting the change in performance.
arXiv Detail & Related papers (2021-02-12T16:59:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.