NovGrid: A Flexible Grid World for Evaluating Agent Response to Novelty
- URL: http://arxiv.org/abs/2203.12117v1
- Date: Wed, 23 Mar 2022 01:06:04 GMT
- Title: NovGrid: A Flexible Grid World for Evaluating Agent Response to Novelty
- Authors: Jonathan Balloch, Zhiyu Lin, Mustafa Hussain, Aarun Srinivas, Robert
Wright, Xiangyu Peng, Julia Kim, Mark Riedl
- Abstract summary: We introduce NovGrid, a novelty generation framework built on MiniGrid.
Along with the core NovGrid we provide exemplar novelties aligned with our ontology and instantiate them as novelty templates.
We present a set of metrics built into our framework for the evaluation of novelty-adaptation-enabled machine-learning techniques.
- Score: 8.705624336757461
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A robust body of reinforcement learning techniques have been developed to
solve complex sequential decision making problems. However, these methods
assume that train and evaluation tasks come from similarly or identically
distributed environments. This assumption does not hold in real life where
small novel changes to the environment can make a previously learned policy
fail or introduce simpler solutions that might never be found. To that end, we
explore the concept of novelty, defined in this work as a sudden change
to the mechanics or properties of the environment. We provide an ontology of
novelties most relevant to sequential decision making, which distinguishes
between novelties that affect objects versus actions, unary properties versus
non-unary relations, and the distribution of solutions to a task. We introduce
NovGrid, a novelty generation framework built on MiniGrid, acting as a toolkit
for rapidly developing and evaluating novelty-adaptation-enabled reinforcement
learning techniques. Along with the core NovGrid we provide exemplar novelties
aligned with our ontology and instantiate them as novelty templates that can be
applied to many MiniGrid-compliant environments. Finally, we present a set of
metrics built into our framework for the evaluation of
novelty-adaptation-enabled machine-learning techniques, and show
characteristics of a baseline RL model using these metrics.
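To make the framework's central mechanism concrete, here is a minimal sketch of novelty injection as an environment wrapper, plus a stand-in adaptation metric. The wrapper name, the `apply_novelty` callback, the episode threshold, and the metric definitions are illustrative assumptions, not NovGrid's actual API or metric suite.

```python
# Conceptual sketch of novelty injection, assuming a Gymnasium-style
# environment API. The wrapper name, `apply_novelty` callback, and
# episode threshold are illustrative, not NovGrid's actual interface.
import gymnasium as gym
import numpy as np

class NoveltyWrapper(gym.Wrapper):
    """Mutate the environment after a fixed number of episodes, so a
    single run covers a pre-novelty and a post-novelty phase."""

    def __init__(self, env, apply_novelty, novelty_episode=1000):
        super().__init__(env)
        self.apply_novelty = apply_novelty  # callable that mutates the unwrapped env
        self.novelty_episode = novelty_episode
        self.episodes = 0
        self.novelty_active = False

    def reset(self, **kwargs):
        self.episodes += 1
        if self.episodes > self.novelty_episode and not self.novelty_active:
            self.apply_novelty(self.env.unwrapped)  # e.g. change which key opens a door
            self.novelty_active = True
        return self.env.reset(**kwargs)

def adaptation_summary(returns, novelty_episode, window=50):
    """Stand-in adaptation metrics: pre-novelty mean return and the number
    of post-novelty episodes until a trailing mean recovers that level."""
    returns = np.asarray(returns, dtype=float)
    pre_mean = returns[:novelty_episode].mean()
    post = returns[novelty_episode:]
    recovery = next((t for t in range(window, len(post) + 1)
                     if post[t - window:t].mean() >= pre_mean), None)
    return {"pre_mean": pre_mean, "recovery_episodes": recovery}
```

An agent trained through the novelty point can then be compared on how far its performance drops at the change and how quickly it recovers, which is the kind of behavior the paper's built-in metrics are designed to capture.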
Related papers
- No Regrets: Investigating and Improving Regret Approximations for Curriculum Discovery [53.08822154199948]
Unsupervised Environment Design (UED) methods have gained recent attention as their adaptive curricula promise to enable agents to be robust to in- and out-of-distribution tasks.
This work investigates how existing UED methods select training environments, focusing on task prioritisation metrics.
We develop a method that directly trains on scenarios with high learnability.
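For intuition, a learnability score of this kind is commonly computed as p(1 - p), where p is the agent's success rate on a level, so sampling concentrates on levels the agent sometimes, but not always, solves. A minimal sketch under that assumption (binary success outcomes; the paper's exact estimator and sampler may differ):

```python
import numpy as np

# Sketch of learnability-based level prioritisation, assuming each level
# has an estimated binary success rate p; the score p * (1 - p) peaks for
# levels the agent solves about half the time.
def learnability(success_rates):
    p = np.asarray(success_rates, dtype=float)
    return p * (1.0 - p)

def sample_levels(success_rates, k, rng=None):
    rng = rng or np.random.default_rng()
    scores = learnability(success_rates)
    total = scores.sum()
    probs = scores / total if total > 0 else np.full(len(scores), 1.0 / len(scores))
    return rng.choice(len(scores), size=k, p=probs)
```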
arXiv Detail & Related papers (2024-08-27T14:31:54Z)
- Mamba-FSCIL: Dynamic Adaptation with Selective State Space Model for Few-Shot Class-Incremental Learning [113.89327264634984]
Few-shot class-incremental learning (FSCIL) confronts the challenge of integrating new classes into a model with minimal training samples.
Traditional methods widely adopt static adaptation relying on a fixed parameter space to learn from data that arrive sequentially.
We propose a dual selective SSM projector that dynamically adjusts the projection parameters based on the intermediate features for dynamic adaptation.
arXiv Detail & Related papers (2024-07-08T17:09:39Z)
- I Know How: Combining Prior Policies to Solve New Tasks [17.214443593424498]
Multi-Task Reinforcement Learning aims at developing agents that are able to continually evolve and adapt to new scenarios.
Learning from scratch for each new task is not a viable or sustainable option.
We propose a new framework, I Know How, which provides a common formalization.
arXiv Detail & Related papers (2024-06-14T08:44:51Z)
- Reinforcement Learning with Options and State Representation [105.82346211739433]
This thesis aims to explore the reinforcement learning field and build on existing methods to produce improved ones.
It addresses such goals by decomposing learning tasks in a hierarchical fashion known as Hierarchical Reinforcement Learning.
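As a quick illustration of that decomposition, the standard options formalism (Sutton et al., 1999) bundles a sub-policy with an initiation set and a termination condition. The field names and loop below are an illustrative sketch assuming a gym-style step API, not the thesis's implementation:

```python
from dataclasses import dataclass
from typing import Any, Callable

# Sketch of the classic 'option' formalism used in hierarchical RL:
# an option bundles a policy with where it may start and when it ends.
@dataclass
class Option:
    initiation: Callable[[Any], bool]    # states where the option may be invoked
    policy: Callable[[Any], int]         # intra-option action selection
    termination: Callable[[Any], float]  # probability of terminating in a state

def run_option(env, state, option, rng):
    """Execute one option until its termination condition fires.
    Assumes a gym-style env.step; extra step outputs are ignored."""
    total, done = 0.0, False
    while not done and rng.random() >= option.termination(state):
        state, reward, done, *_ = env.step(option.policy(state))
        total += reward
    return state, total, done
```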
arXiv Detail & Related papers (2024-03-16T08:30:55Z)
- Fine-Grained Knowledge Selection and Restoration for Non-Exemplar Class Incremental Learning [64.14254712331116]
Non-exemplar class incremental learning aims to learn both the new and old tasks without accessing any training data from the past.
We propose a novel framework of fine-grained knowledge selection and restoration.
arXiv Detail & Related papers (2023-12-20T02:34:11Z)
- Generalization to New Sequential Decision Making Tasks with In-Context Learning [23.36106067650874]
Training autonomous agents that can learn new tasks from only a handful of demonstrations is a long-standing problem in machine learning.
In this paper, we show that naively applying transformers to sequential decision making problems does not enable in-context learning of new tasks.
We investigate different design choices and find that larger model and dataset sizes, as well as more task diversity, environment stochasticity, and trajectory burstiness, all result in better in-context learning of new out-of-distribution tasks.
arXiv Detail & Related papers (2023-12-06T15:19:28Z)
- Methods and Mechanisms for Interactive Novelty Handling in Adversarial Environments [32.175953686781284]
We introduce general methods and architectural mechanisms for detecting and characterizing different types of novelties.
We demonstrate the effectiveness of the proposed methods in evaluations performed by a third party in the adversarial multi-agent board game Monopoly.
arXiv Detail & Related papers (2023-02-28T00:05:48Z)
- RAPid-Learn: A Framework for Learning to Recover for Handling Novelties in Open-World Environments [17.73296831597868]
RAPid-Learn is designed to formulate and solve modifications to a task's Markov Decision Process (MDP) on the fly.
It is capable of exploiting domain knowledge to learn any new dynamics caused by the environmental changes.
We demonstrate its efficacy by introducing a wide variety of novelties in a gridworld environment inspired by Minecraft.
arXiv Detail & Related papers (2022-06-24T21:40:10Z)
- Multi-Environment Meta-Learning in Stochastic Linear Bandits [49.387421094105136]
We consider the feasibility of meta-learning when task parameters are drawn from a mixture distribution instead of a single environment.
We propose a regularized version of the OFUL algorithm that achieves low regret on a new task without requiring knowledge of the environment from which the new task originates.
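As a rough illustration of that mechanism: OFUL maintains a ridge-regression estimate of the reward parameter and acts optimistically within a confidence ellipsoid, and a biased regularizer pulls the estimate toward a meta-learned parameter rather than toward zero. The bias vector and the `lam`/`beta` constants below are illustrative choices, not the paper's tuned quantities:

```python
import numpy as np

# Sketch of biased-regularized OFUL for a linear bandit: ridge regression
# is pulled toward a meta-learned parameter theta_bias instead of zero,
# so a new task starts near the mixture's typical solution.
class BiasedOFUL:
    def __init__(self, dim, theta_bias, lam=1.0, beta=1.0):
        self.V = lam * np.eye(dim)  # regularized design matrix
        self.b = lam * theta_bias   # bias term pulls estimates toward theta_bias
        self.beta = beta            # confidence-width multiplier

    def choose(self, arms):
        """Pick the arm with the largest optimistic (UCB) value."""
        V_inv = np.linalg.inv(self.V)
        theta_hat = V_inv @ self.b
        ucb = [x @ theta_hat + self.beta * np.sqrt(x @ V_inv @ x) for x in arms]
        return int(np.argmax(ucb))

    def update(self, x, reward):
        self.V += np.outer(x, x)
        self.b += reward * x
```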
arXiv Detail & Related papers (2022-05-12T19:31:28Z)
- REPTILE: A Proactive Real-Time Deep Reinforcement Learning Self-adaptive Framework [0.6335848702857039]
A general framework is proposed to support the development of software systems that can adapt their behaviour to changes in the operating environment.
The proposed approach, named REPTILE, works in a completely proactive manner and relies on Deep Reinforcement Learning-based agents to react to events.
In our framework, two types of novelties are taken into account: those related to the context/environment and those related to the physical architecture itself.
The framework, predicting those novelties before their occurrence, extracts time-changing models of the environment and uses a suitable Markov Decision Process to deal with the real-time setting.
arXiv Detail & Related papers (2022-03-28T12:38:08Z)
- Importance Weighted Policy Learning and Adaptation [89.46467771037054]
We study a complementary approach which is conceptually simple, general, modular and built on top of recent improvements in off-policy learning.
The framework is inspired by ideas from the probabilistic inference literature and combines robust off-policy learning with a behavior prior.
Our approach achieves competitive adaptation performance on hold-out tasks compared to meta reinforcement learning baselines and can scale to complex sparse-reward scenarios.
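The importance-weighting idea can be pictured as reweighting off-policy data by the ratio between the current policy and the behavior policy that generated it. A minimal sketch of such an estimate, where the clipping threshold stands in for one common stabilizer and the learned behavior prior from the paper is omitted:

```python
import numpy as np

# Minimal sketch of importance-weighted off-policy evaluation: returns
# logged under a behavior policy mu are reweighted by pi(a|s) / mu(a|s).
# Clipping large ratios is one common variance stabilizer; the paper's
# full method additionally uses a learned behavior prior.
def iw_value_estimate(log_pi, log_mu, returns, clip=10.0):
    weights = np.exp(np.asarray(log_pi) - np.asarray(log_mu))
    weights = np.minimum(weights, clip)  # truncate large importance ratios
    return float(np.sum(weights * np.asarray(returns)) / np.sum(weights))
```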
arXiv Detail & Related papers (2020-09-10T14:16:58Z)