Efficient Open-world Reinforcement Learning via Knowledge Distillation
and Autonomous Rule Discovery
- URL: http://arxiv.org/abs/2311.14270v1
- Date: Fri, 24 Nov 2023 04:12:50 GMT
- Title: Efficient Open-world Reinforcement Learning via Knowledge Distillation
and Autonomous Rule Discovery
- Authors: Ekaterina Nikonova, Cheng Xue, Jochen Renz
- Abstract summary: We propose a general framework for deep reinforcement learning agents and provide a rule-driven deep Q-learning agent (RDQ) as one possible implementation of it.
We show that RDQ successfully extracts task-specific rules as it interacts with the world.
In experiments, we show that the RDQ agent is significantly more resilient to novelties than the baseline agents.
- Score: 5.680463564655267
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Deep reinforcement learning suffers from catastrophic forgetting and sample
inefficiency, making it less applicable to the ever-changing real world.
However, the ability to use previously learned knowledge is essential for AI
agents to quickly adapt to novelties. Often, certain spatial information
observed by the agent in the previous interactions can be leveraged to infer
task-specific rules. Inferred rules can then help the agent avoid
potentially dangerous situations in previously unseen states and guide the
learning process, increasing the agent's novelty adaptation speed. In this work, we
propose a general framework that is applicable to deep reinforcement learning
agents. Our framework provides the agent with an autonomous way to discover the
task-specific rules in novel environments and self-supervise its learning.
We provide a rule-driven deep Q-learning agent (RDQ) as one possible
implementation of that framework. We show that RDQ successfully extracts
task-specific rules as it interacts with the world and uses them to drastically
increase its learning efficiency. In our experiments, we show that the RDQ
agent is significantly more resilient to the novelties than the baseline
agents, and is able to detect and adapt to novel situations faster.
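The abstract's idea that inferred rules can steer an agent away from dangerous actions in unseen states can be illustrated with a small sketch. This is not the RDQ implementation from the paper; it only shows one common way to combine rules with Q-values, namely filtering the action set before epsilon-greedy selection. The names `Rule`, `allowed_actions`, `select_action`, and the `no_step_off_edge` example rule are all hypothetical and introduced here for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple

State = Tuple[int, int]  # hypothetical grid position (x, y)
# Hypothetical rule type: returns True when `action` is considered unsafe in `state`.
Rule = Callable[[State, int], bool]


def allowed_actions(state: State, n_actions: int, rules: Sequence[Rule]) -> List[int]:
    """Return the actions that no rule flags as unsafe.

    If every action is flagged, fall back to the full action set so the
    agent can still act.
    """
    safe = [a for a in range(n_actions)
            if not any(rule(state, a) for rule in rules)]
    return safe if safe else list(range(n_actions))


def select_action(q_values: Sequence[float], state: State, rules: Sequence[Rule],
                  epsilon: float, rng: random.Random) -> int:
    """Epsilon-greedy selection restricted to rule-compliant actions."""
    candidates = allowed_actions(state, len(q_values), rules)
    if rng.random() < epsilon:
        return rng.choice(candidates)  # explore among safe actions only
    return max(candidates, key=lambda a: q_values[a])  # exploit among safe actions


# Assumed action encoding: 0 = up, 1 = down, 2 = left, 3 = right.
def no_step_off_edge(state: State, action: int) -> bool:
    """Example rule (assumption): moving right at x == 3 is unsafe."""
    return state[0] == 3 and action == 3


q = [0.1, 0.2, 0.0, 0.9]
rng = random.Random(0)
greedy_at_edge = select_action(q, (3, 0), [no_step_off_edge], 0.0, rng)
greedy_elsewhere = select_action(q, (0, 0), [no_step_off_edge], 0.0, rng)
```

In this sketch the highest-valued action (right) is skipped at the edge because the rule flags it, while the same Q-values yield that action elsewhere. How RDQ actually discovers and applies its rules is described in the paper itself.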
Related papers
- ReLIC: A Recipe for 64k Steps of In-Context Reinforcement Learning for Embodied AI [44.77897322913095]
We present ReLIC, a new approach for in-context reinforcement learning for embodied agents.
With ReLIC, agents are capable of adapting to new environments using 64,000 steps of in-context experience.
We find that ReLIC is capable of few-shot imitation learning despite never being trained with expert demonstrations.
arXiv Detail & Related papers (2024-10-03T17:58:11Z)
- No Regrets: Investigating and Improving Regret Approximations for Curriculum Discovery [53.08822154199948]
Unsupervised Environment Design (UED) methods have gained recent attention as their adaptive curricula promise to enable agents to be robust to in- and out-of-distribution tasks.
This work investigates how existing UED methods select training environments, focusing on task prioritisation metrics.
We develop a method that directly trains on scenarios with high learnability.
arXiv Detail & Related papers (2024-08-27T14:31:54Z)
- Self-Supervised Curriculum Generation for Autonomous Reinforcement Learning without Task-Specific Knowledge [25.168236693829783]
A significant bottleneck in applying current reinforcement learning algorithms to real-world scenarios is the need to reset the environment between every episode.
We propose a novel ARL algorithm that can generate a curriculum adaptive to the agent's learning progress without task-specific knowledge.
arXiv Detail & Related papers (2023-11-15T18:40:10Z)
- Human-Timescale Adaptation in an Open-Ended Task Space [56.55530165036327]
We show that training an RL agent at scale leads to a general in-context learning algorithm that can adapt to open-ended novel embodied 3D problems as quickly as humans.
Our results lay the foundation for increasingly general and adaptive RL agents that perform well across ever-larger open-ended domains.
arXiv Detail & Related papers (2023-01-18T15:39:21Z)
- You Only Live Once: Single-Life Reinforcement Learning [124.1738675154651]
In many real-world situations, the goal might not be to learn a policy that can do the task repeatedly, but simply to perform a new task successfully once in a single trial.
We formalize this problem setting, where an agent must complete a task within a single episode without interventions.
We propose an algorithm, $Q$-weighted adversarial learning (QWALE), which employs a distribution matching strategy.
arXiv Detail & Related papers (2022-10-17T09:00:11Z)
- Self-Initiated Open World Learning for Autonomous AI Agents [16.41396764793912]
As more and more AI agents are used in practice, it is time to think about how to make these agents fully autonomous.
This paper proposes a theoretic framework for this learning paradigm to promote the research of building Self-initiated Open world Learning agents.
arXiv Detail & Related papers (2021-10-21T18:11:02Z)
- Hierarchical Skills for Efficient Exploration [70.62309286348057]
In reinforcement learning, pre-trained low-level skills have the potential to greatly facilitate exploration.
Prior knowledge of the downstream task is required to strike the right balance between generality (fine-grained control) and specificity (faster learning) in skill design.
We propose a hierarchical skill learning framework that acquires skills of varying complexity in an unsupervised manner.
arXiv Detail & Related papers (2021-10-20T22:29:32Z)
- Coverage as a Principle for Discovering Transferable Behavior in Reinforcement Learning [16.12658895065585]
We argue that representation alone is not enough for efficient transfer in challenging domains and explore how to transfer knowledge through behavior.
The behavior of pre-trained policies may be used for solving the task at hand (exploitation) or for collecting useful data to solve the problem (exploration).
arXiv Detail & Related papers (2021-02-24T16:51:02Z)
- Latent Skill Planning for Exploration and Transfer [49.25525932162891]
In this paper, we investigate how these two approaches can be integrated into a single reinforcement learning agent.
We leverage the idea of partial amortization for fast adaptation at test time.
We demonstrate the benefits of our design decisions across a suite of challenging locomotion tasks.
arXiv Detail & Related papers (2020-11-27T18:40:03Z)
- Safe Reinforcement Learning via Curriculum Induction [94.67835258431202]
In safety-critical applications, autonomous agents may need to learn in an environment where mistakes can be very costly.
Existing safe reinforcement learning methods make an agent rely on priors that let it avoid dangerous situations.
This paper presents an alternative approach inspired by human teaching, where an agent learns under the supervision of an automatic instructor.
arXiv Detail & Related papers (2020-06-22T10:48:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.