System Design for an Integrated Lifelong Reinforcement Learning Agent for Real-Time Strategy Games
- URL: http://arxiv.org/abs/2212.04603v1
- Date: Thu, 8 Dec 2022 23:32:57 GMT
- Title: System Design for an Integrated Lifelong Reinforcement Learning Agent for Real-Time Strategy Games
- Authors: Indranil Sur, Zachary Daniels, Abrar Rahman, Kamil Faber, Gianmarco J.
Gallardo, Tyler L. Hayes, Cameron E. Taylor, Mustafa Burak Gurbuz, James
Smith, Sahana Joshi, Nathalie Japkowicz, Michael Baron, Zsolt Kira,
Christopher Kanan, Roberto Corizzo, Ajay Divakaran, Michael Piacentino, Jesse
Hostetler, Aswin Raghavan
- Abstract summary: Continual/lifelong learning (LL) involves minimizing forgetting of old tasks while maximizing a model's capability to learn new tasks.
We introduce the Lifelong Reinforcement Learning Components Framework (L2RLCF), which standardizes L2RL systems and assimilates different continual learning components.
We describe a case study that demonstrates how multiple independently-developed LL components can be integrated into a single realized system.
- Score: 34.3277278308442
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As Artificial and Robotic Systems are increasingly deployed and relied upon
for real-world applications, it is important that they exhibit the ability to
continually learn and adapt in dynamically-changing environments, becoming
Lifelong Learning Machines. Continual/lifelong learning (LL) involves
minimizing catastrophic forgetting of old tasks while maximizing a model's
capability to learn new tasks. This paper addresses the challenging lifelong
reinforcement learning (L2RL) setting. Pushing the state-of-the-art forward in
L2RL and making L2RL useful for practical applications requires more than
developing individual L2RL algorithms; it requires making progress at the
systems level, especially research into the non-trivial problem of how to
integrate multiple L2RL algorithms into a common framework. In this paper, we
introduce the Lifelong Reinforcement Learning Components Framework (L2RLCF),
which standardizes L2RL systems and assimilates different continual learning
components (each addressing different aspects of the lifelong learning problem)
into a unified system. As an instantiation of L2RLCF, we develop a standard API
allowing easy integration of novel lifelong learning components. We describe a
case study that demonstrates how multiple independently-developed LL components
can be integrated into a single realized system. We also introduce an
evaluation environment to measure the effect of combining various
system components. Our evaluation environment employs different LL scenarios
(sequences of tasks) consisting of Starcraft-2 minigames and allows for the
fair, comprehensive, and quantitative comparison of different combinations of
components within a challenging common evaluation environment.
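The abstract says L2RLCF exposes a standard API so that independently developed LL components can be plugged into one system, and that evaluation scenarios are sequences of Starcraft-2 minigame tasks, but it does not give the interface itself. The sketch below is a minimal, hypothetical Python rendering of such a design; every name in it (LLComponent, L2RLSystem, run_scenario, the agent's act/update methods, and the Gymnasium-style environment contract) is an assumption for illustration, not the paper's actual API.

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Dict, List

@dataclass
class Transition:
    obs: Any
    action: Any
    reward: float
    next_obs: Any
    done: bool

class LLComponent(ABC):
    """One lifelong-learning concern (e.g. a replay buffer or a forgetting penalty)."""
    @abstractmethod
    def on_task_start(self, task_id: str) -> None: ...
    @abstractmethod
    def on_step(self, transition: Transition) -> None: ...
    def loss_terms(self) -> Dict[str, float]:
        # Extra penalties (e.g. an EWC-style regularizer) added to the base RL loss.
        return {}

class L2RLSystem:
    """Composes independently developed components behind one event loop."""
    def __init__(self, agent: Any, components: List[LLComponent]) -> None:
        self.agent = agent
        self.components = components

    def run_scenario(self, scenario: List[str], env_factory, steps_per_task: int) -> None:
        # A scenario is a sequence of tasks, e.g. names of Starcraft-2 minigames.
        for task_id in scenario:
            env = env_factory(task_id)
            for c in self.components:
                c.on_task_start(task_id)
            obs, _ = env.reset()
            for _ in range(steps_per_task):
                action = self.agent.act(obs)
                next_obs, reward, done, truncated, _ = env.step(action)
                t = Transition(obs, action, reward, next_obs, done or truncated)
                for c in self.components:
                    c.on_step(t)  # every component observes the same transition stream
                extra = [c.loss_terms() for c in self.components]
                self.agent.update(t, extra_losses=extra)
                obs = env.reset()[0] if (done or truncated) else next_obs

Under a contract of this kind, the "fair, comprehensive, and quantitative comparison" the abstract mentions reduces to running the same scenario through systems configured with different component lists.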
Related papers
- Interactive Continual Learning: Fast and Slow Thinking [19.253164551254734]
This paper presents a novel Interactive Continual Learning framework, enabled by collaborative interactions among models of various sizes.
To improve memory retrieval in System 1, we introduce the CL-vMF mechanism, based on the von Mises-Fisher (vMF) distribution (see the vMF retrieval sketch after this list).
Comprehensive evaluation of our proposed ICL demonstrates significant resistance to forgetting and superior performance relative to existing methods.
arXiv Detail & Related papers (2024-03-05T03:37:28Z)
- How Can LLM Guide RL? A Value-Based Approach [68.55316627400683]
Reinforcement learning (RL) has become the de facto standard practice for sequential decision-making problems by improving future acting policies with feedback.
Recent developments in large language models (LLMs) have showcased impressive capabilities in language understanding and generation, yet they fall short in exploration and self-improvement capabilities.
We develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning.
arXiv Detail & Related papers (2024-02-25T20:07:13Z)
- True Knowledge Comes from Practice: Aligning LLMs with Embodied Environments via Reinforcement Learning [37.10401435242991]
Large language models (LLMs) often fail in solving simple decision-making tasks due to misalignment of the knowledge in LLMs with environments.
We propose TWOSOME, a novel framework that deploys LLMs as decision-making agents to efficiently interact and align with embodied environments via RL.
arXiv Detail & Related papers (2024-01-25T13:03:20Z)
- LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models [56.25156596019168]
This paper introduces the LMRL-Gym benchmark for evaluating multi-turn RL for large language models (LLMs).
Our benchmark consists of 8 different language tasks, which require multiple rounds of language interaction and cover a range of tasks in open-ended dialogue and text games.
arXiv Detail & Related papers (2023-11-30T03:59:31Z)
- A Domain-Agnostic Approach for Characterization of Lifelong Learning Systems [128.63953314853327]
"Lifelong Learning" systems are capable of 1) Continuous Learning, 2) Transfer and Adaptation, and 3) Scalability.
We show that this suite of metrics can inform the development of varied and complex Lifelong Learning systems.
arXiv Detail & Related papers (2023-01-18T21:58:54Z)
- Lifelong Reinforcement Learning with Modulating Masks [16.24639836636365]
Lifelong learning aims to create AI systems that continuously and incrementally learn during a lifetime, similar to biological learning.
Attempts so far have met problems, including catastrophic forgetting, interference among tasks, and the inability to exploit previous knowledge.
We show that lifelong reinforcement learning with modulating masks is a promising approach to lifelong learning, to the composition of knowledge to learn increasingly complex tasks, and to knowledge reuse for efficient and faster learning.
arXiv Detail & Related papers (2022-12-21T15:49:20Z)
- Lifelong Machine Learning of Functionally Compositional Structures [7.99536002595393]
This dissertation presents a general-purpose framework for lifelong learning of functionally compositional structures.
The framework separates the learning into two stages: learning how to combine existing components to assimilate a novel problem, and learning how to adapt the existing components to accommodate the new problem.
Supervised learning evaluations found that 1) compositional models improve lifelong learning of diverse tasks, 2) the multi-stage process permits lifelong learning of compositional knowledge, and 3) the components learned by the framework represent self-contained and reusable functions.
arXiv Detail & Related papers (2022-07-25T15:24:25Z)
- L2Explorer: A Lifelong Reinforcement Learning Assessment Environment [49.40779372040652]
Reinforcement learning solutions tend to generalize poorly when exposed to new tasks outside of the data distribution they are trained on.
We introduce a framework for continual reinforcement-learning development and assessment using Lifelong Learning Explorer (L2Explorer).
L2Explorer is a new, Unity-based, first-person 3D exploration environment that can be continuously reconfigured to generate a range of tasks and task variants structured into complex evaluation curricula.
arXiv Detail & Related papers (2022-03-14T19:20:26Z)
- Continuous Coordination As a Realistic Scenario for Lifelong Learning [6.044372319762058]
We introduce a multi-agent lifelong learning testbed that supports both zero-shot and few-shot settings.
We evaluate several recent MARL methods and benchmark state-of-the-art lifelong learning algorithms under limited memory and computation.
We empirically show that the agents trained in our setup are able to coordinate well with unseen agents, without any additional assumptions made by previous works.
arXiv Detail & Related papers (2021-03-04T18:44:03Z)
- Reset-Free Lifelong Learning with Skill-Space Planning [105.00539596788127]
We propose Lifelong Skill Planning (LiSP), an algorithmic framework for non-episodic lifelong RL.
LiSP learns skills in an unsupervised manner using intrinsic rewards and plans over the learned skills using a learned dynamics model.
We demonstrate empirically that LiSP successfully enables long-horizon planning and learns agents that can avoid catastrophic failures even in challenging non-stationary and non-episodic environments.
arXiv Detail & Related papers (2020-12-07T09:33:02Z)
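The CL-vMF mechanism in the Interactive Continual Learning entry above is described only at a high level here. As a reference point, below is a minimal sketch of von Mises-Fisher-style memory retrieval: the vMF density on the unit sphere is proportional to exp(kappa * mu . x), so retrieval reduces to a softmax over kappa-scaled cosine similarities between a query and stored keys. The function name, memory layout, and kappa value are illustrative assumptions, not the paper's implementation.

import numpy as np

def vmf_retrieve(query: np.ndarray, keys: np.ndarray, kappa: float = 10.0) -> np.ndarray:
    # Softmax over memory slots under a vMF likelihood: log-scores are
    # kappa-scaled cosine similarities between the query and stored keys.
    q = query / np.linalg.norm(query)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    logits = kappa * (k @ q)
    logits -= logits.max()          # numerical stability
    p = np.exp(logits)
    return p / p.sum()

# Example: three orthonormal prototypes; the query is nearest the first slot.
prototypes = np.eye(3)
print(vmf_retrieve(np.array([0.9, 0.1, 0.0]), prototypes))

Larger kappa sharpens the retrieval distribution toward the single nearest prototype; kappa near zero makes it nearly uniform.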