System Design for an Integrated Lifelong Reinforcement Learning Agent for Real-Time Strategy Games
- URL: http://arxiv.org/abs/2212.04603v1
- Date: Thu, 8 Dec 2022 23:32:57 GMT
- Title: System Design for an Integrated Lifelong Reinforcement Learning Agent for Real-Time Strategy Games
- Authors: Indranil Sur, Zachary Daniels, Abrar Rahman, Kamil Faber, Gianmarco J.
Gallardo, Tyler L. Hayes, Cameron E. Taylor, Mustafa Burak Gurbuz, James
Smith, Sahana Joshi, Nathalie Japkowicz, Michael Baron, Zsolt Kira,
Christopher Kanan, Roberto Corizzo, Ajay Divakaran, Michael Piacentino, Jesse
Hostetler, Aswin Raghavan
- Abstract summary: Continual/lifelong learning (LL) involves minimizing forgetting of old tasks while maximizing a model's capability to learn new tasks.
We introduce the Lifelong Reinforcement Learning Components Framework (L2RLCF), which standardizes L2RL systems and assimilates different continual learning components.
We describe a case study that demonstrates how multiple independently-developed LL components can be integrated into a single realized system.
- Score: 34.3277278308442
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As Artificial and Robotic Systems are increasingly deployed and relied upon
for real-world applications, it is important that they exhibit the ability to
continually learn and adapt in dynamically-changing environments, becoming
Lifelong Learning Machines. Continual/lifelong learning (LL) involves
minimizing catastrophic forgetting of old tasks while maximizing a model's
capability to learn new tasks. This paper addresses the challenging lifelong
reinforcement learning (L2RL) setting. Pushing the state-of-the-art forward in
L2RL and making L2RL useful for practical applications requires more than
developing individual L2RL algorithms; it requires making progress at the
systems level, especially research into the non-trivial problem of how to
integrate multiple L2RL algorithms into a common framework. In this paper, we
introduce the Lifelong Reinforcement Learning Components Framework (L2RLCF),
which standardizes L2RL systems and assimilates different continual learning
components (each addressing different aspects of the lifelong learning problem)
into a unified system. As an instantiation of L2RLCF, we develop a standard API
allowing easy integration of novel lifelong learning components. We describe a
case study that demonstrates how multiple independently-developed LL components
can be integrated into a single realized system. We also introduce an
evaluation environment to measure the effect of combining various
system components. Our evaluation environment employs different LL scenarios
(sequences of tasks) consisting of Starcraft-2 minigames and allows for the
fair, comprehensive, and quantitative comparison of different combinations of
components within a challenging common evaluation environment.
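The abstract says L2RLCF exposes a standard API so that independently developed LL components can be plugged into one system, and that evaluation scenarios are sequences of Starcraft-2 minigame tasks, but it does not give the interface itself. The sketch below is a minimal, hypothetical Python rendering of such a design; every name in it (LLComponent, L2RLSystem, run_scenario, the agent's act/update methods, and the Gymnasium-style environment contract) is an assumption for illustration, not the paper's actual API.

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Dict, List

@dataclass
class Transition:
    obs: Any
    action: Any
    reward: float
    next_obs: Any
    done: bool

class LLComponent(ABC):
    """One lifelong-learning concern (e.g. a replay buffer or a forgetting penalty)."""
    @abstractmethod
    def on_task_start(self, task_id: str) -> None: ...
    @abstractmethod
    def on_step(self, transition: Transition) -> None: ...
    def loss_terms(self) -> Dict[str, float]:
        # Extra penalties (e.g. an EWC-style regularizer) added to the base RL loss.
        return {}

class L2RLSystem:
    """Composes independently developed components behind one event loop."""
    def __init__(self, agent: Any, components: List[LLComponent]) -> None:
        self.agent = agent
        self.components = components

    def run_scenario(self, scenario: List[str], env_factory, steps_per_task: int) -> None:
        # A scenario is a sequence of tasks, e.g. names of Starcraft-2 minigames.
        for task_id in scenario:
            env = env_factory(task_id)
            for c in self.components:
                c.on_task_start(task_id)
            obs, _ = env.reset()
            for _ in range(steps_per_task):
                action = self.agent.act(obs)
                next_obs, reward, done, truncated, _ = env.step(action)
                t = Transition(obs, action, reward, next_obs, done or truncated)
                for c in self.components:
                    c.on_step(t)  # every component observes the same transition stream
                extra = [c.loss_terms() for c in self.components]
                self.agent.update(t, extra_losses=extra)
                obs = env.reset()[0] if (done or truncated) else next_obs

Under a contract of this kind, the "fair, comprehensive, and quantitative comparison" the abstract mentions reduces to running the same scenario through systems configured with different component lists.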
Related papers
- Interactive Continual Learning: Fast and Slow Thinking [19.253164551254734]
This paper presents a novel Interactive Continual Learning framework, enabled by collaborative interactions among models of various sizes.
To improve memory retrieval in System 1, we introduce the CL-vMF mechanism, based on the von Mises-Fisher (vMF) distribution (see the vMF retrieval sketch after this list).
Comprehensive evaluation of our proposed ICL demonstrates significant resistance to forgetting and superior performance relative to existing methods.
arXiv Detail & Related papers (2024-03-05T03:37:28Z)
- How Can LLM Guide RL? A Value-Based Approach [68.55316627400683]
Reinforcement learning (RL) has become the de facto standard practice for sequential decision-making problems by improving future acting policies with feedback.
Recent developments in large language models (LLMs) have showcased impressive capabilities in language understanding and generation, yet they fall short in exploration and self-improvement capabilities.
We develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning.
arXiv Detail & Related papers (2024-02-25T20:07:13Z)
- True Knowledge Comes from Practice: Aligning LLMs with Embodied Environments via Reinforcement Learning [37.10401435242991]
Large language models (LLMs) often fail in solving simple decision-making tasks due to misalignment of the knowledge in LLMs with environments.
We propose TWOSOME, a novel framework that deploys LLMs as decision-making agents to efficiently interact and align with embodied environments via RL.
arXiv Detail & Related papers (2024-01-25T13:03:20Z)
- LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models [56.25156596019168]
This paper introduces the LMRL-Gym benchmark for evaluating multi-turn RL for large language models (LLMs).
Our benchmark consists of 8 different language tasks, which require multiple rounds of language interaction and cover a range of tasks in open-ended dialogue and text games.
arXiv Detail & Related papers (2023-11-30T03:59:31Z)
- A Domain-Agnostic Approach for Characterization of Lifelong Learning Systems [128.63953314853327]
"Lifelong Learning" systems are capable of 1) Continuous Learning, 2) Transfer and Adaptation, and 3) Scalability.
We show that this suite of metrics can inform the development of varied and complex Lifelong Learning systems.
arXiv Detail & Related papers (2023-01-18T21:58:54Z)
- Lifelong Reinforcement Learning with Modulating Masks [16.24639836636365]
Lifelong learning aims to create AI systems that continuously and incrementally learn during a lifetime, similar to biological learning.
Attempts so far have met problems, including catastrophic forgetting, interference among tasks, and the inability to exploit previous knowledge.
We show that lifelong reinforcement learning with modulating masks is a promising approach to lifelong learning, to the composition of knowledge to learn increasingly complex tasks, and to knowledge reuse for efficient and faster learning.
arXiv Detail & Related papers (2022-12-21T15:49:20Z)
- Lifelong Machine Learning of Functionally Compositional Structures [7.99536002595393]
This dissertation presents a general-purpose framework for lifelong learning of functionally compositional structures.
The framework separates the learning into two stages: learning how to combine existing components to assimilate a novel problem, and learning how to adapt the existing components to accommodate the new problem.
Supervised learning evaluations found that 1) compositional models improve lifelong learning of diverse tasks, 2) the multi-stage process permits lifelong learning of compositional knowledge, and 3) the components learned by the framework represent self-contained and reusable functions.
arXiv Detail & Related papers (2022-07-25T15:24:25Z)
- L2Explorer: A Lifelong Reinforcement Learning Assessment Environment [49.40779372040652]
Reinforcement learning solutions tend to generalize poorly when exposed to new tasks outside of the data distribution they are trained on.
We introduce a framework for continual reinforcement-learning development and assessment using Lifelong Learning Explorer (L2Explorer).
L2Explorer is a new, Unity-based, first-person 3D exploration environment that can be continuously reconfigured to generate a range of tasks and task variants structured into complex evaluation curricula.
arXiv Detail & Related papers (2022-03-14T19:20:26Z)
- Continuous Coordination As a Realistic Scenario for Lifelong Learning [6.044372319762058]
We introduce a multi-agent lifelong learning testbed that supports both zero-shot and few-shot settings.
We evaluate several recent MARL methods and benchmark state-of-the-art lifelong learning algorithms under limited memory and computation.
We empirically show that the agents trained in our setup are able to coordinate well with unseen agents, without any additional assumptions made by previous works.
arXiv Detail & Related papers (2021-03-04T18:44:03Z)
- Reset-Free Lifelong Learning with Skill-Space Planning [105.00539596788127]
We propose Lifelong Skill Planning (LiSP), an algorithmic framework for non-episodic lifelong RL.
LiSP learns skills in an unsupervised manner using intrinsic rewards and plans over the learned skills using a learned dynamics model.
We demonstrate empirically that LiSP successfully enables long-horizon planning and learns agents that can avoid catastrophic failures even in challenging non-stationary and non-episodic environments.
arXiv Detail & Related papers (2020-12-07T09:33:02Z)
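The CL-vMF mechanism in the Interactive Continual Learning entry above is described only at a high level here. As a reference point, below is a minimal sketch of von Mises-Fisher-style memory retrieval: the vMF density on the unit sphere is proportional to exp(kappa * mu . x), so retrieval reduces to a softmax over kappa-scaled cosine similarities between a query and stored keys. The function name, memory layout, and kappa value are illustrative assumptions, not the paper's implementation.

import numpy as np

def vmf_retrieve(query: np.ndarray, keys: np.ndarray, kappa: float = 10.0) -> np.ndarray:
    # Softmax over memory slots under a vMF likelihood: log-scores are
    # kappa-scaled cosine similarities between the query and stored keys.
    q = query / np.linalg.norm(query)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    logits = kappa * (k @ q)
    logits -= logits.max()          # numerical stability
    p = np.exp(logits)
    return p / p.sum()

# Example: three orthonormal prototypes; the query is nearest the first slot.
prototypes = np.eye(3)
print(vmf_retrieve(np.array([0.9, 0.1, 0.0]), prototypes))

Larger kappa sharpens the retrieval distribution toward the single nearest prototype; kappa near zero makes it nearly uniform.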