Active Reinforcement Learning Strategies for Offline Policy Improvement
- URL: http://arxiv.org/abs/2412.13106v2
- Date: Thu, 26 Dec 2024 10:15:54 GMT
- Title: Active Reinforcement Learning Strategies for Offline Policy Improvement
- Authors: Ambedkar Dukkipati, Ranga Shaarad Ayyagari, Bodhisattwa Dasgupta, Parag Dutta, Prabhas Reddy Onteru
- Abstract summary: We propose an active reinforcement learning method capable of collecting trajectories that can augment existing offline data. We demonstrate that our proposed method reduces additional online interaction with the environment by up to 75% over competitive baselines.
- Score: 8.2883946876766
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning agents that excel at sequential decision-making tasks must continuously resolve the trade-off between exploration and exploitation for optimal learning. However, such online interactions with the environment may be prohibitively expensive and subject to constraints, such as a limited budget for agent-environment interactions and restricted exploration in certain regions of the state space. Examples include selecting candidates for medical trials and training agents in complex navigation environments. This problem motivates the study of active reinforcement learning strategies that collect minimal additional experience trajectories by reusing existing offline data previously collected by some unknown behavior policy. In this work, we propose an active reinforcement learning method capable of collecting trajectories that augment existing offline data. Through extensive experimentation, we demonstrate that our proposed method reduces additional online interaction with the environment by up to 75% over competitive baselines across various continuous control environments, including the Gym-MuJoCo locomotion environments as well as Maze2d, AntMaze, CARLA, and IsaacSimGo1. To the best of our knowledge, this is the first work to address the active learning problem in the context of sequential decision-making and reinforcement learning.
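The abstract does not detail the collection mechanism, so the following is only a minimal sketch of the general recipe such methods follow: fit an ensemble of critics to the offline data, then spend a small online budget rolling out only where the ensemble disagrees. The interfaces (`q_ensemble`, `policy.candidate_actions`, an old-style Gym `env` returning 4-tuples) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def ensemble_disagreement(q_ensemble, state, actions):
    """Epistemic-uncertainty proxy: spread of Q-estimates across the ensemble."""
    qs = np.stack([q(state, actions) for q in q_ensemble])  # (n_models, n_actions)
    return qs.std(axis=0).max()

def collect_active_trajectories(env, policy, q_ensemble, offline_buffer,
                                budget, threshold):
    """Spend a limited online budget only where the critic ensemble is
    uncertain, appending the new transitions to the offline buffer."""
    steps_used = 0
    while steps_used < budget:
        state, done = env.reset(), False
        while not done and steps_used < budget:
            candidates = policy.candidate_actions(state)
            if ensemble_disagreement(q_ensemble, state, candidates) < threshold:
                break  # offline data already covers this region; end the rollout
            action = policy.act(state)
            next_state, reward, done, _ = env.step(action)
            offline_buffer.append((state, action, reward, next_state, done))
            state = next_state
            steps_used += 1
    return offline_buffer
```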
Related papers
- Hierarchical Subspaces of Policies for Continual Offline Reinforcement Learning [19.463863037999054]
We consider a Continual Reinforcement Learning setup, where a learning agent must continuously adapt to new tasks while retaining previously acquired skill sets.
We introduce HiSPO, a novel hierarchical framework designed specifically for continual learning in navigation settings from offline data.
We demonstrate, through a careful experimental study, the effectiveness of our method in both classical MuJoCo maze environments and complex video game-like navigation simulations.
arXiv Detail & Related papers (2024-12-19T14:00:03Z) - Temporal Abstraction in Reinforcement Learning with Offline Data [8.370420807869321]
We propose a framework by which an online hierarchical reinforcement learning algorithm can be trained on an offline dataset of transitions collected by an unknown behavior policy.
We validate our method on Gym MuJoCo environments and robotic gripper block-stacking tasks in the standard as well as transfer and goal-conditioned settings.
arXiv Detail & Related papers (2024-07-21T18:10:31Z) - Aquatic Navigation: A Challenging Benchmark for Deep Reinforcement Learning [53.3760591018817]
We propose a new benchmarking environment for aquatic navigation using recent advances in the integration between game engines and Deep Reinforcement Learning.
Specifically, we focus on PPO, one of the most widely accepted algorithms, and we propose advanced training techniques.
Our empirical evaluation shows that a well-designed combination of these ingredients can achieve promising results.
arXiv Detail & Related papers (2024-05-30T23:20:23Z) - Interactive Graph Convolutional Filtering [79.34979767405979]
Interactive Recommender Systems (IRS) have been increasingly used in various domains, including personalized article recommendation, social media, and online advertising.
These problems are exacerbated by the cold-start and data-sparsity problems.
Existing Multi-Armed Bandit methods, despite their carefully designed exploration strategies, often struggle to provide satisfactory results in the early stages.
Our proposed method extends interactive collaborative filtering into the graph model to enhance the performance of collaborative filtering between users and items.
arXiv Detail & Related papers (2023-09-04T09:02:31Z) - Safe Multi-agent Learning via Trapping Regions [89.24858306636816]
We apply the concept of trapping regions, known from qualitative theory of dynamical systems, to create safety sets in the joint strategy space for decentralized learning.
We propose a binary partitioning algorithm for verification that candidate sets form trapping regions in systems with known learning dynamics, and a sampling algorithm for scenarios where learning dynamics are not known.
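The summary leaves both algorithms unspecified, so here is a hedged sketch of the core test a sampling-based check might perform, under the simplifying assumption that "inward" means "toward the candidate set's center" (reasonable for convex candidate sets); `update_fn` and `sample_boundary` are hypothetical stand-ins for the learning dynamics and a boundary sampler.

```python
import numpy as np

def points_inward(update_fn, x, center, eps=1e-12):
    """Check whether the learning update at boundary point x has a positive
    component along the inward direction (toward the candidate set's center)."""
    direction = update_fn(x)                # estimated learning dynamics at x
    inward = center - x
    inward = inward / (np.linalg.norm(inward) + eps)
    return float(np.dot(direction, inward)) > 0.0

def sample_trapping_check(update_fn, sample_boundary, center, n_samples=1000):
    """Monte-Carlo test: reject the candidate set as soon as one sampled
    boundary point has dynamics pointing outward."""
    for _ in range(n_samples):
        x = sample_boundary()               # draw a point on the candidate boundary
        if not points_inward(update_fn, x, center):
            return False                    # counterexample found: not trapping
    return True                             # no counterexample (evidence, not proof)
```

A negative result here is conclusive, while a positive result is only statistical evidence, which is consistent with the paper reserving the verification algorithm for the case of known learning dynamics.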
arXiv Detail & Related papers (2023-02-27T14:47:52Z) - Let Offline RL Flow: Training Conservative Agents in the Latent Space of Normalizing Flows [58.762959061522736]
Offline reinforcement learning aims to train a policy on a pre-recorded and fixed dataset without any additional environment interactions.
We build upon recent works on learning policies in latent action spaces and use a special form of Normalizing Flows for constructing a generative model.
We evaluate our method on various locomotion and navigation tasks, demonstrating that our approach outperforms recently proposed algorithms.
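The blurb does not describe the authors' "special form" of Normalizing Flows, so as a generic illustration of a latent-action flow, here is one RealNVP-style conditional affine coupling layer in PyTorch; the architecture and sizes are assumptions.

```python
import torch
import torch.nn as nn

class ConditionalAffineCoupling(nn.Module):
    """One coupling layer: half of the latent passes through unchanged; the
    other half is scaled and shifted by networks conditioned on the untouched
    half and the state, keeping the transform invertible."""
    def __init__(self, dim, state_dim, hidden=64):
        super().__init__()
        self.half = dim // 2
        in_dim = self.half + state_dim
        self.scale = nn.Sequential(nn.Linear(in_dim, hidden), nn.Tanh(),
                                   nn.Linear(hidden, dim - self.half))
        self.shift = nn.Sequential(nn.Linear(in_dim, hidden), nn.Tanh(),
                                   nn.Linear(hidden, dim - self.half))

    def forward(self, z, state):
        z1, z2 = z[:, :self.half], z[:, self.half:]
        h = torch.cat([z1, state], dim=-1)
        s = self.scale(h)
        a2 = z2 * torch.exp(s) + self.shift(h)   # invertible given z1 and state
        log_det = s.sum(dim=-1)                  # log |det Jacobian| of the map
        return torch.cat([z1, a2], dim=-1), log_det
```

A latent-space policy then samples z from N(0, I) and pushes it through a stack of such layers conditioned on the state; the tracked log-determinant gives an exact action density, which is what makes likelihood-based conservative training tractable.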
arXiv Detail & Related papers (2022-11-20T21:57:10Z) - Offline Reinforcement Learning with Instrumental Variables in Confounded
Markov Decision Processes [93.61202366677526]
We study offline reinforcement learning (RL) in the face of unmeasured confounders.
We propose various policy learning methods with finite-sample suboptimality guarantees for finding the optimal in-class policy.
arXiv Detail & Related papers (2022-09-18T22:03:55Z) - Skill-based Meta-Reinforcement Learning [65.31995608339962]
We devise a method that enables meta-learning on long-horizon, sparse-reward tasks.
Our core idea is to leverage prior experience extracted from offline datasets during meta-learning.
arXiv Detail & Related papers (2022-04-25T17:58:19Z) - Latent-Variable Advantage-Weighted Policy Optimization for Offline RL [70.01851346635637]
Offline reinforcement learning methods hold the promise of learning policies from pre-collected datasets without the need to query the environment for new transitions.
In practice, offline datasets are often heterogeneous, i.e., collected in a variety of scenarios.
We propose to leverage latent-variable policies that can represent a broader class of policy distributions.
Our method improves the average performance of the next best-performing offline reinforcement learning methods by 49% on heterogeneous datasets.
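As a sketch of how latent-variable policies typically combine with advantage weighting: the dataset log-likelihood term in advantage-weighted regression is replaced by a variational lower bound, since exact log pi(a|s) is intractable with latents. `log_prob_lb`, `beta`, and the weight clip below are illustrative assumptions, not the paper's choices.

```python
import torch

def advantage_weighted_loss(log_prob_lb, q_values, v_values, beta=1.0, clip=20.0):
    """Weight each dataset action by exp(advantage / beta) so the policy
    imitates the good parts of a heterogeneous dataset; log_prob_lb is a
    lower bound on log pi(a|s) (an ELBO when the policy has latent variables)."""
    advantages = q_values - v_values
    weights = torch.clamp(torch.exp(advantages / beta), max=clip)
    return -(weights.detach() * log_prob_lb).mean()  # minimize the negative
```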
arXiv Detail & Related papers (2022-03-16T21:17:03Z) - Conditional Imitation Learning for Multi-Agent Games [89.897635970366]
We study the problem of conditional multi-agent imitation learning, where we have access to joint trajectory demonstrations at training time.
We propose a novel approach to address the difficulties of scalability and data scarcity.
Our model learns a low-rank subspace over ego and partner agent strategies, then infers and adapts to a new partner strategy by interpolating in the subspace.
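The summary does not say how the mixture coefficients are inferred, so the sketch below uses ridge least squares over a low-rank strategy basis as an assumption-laden stand-in for the paper's inference step.

```python
import numpy as np

def fit_strategy_coeffs(basis, observed, l2=1e-3):
    """Given a low-rank basis of partner strategies (each column one flattened
    basis strategy) and an observed behavior vector, find the mixture that
    best explains the new partner via ridge least squares."""
    A, b = basis, observed
    return np.linalg.solve(A.T @ A + l2 * np.eye(A.shape[1]), A.T @ b)

def adapted_strategy(basis, coeffs):
    """Interpolate in the subspace to produce a strategy for the new partner."""
    return basis @ coeffs
```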
arXiv Detail & Related papers (2022-01-05T04:40:13Z) - Unsupervised Reinforcement Learning in Multiple Environments [37.5349071806395]
We address the problem of unsupervised reinforcement learning in a class of multiple environments.
We present a policy gradient algorithm, $\alpha$MEPOL, to optimize the introduced objective through mediated interactions with the class.
We show that reinforcement learning greatly benefits from the pre-trained exploration strategy.
arXiv Detail & Related papers (2021-12-16T09:54:37Z) - Online learning with dynamics: A minimax perspective [25.427783092065546]
We study the problem of online learning with dynamics, where a learner interacts with a stateful environment over multiple rounds.
Our main results provide sufficient conditions for online learnability for this setup with corresponding rates.
arXiv Detail & Related papers (2020-12-03T05:06:08Z) - PLAS: Latent Action Space for Offline Reinforcement Learning [18.63424441772675]
The goal of offline reinforcement learning is to learn a policy from a fixed dataset, without further interactions with the environment.
Existing off-policy algorithms have limited performance on static datasets due to extrapolation errors from out-of-distribution actions.
We demonstrate that our method provides competitive performance consistently across various continuous control tasks and different types of datasets.
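The idea behind PLAS is that the policy acts in the latent space of a generative model fit to the dataset's actions, so decoded actions stay in-distribution by construction. A minimal PyTorch sketch follows; the `decoder(state, z)` signature and the latent clamp are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LatentActionPolicy(nn.Module):
    """Policy that outputs a latent code instead of a raw action; a decoder
    trained on the offline dataset maps the code to an in-distribution action."""
    def __init__(self, state_dim, latent_dim, decoder, max_latent=2.0):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 256), nn.ReLU(),
                                 nn.Linear(256, latent_dim), nn.Tanh())
        self.decoder = decoder        # e.g., the decoder of a VAE fit to (s, a) pairs
        self.max_latent = max_latent  # keep latents in a region the decoder has seen

    def forward(self, state):
        z = self.max_latent * self.net(state)
        return self.decoder(state, z)  # decode to an action near the data support
```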
arXiv Detail & Related papers (2020-11-14T03:38:38Z) - Offline Learning for Planning: A Summary [0.0]
Training of autonomous agents often requires expensive and unsafe trial-and-error interactions with the environment.
Data sets containing recorded experiences of intelligent agents performing various tasks are accessible on the internet.
In this paper we adumbrate the ideas motivating the development of the state-of-the-art offline learning baselines.
arXiv Detail & Related papers (2020-10-05T11:41:11Z) - Deep Reinforcement Learning amidst Lifelong Non-Stationarity [67.24635298387624]
We show that an off-policy RL algorithm can reason about and tackle lifelong non-stationarity.
Our method leverages latent variable models to learn a representation of the environment from current and past experiences.
We also introduce several simulation environments that exhibit lifelong non-stationarity, and empirically find that our approach substantially outperforms approaches that do not reason about environment shift.
arXiv Detail & Related papers (2020-06-18T17:34:50Z) - Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey [53.73359052511171]
Reinforcement learning (RL) is a popular paradigm for addressing sequential decision tasks in which the agent has only limited environmental feedback.
We present a framework for curriculum learning (CL) in RL, and use it to survey and classify existing CL methods in terms of their assumptions, capabilities, and goals.
arXiv Detail & Related papers (2020-03-10T20:41:24Z)