Counterfactual experience augmented off-policy reinforcement learning
- URL: http://arxiv.org/abs/2503.13842v1
- Date: Tue, 18 Mar 2025 02:32:50 GMT
- Title: Counterfactual experience augmented off-policy reinforcement learning
- Authors: Sunbowen Lee, Yicheng Gong, Chao Deng
- Abstract summary: CEA builds an efficient inference model and enhances the representativeness of the learning data. It uses variational autoencoders to model the dynamic patterns of state transitions, and it builds complete counterfactual experiences to alleviate the out-of-distribution problem of the learning data.
- Score: 9.77739016575541
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Reinforcement learning control algorithms face significant challenges due to out-of-distribution and inefficient-exploration problems. While model-based reinforcement learning enhances the agent's reasoning and planning capabilities by constructing virtual environments, training such virtual environments can be very complex. To build an efficient inference model and enhance the representativeness of the learning data, we propose the Counterfactual Experience Augmentation (CEA) algorithm. CEA leverages variational autoencoders to model the dynamic patterns of state transitions and introduces randomness to model non-stationarity. This approach expands the learning data in the experience pool through counterfactual inference and performs exceptionally well in environments that satisfy the bisimulation assumption. Since environments with bisimulation properties are usually represented by discrete observation and action spaces, we propose a sampling method based on maximum kernel density estimation entropy to extend CEA to various environments. By providing reward signals for counterfactual state transitions based on real information, CEA constructs complete counterfactual experiences that alleviate the out-of-distribution problem of the learning data, and it outperforms general SOTA algorithms in environments with different properties. Finally, we discuss the similarities, differences, and properties of generated counterfactual experiences and real experiences. The code is available at https://github.com/Aegis1863/CEA.
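As a hedged illustration of the two mechanisms the abstract names, the sketch below pairs a conditional VAE over state transitions (used to infer counterfactual next states, with latent randomness standing in for non-stationarity) with a maximum kernel density estimation entropy criterion for selecting which counterfactual samples to keep. This is a minimal sketch assuming a PyTorch setting; the names (`TransitionVAE`, `kde_entropy`, `select_counterfactuals`) and all hyperparameters are illustrative assumptions, not taken from the authors' repository.

```python
import math
import torch
import torch.nn as nn

class TransitionVAE(nn.Module):
    """Conditional VAE over transitions: encode (s, a, s'), decode s' from (z, s, a).
    The Gaussian latent injects the randomness used to model non-stationarity."""

    def __init__(self, state_dim, action_dim, latent_dim=8, hidden=128):
        super().__init__()
        self.latent_dim = latent_dim
        self.encoder = nn.Sequential(
            nn.Linear(2 * state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * latent_dim),                # -> (mu, log_var)
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),                     # -> predicted s'
        )

    def loss(self, s, a, s_next):
        mu, log_var = self.encoder(torch.cat([s, a, s_next], -1)).chunk(2, -1)
        z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()  # reparameterization
        s_hat = self.decoder(torch.cat([z, s, a], -1))
        recon = ((s_hat - s_next) ** 2).sum(-1).mean()          # reconstruction term
        kl = -0.5 * (1 + log_var - mu ** 2 - log_var.exp()).sum(-1).mean()
        return recon + kl                                       # negative ELBO

    @torch.no_grad()
    def counterfactual(self, s, a_cf):
        """Infer a next state for a counterfactual action a_cf that was never executed."""
        z = torch.randn(s.shape[0], self.latent_dim, device=s.device)
        return self.decoder(torch.cat([z, s, a_cf], -1))

def kde_entropy(x, h=0.5):
    """Resubstitution estimate of differential entropy under a Gaussian KDE."""
    n, d = x.shape
    sq = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)         # pairwise squared distances
    log_p = torch.logsumexp(-sq / (2 * h * h), 1) - math.log(n) \
            - 0.5 * d * math.log(2 * math.pi * h * h)
    return -log_p.mean()

def select_counterfactuals(real_states, candidates, k=32, h=0.5):
    """Keep the k candidates whose addition most raises the KDE entropy of the
    experience pool, i.e. those that best fill coverage gaps in the data."""
    gains = torch.stack([kde_entropy(torch.cat([real_states, c[None]]), h)
                         for c in candidates])
    return candidates[gains.topk(k).indices]
```

Per the abstract, each selected counterfactual transition would then be assigned a reward signal derived from real information before being stored in the experience pool alongside real transitions.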
Related papers
- AdvKT: An Adversarial Multi-Step Training Framework for Knowledge Tracing [64.79967583649407]
Knowledge Tracing (KT) monitors students' knowledge states and simulates their responses to question sequences.
Existing KT models typically follow a single-step training paradigm, which leads to significant error accumulation.
We propose a novel Adversarial Multi-Step Training Framework for Knowledge Tracing (AdvKT) which focuses on the multi-step KT task.
arXiv Detail & Related papers (2025-04-07T03:31:57Z) - Inverse Reinforcement Learning by Estimating Expertise of Demonstrators [15.662820454886205]
IRLEED, Inverse Reinforcement Learning by Estimating Expertise of Demonstrators, is a novel framework that overcomes the hurdles of learning from suboptimal demonstrations without prior knowledge of demonstrator expertise. IRLEED enhances existing Inverse Reinforcement Learning (IRL) algorithms by incorporating a general model of demonstrator suboptimality to address reward bias and action variance. Experiments in both online and offline IL settings, with simulated and human-generated data, demonstrate IRLEED's adaptability and effectiveness.
arXiv Detail & Related papers (2024-02-02T20:21:09Z) - Conditional Kernel Imitation Learning for Continuous State Environments [9.750698192309978]
We introduce a novel conditional kernel density estimation-based imitation learning framework.
We show consistently superior empirical performance over many state-of-the-art IL algorithms.
arXiv Detail & Related papers (2023-08-24T05:26:42Z) - iSAGE: An Incremental Version of SAGE for Online Explanation on Data Streams [8.49072000414555]
iSAGE is a time- and memory-efficient incrementalization of SAGE.
We show that iSAGE adheres to similar theoretical properties as SAGE.
arXiv Detail & Related papers (2023-03-02T11:51:54Z) - Latent Variable Representation for Reinforcement Learning [131.03944557979725]
It remains unclear theoretically and empirically how latent variable models may facilitate learning, planning, and exploration to improve the sample efficiency of model-based reinforcement learning.
We provide a representation view of the latent variable models for state-action value functions, which allows both tractable variational learning algorithm and effective implementation of the optimism/pessimism principle.
In particular, we propose a computationally efficient planning algorithm with UCB exploration by incorporating kernel embeddings of latent variable models.
arXiv Detail & Related papers (2022-12-17T00:26:31Z) - Differentiable Agent-based Epidemiology [71.81552021144589]
We introduce GradABM: a scalable, differentiable design for agent-based modeling that is amenable to gradient-based learning with automatic differentiation.
GradABM can quickly simulate million-size populations in a few seconds on commodity hardware, integrate with deep neural networks, and ingest heterogeneous data sources.
arXiv Detail & Related papers (2022-07-20T07:32:02Z) - Dream to Explore: Adaptive Simulations for Autonomous Systems [3.0664963196464448]
We tackle the problem of learning to control dynamical systems by applying Bayesian nonparametric methods.
By employing Gaussian processes to discover latent world dynamics, we mitigate common data efficiency issues observed in reinforcement learning.
Our algorithm jointly learns a world model and policy by optimizing a variational lower bound of a log-likelihood.
arXiv Detail & Related papers (2021-10-27T04:27:28Z) - Action-Sufficient State Representation Learning for Control with Structural Constraints [21.47086290736692]
In this paper, we focus on partially observable environments and propose to learn a minimal set of state representations that capture sufficient information for decision-making.
We build a generative environment model for the structural relationships among variables in the system and present a principled way to characterize action-sufficient state representations (ASRs).
Our empirical results on CarRacing and VizDoom demonstrate a clear advantage of learning and using ASRs for policy learning.
arXiv Detail & Related papers (2021-10-12T03:16:26Z) - Learning to Continuously Optimize Wireless Resource in a Dynamic Environment: A Bilevel Optimization Perspective [52.497514255040514]
This work develops a new approach that enables data-driven methods to continuously learn and optimize resource allocation strategies in a dynamic environment.
We propose to build the notion of continual learning into wireless system design, so that the learning model can incrementally adapt to the new episodes.
Our design is based on a novel bilevel optimization formulation which ensures certain "fairness" across different data samples.
arXiv Detail & Related papers (2021-05-03T07:23:39Z) - DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
arXiv Detail & Related papers (2021-03-31T23:46:32Z) - Learning to Continuously Optimize Wireless Resource In Episodically Dynamic Environment [55.91291559442884]
This work develops a methodology that enables data-driven methods to continuously learn and optimize in a dynamic environment.
We propose to build the notion of continual learning into the modeling process of learning wireless systems.
Our design is based on a novel min-max formulation which ensures certain "fairness" across different data samples.
arXiv Detail & Related papers (2020-11-16T08:24:34Z) - Variational Dynamic for Self-Supervised Exploration in Deep Reinforcement Learning [12.76337275628074]
In this work, we propose a variational dynamic model based on conditional variational inference to model the multimodality and stochasticity of environment transitions.
We derive an upper bound of the negative log-likelihood of the environmental transition and use such an upper bound as the intrinsic reward for exploration.
Our method outperforms several state-of-the-art environment model-based exploration approaches.
arXiv Detail & Related papers (2020-10-17T09:54:51Z)
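The last entry above states its mechanism concretely: an upper bound on the negative log-likelihood of a transition serves as the intrinsic reward. A minimal sketch of that idea follows, reusing the hypothetical `TransitionVAE` interface from the sketch above and taking the per-sample negative ELBO as the bound; this is an illustrative reconstruction, not that paper's code.

```python
import torch

@torch.no_grad()
def intrinsic_reward(model, s, a, s_next):
    """Per-transition negative ELBO, an upper bound on -log p(s' | s, a).
    Poorly modeled (novel) transitions receive a larger exploration bonus.
    `model` is assumed to expose encoder/decoder like TransitionVAE above."""
    mu, log_var = model.encoder(torch.cat([s, a, s_next], -1)).chunk(2, -1)
    z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()   # reparameterized sample
    s_hat = model.decoder(torch.cat([z, s, a], -1))
    recon = ((s_hat - s_next) ** 2).sum(-1)                 # per-sample reconstruction error
    kl = -0.5 * (1 + log_var - mu ** 2 - log_var.exp()).sum(-1)
    return recon + kl                                       # bonus added to the extrinsic reward
```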