Data Augmentation through Expert-guided Symmetry Detection to Improve Performance in Offline Reinforcement Learning
- URL: http://arxiv.org/abs/2112.09943v3
- Date: Wed, 12 Apr 2023 14:38:58 GMT
- Title: Data Augmentation through Expert-guided Symmetry Detection to Improve Performance in Offline Reinforcement Learning
- Authors: Giorgio Angelotti, Nicolas Drougard, Caroline P. C. Chanel
- Abstract summary: Offline estimation of the dynamical model of a Markov Decision Process (MDP) is a non-trivial task.
Recent works showed that an expert-guided pipeline relying on Density Estimation methods effectively detects this structure in deterministic environments.
We show that the former results lead to a performance improvement when solving the learned MDP and then applying the optimized policy in the real environment.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Offline estimation of the dynamical model of a Markov Decision Process (MDP)
is a non-trivial task that greatly depends on the data available in the
learning phase. Sometimes the dynamics of the model is invariant with respect
to some transformations of the current state and action. Recent works showed
that an expert-guided pipeline relying on Density Estimation methods such as Deep
Neural Network based Normalizing Flows effectively detects this structure in
deterministic environments, both categorical and continuous-valued. The
acquired knowledge can be exploited to augment the original data set, leading
eventually to a reduction in the distributional shift between the true and the
learned model. Such a data augmentation technique can be applied as a
preliminary step before adopting an Offline Reinforcement Learning
architecture, increasing its performance. In this work, we extend the
paradigm to also tackle non-deterministic MDPs, in particular, 1) we propose a
detection threshold in categorical environments based on statistical distances,
and 2) we show that the former results lead to a performance improvement when
solving the learned MDP and then applying the optimized policy in the real
environment.
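A minimal sketch of the idea in the categorical case, assuming a toy chain MDP, a mirror-symmetry candidate transform and an illustrative total-variation threshold (none of which are taken from the paper): a candidate transformation of the state-action space is accepted when a statistical distance between the empirical transition model and the model fitted on the transformed data falls below the threshold, and an accepted symmetry is then used to augment the offline dataset before model learning.

```python
# Minimal sketch (not the authors' code) of symmetry detection and data
# augmentation for a categorical MDP. The toy chain environment, the candidate
# transform `flip` and the threshold value are illustrative assumptions.
import numpy as np


def empirical_model(transitions, n_states, n_actions):
    """Empirical categorical transition model: P_hat[s, a] over next states."""
    counts = np.zeros((n_states, n_actions, n_states))
    for s, a, s2 in transitions:
        counts[s, a, s2] += 1
    totals = counts.sum(axis=2, keepdims=True)
    with np.errstate(invalid="ignore", divide="ignore"):
        probs = np.where(totals > 0, counts / totals, 0.0)
    return probs, totals[..., 0]


def total_variation(p, q):
    return 0.5 * np.abs(p - q).sum()


def symmetry_distance(transitions, transform, n_states, n_actions):
    """Average TV distance between the empirical models fitted on the original
    and on the transformed dataset, over (s, a) pairs observed in both."""
    probs, counts = empirical_model(transitions, n_states, n_actions)
    transformed = [transform(s, a, s2) for s, a, s2 in transitions]
    probs_t, counts_t = empirical_model(transformed, n_states, n_actions)
    dists = [total_variation(probs[s, a], probs_t[s, a])
             for s in range(n_states) for a in range(n_actions)
             if counts[s, a] > 0 and counts_t[s, a] > 0]
    return float(np.mean(dists)) if dists else np.inf


# Toy 5-state chain with actions {0: left, 1: right} and noisy moves.
N_STATES, N_ACTIONS = 5, 2
rng = np.random.default_rng(0)


def step(s, a):
    direction = 1 if a == 1 else -1
    if rng.random() < 0.8:
        return int(np.clip(s + direction, 0, N_STATES - 1))
    return s


def flip(s, a, s2):
    # Candidate symmetry: mirror the states and swap the two actions.
    return N_STATES - 1 - s, 1 - a, N_STATES - 1 - s2


dataset = [(s, a, step(s, a))
           for s, a in zip(rng.integers(N_STATES, size=2000),
                           rng.integers(N_ACTIONS, size=2000))]

THRESHOLD = 0.1  # illustrative detection threshold on the average TV distance
dist = symmetry_distance(dataset, flip, N_STATES, N_ACTIONS)
print(f"average TV distance under the candidate symmetry: {dist:.3f}")

if dist < THRESHOLD:
    # Symmetry accepted: augment the offline dataset with transformed
    # transitions before fitting the model used by the offline RL algorithm.
    augmented = dataset + [flip(s, a, s2) for s, a, s2 in dataset]
    print(f"symmetry accepted, dataset size {len(dataset)} -> {len(augmented)}")
else:
    print("symmetry rejected, keeping the original dataset")
```

Since the mirror transform is an exact symmetry of this toy chain, the measured distance should stay well below the threshold and the dataset is doubled; a transform that is not a symmetry would yield a large distance and be rejected. The paper's actual pipeline uses expert-proposed transformations and Density Estimation (e.g., Normalizing Flows) rather than this direct tabular comparison.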
Related papers
- Adaptive Anomaly Detection in Network Flows with Low-Rank Tensor Decompositions and Deep Unrolling [9.20186865054847]
Anomaly detection (AD) is increasingly recognized as a key component for ensuring the resilience of future communication systems.
This work considers AD in network flows using incomplete measurements.
We propose a novel block-successive convex approximation algorithm based on a regularized model-fitting objective.
Inspired by Bayesian approaches, we extend the model architecture to perform online adaptation to per-flow and per-time-step statistics.
arXiv Detail & Related papers (2024-09-17T19:59:57Z) - Effective Restoration of Source Knowledge in Continual Test Time Adaptation [44.17577480511772]
This paper introduces an unsupervised domain change detection method that is capable of identifying domain shifts in dynamic environments.
By restoring the knowledge from the source, it effectively corrects the negative consequences arising from the gradual deterioration of model parameters.
We perform extensive experiments on benchmark datasets to demonstrate the superior performance of our method compared to state-of-the-art adaptation methods.
arXiv Detail & Related papers (2023-11-08T19:21:48Z) - Statistically Efficient Variance Reduction with Double Policy Estimation for Off-Policy Evaluation in Sequence-Modeled Reinforcement Learning [53.97273491846883]
We propose DPE: an RL algorithm that blends offline sequence modeling and offline reinforcement learning with Double Policy Estimation.
We validate our method on multiple OpenAI Gym tasks from the D4RL benchmarks.
arXiv Detail & Related papers (2023-08-28T20:46:07Z) - End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes [52.818579746354665]
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures.
We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z) - Variational Latent Branching Model for Off-Policy Evaluation [23.073461349048834]
We propose a variational latent branching model (VLBM) to learn the transition function of Markov decision processes (MDPs).
We introduce the branching architecture to improve the model's robustness against randomly initialized model weights.
We show that the VLBM outperforms existing state-of-the-art OPE methods in general.
arXiv Detail & Related papers (2023-01-28T02:20:03Z) - Latent Variable Representation for Reinforcement Learning [131.03944557979725]
It remains unclear theoretically and empirically how latent variable models may facilitate learning, planning, and exploration to improve the sample efficiency of model-based reinforcement learning.
We provide a representation view of the latent variable models for state-action value functions, which allows both a tractable variational learning algorithm and an effective implementation of the optimism/pessimism principle.
In particular, we propose a computationally efficient planning algorithm with UCB exploration by incorporating kernel embeddings of latent variable models.
arXiv Detail & Related papers (2022-12-17T00:26:31Z) - MARS: Meta-Learning as Score Matching in the Function Space [79.73213540203389]
We present a novel approach to extracting inductive biases from a set of related datasets.
We use functional Bayesian neural network inference, which views the prior as a process and performs inference in the function space.
Our approach can seamlessly acquire and represent complex prior knowledge by meta-learning the score function of the data-generating process.
arXiv Detail & Related papers (2022-10-24T15:14:26Z) - Invariance Learning in Deep Neural Networks with Differentiable Laplace Approximations [76.82124752950148]
We develop a convenient gradient-based method for selecting the data augmentation.
We use a differentiable Kronecker-factored Laplace approximation to the marginal likelihood as our objective.
arXiv Detail & Related papers (2022-02-22T02:51:11Z) - Expert-Guided Symmetry Detection in Markov Decision Processes [0.0]
We propose a paradigm that aims to detect the presence of some transformations of the state-action space for which the MDP dynamics is invariant.
The results show that the model distributional shift is reduced when the dataset is augmented with the data obtained by using the detected symmetries.
arXiv Detail & Related papers (2021-11-19T16:12:30Z) - Learning Neural Models for Natural Language Processing in the Face of Distributional Shift [10.990447273771592]
The dominating NLP paradigm of training a strong neural predictor to perform one task on a specific dataset has led to state-of-the-art performance in a variety of applications.
It builds upon the assumption that the data distribution is stationary, i.e., that the data is sampled from a fixed distribution both at training and test time.
This way of training is inconsistent with how we as humans are able to learn from and operate within a constantly changing stream of information.
It is ill-adapted to real-world use cases where the data distribution is expected to shift over the course of a model's lifetime.
arXiv Detail & Related papers (2021-09-03T14:29:20Z) - Learning to Continuously Optimize Wireless Resource in a Dynamic Environment: A Bilevel Optimization Perspective [52.497514255040514]
This work develops a new approach that enables data-driven methods to continuously learn and optimize resource allocation strategies in a dynamic environment.
We propose to build the notion of continual learning into wireless system design, so that the learning model can incrementally adapt to the new episodes.
Our design is based on a novel bilevel optimization formulation which ensures certain "fairness" across different data samples.
arXiv Detail & Related papers (2021-05-03T07:23:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.