Related papers: Conjugated Discrete Distributions for Distributional Reinforcement Learning

URL: http://arxiv.org/abs/2112.07424v1
Date: Tue, 14 Dec 2021 14:14:49 GMT
Title: Conjugated Discrete Distributions for Distributional Reinforcement Learning
Authors: Björn Lindenberg, Jonas Nordqvist, Karl-Olof Lindahl
Abstract summary: We show that one of the most successful methods may not yield an optimal policy if we have a non-deterministic process. We argue that distributional reinforcement learning lends itself to remedy this situation completely.
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: In this work we continue to build upon recent advances in reinforcement learning for finite Markov processes. A common approach among existing algorithms, both single-actor and distributed, is to either clip rewards or to apply a transformation method on Q-functions to handle a large variety of magnitudes in real discounted returns. We theoretically show that one of the most successful methods may not yield an optimal policy if we have a non-deterministic process. As a solution, we argue that distributional reinforcement learning lends itself to remedy this situation completely. By the introduction of a conjugated distributional operator we may handle a large class of transformations for real returns with guaranteed theoretical convergence. We propose an approximating single-actor algorithm based on this operator that trains agents directly on unaltered rewards using a proper distributional metric given by the Cramér distance. To evaluate its performance in a stochastic setting we train agents on a suite of 55 Atari 2600 games using sticky-actions and obtain state-of-the-art performance compared to other well-known algorithms in the Dopamine framework.
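Why can a transformation on Q-functions fail in a non-deterministic process? The abstract does not name the method, so the sketch below assumes it is a transformed Bellman operator that applies a squashing function h inside the expectation, using the h of Pohlen et al. (2018) with an assumed epsilon of 0.01. For a one-step episode the fixed point of such an operator is E[h(G)] rather than h(E[G]), and because h is concave for positive returns (Jensen's inequality), a risky action with the higher mean return can be ranked below a safe one. The two actions and their return distributions are hypothetical, chosen only to make the gap visible; this is an illustration, not the paper's proof.

```python
import math

EPS = 1e-2  # assumed epsilon for h; not taken from the paper


def h(x: float) -> float:
    """Squashing transform h(x) = sign(x) * (sqrt(|x| + 1) - 1) + EPS * x."""
    return math.copysign(math.sqrt(abs(x) + 1.0) - 1.0, x) + EPS * x


# Hypothetical one-step episode: each action ends the episode with a random
# return, given as a list of (probability, return) pairs.
actions = {
    "safe": [(1.0, 1.0)],               # always returns 1.0
    "risky": [(0.5, 0.0), (0.5, 2.2)],  # mean return 1.1
}


def expected_return(dist):
    # The quantity an optimal policy should maximise: E[G].
    return sum(p * g for p, g in dist)


def transformed_q(dist):
    # Fixed point of the transformed operator at a terminal step:
    # the expectation sits outside h, giving E[h(G)] instead of h(E[G]).
    return sum(p * h(g) for p, g in dist)


best_true = max(actions, key=lambda a: expected_return(actions[a]))
best_transformed = max(actions, key=lambda a: transformed_q(actions[a]))
print(best_true, best_transformed)  # prints: risky safe
```

In a deterministic process every distribution is a point mass, E[h(G)] = h(E[G]), and since h is strictly increasing the greedy action is unchanged; the ranking can only flip once returns are random.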

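The abstract's remedy keeps a whole return distribution and conjugates the distributional Bellman update by the transformation. One reading of that idea, stated here as an assumption rather than the paper's exact operator, is to store atoms z_j in transformed space and map each one to h(r + γ h⁻¹(z_j)); the mean is decoded with h⁻¹ before averaging, so the Jensen gap from a plain transformed Q-function does not arise. The sketch also computes the Cramér (ℓ2) distance between two discrete distributions as the square root of the integrated squared CDF difference. It reuses the Pohlen et al. (2018) h as an assumed transformation, measures the distance in transformed space, and skips any projection onto a fixed support; the function names and the example numbers are hypothetical.

```python
import numpy as np

EPS = 1e-2  # assumed epsilon for h


def h(x):
    # Squashing transform h(x) = sign(x) * (sqrt(|x| + 1) - 1) + EPS * x
    return np.sign(x) * (np.sqrt(np.abs(x) + 1.0) - 1.0) + EPS * x


def h_inv(y):
    # Closed-form inverse of h
    root = (np.sqrt(1.0 + 4.0 * EPS * (np.abs(y) + 1.0 + EPS)) - 1.0) / (2.0 * EPS)
    return np.sign(y) * (root ** 2 - 1.0)


def cramer_distance(x_atoms, x_probs, y_atoms, y_probs):
    """l2 (Cramér) distance between two discrete distributions on the real line."""
    support = np.union1d(x_atoms, y_atoms)  # sorted, unique merged support

    def cdf(atoms, probs):
        # F(s) = P(X <= s) at every point of the merged support
        return np.array([probs[atoms <= s].sum() for s in support])

    diff = cdf(x_atoms, x_probs) - cdf(y_atoms, y_probs)
    # CDFs are step functions: constant on [support[i], support[i + 1])
    return float(np.sqrt(np.sum(diff[:-1] ** 2 * np.diff(support))))


# Hypothetical next-state return distribution, stored in h-space.
next_atoms = h(np.array([0.0, 2.2]))
next_probs = np.array([0.5, 0.5])
reward, gamma = 1.0, 0.99

# Conjugated target: decode with h_inv, apply the Bellman map, re-encode with h.
target_atoms = h(reward + gamma * h_inv(next_atoms))
# The mean is decoded before averaging, so it estimates E[G] (here about 1.1).
decoded_mean = float(np.sum(next_probs * h_inv(next_atoms)))

# Distance between a hypothetical current prediction and the target.
current_atoms = h(np.array([0.5, 2.0]))
current_probs = np.array([0.5, 0.5])
loss = cramer_distance(current_atoms, current_probs, target_atoms, next_probs)
```

Because h is strictly increasing, it maps the atom ordering one-to-one, which is what lets the distribution (and hence the mean of the untransformed return) be recovered exactly through h⁻¹.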
Related papers

MAP Estimation with Denoisers: Convergence Rates and Guarantees
We show that a simple algorithm converges to the proximal operator under a log-concavity assumption on the prior $p$. We also show that this algorithm can be interpreted as gradient descent on smoothed proximal objectives.
arXiv (2025-07-21T08:59:33Z)
A Differential Perspective on Distributional Reinforcement Learning
We extend distributional reinforcement learning to the average-reward setting, where an agent aims to optimize the reward received per time-step. In particular, we utilize a quantile-based approach to develop the first set of algorithms that can successfully learn and/or optimize the long-run per-step reward distribution.
arXiv (2025-06-03T19:26:25Z)
Train with Perturbation, Infer after Merging: A Two-Stage Framework for Continual Learning
We propose Perturb-and-Merge (P&M), a novel continual learning (CL) framework that integrates model merging into the CL paradigm to avoid forgetting. Through theoretical analysis, we minimize the total loss increase across all tasks and derive an analytical solution for the optimal merging coefficient. Our proposed approach achieves state-of-the-art performance on several continual learning benchmark datasets.
arXiv (2025-05-28T14:14:19Z)
Non-Adversarial Inverse Reinforcement Learning via Successor Feature Matching
In inverse reinforcement learning (IRL), an agent seeks to replicate expert demonstrations through interactions with the environment. Traditionally, IRL is treated as an adversarial game, where an adversary searches over reward models, and a learner optimizes the reward through repeated RL procedures. We propose a novel approach to IRL by direct policy optimization, exploiting a linear factorization of the return as the inner product of successor features and a reward vector.
arXiv (2024-11-11T14:05:50Z)
From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process. We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv (2024-11-06T10:35:11Z)
Distributed Markov Chain Monte Carlo Sampling based on the Alternating Direction Method of Multipliers
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers. We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art. In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
arXiv (2024-01-29T02:08:40Z)
Distributional Reinforcement Learning with Dual Expectile-Quantile Regression
The quantile regression approach to distributional RL provides a flexible and effective way of learning arbitrary return distributions. We show that distributional estimation guarantees vanish, and we empirically observe that the estimated distribution rapidly collapses to its mean. Motivated by the efficiency of $L_2$-based learning, we propose to jointly learn expectiles and quantiles of the return distribution in a way that allows efficient learning.
arXiv (2023-05-26T12:30:05Z)
Stochastic Unrolled Federated Learning
We introduce Stochastic UnRolled Federated learning (SURF), a method that expands algorithm unrolling to federated learning. Our proposed method tackles two challenges of this expansion, namely the need to feed whole datasets to the unrolled optimizers and the decentralized nature of federated learning.
arXiv (2023-05-24T17:26:22Z)
Federated Learning for Heterogeneous Bandits with Unobserved Contexts
We study the problem of federated multi-armed contextual bandits with unknown contexts. We propose an elimination-based algorithm and prove a regret bound for linearly parametrized reward functions.
arXiv (2023-03-29T22:06:24Z)
Cooperative Distribution Alignment via JSD Upper Bound
Unsupervised distribution alignment estimates a transformation that maps two or more source distributions to a shared aligned distribution. This task has many applications including generative modeling, unsupervised domain adaptation, and socially aware learning. We propose to unify and generalize previous flow-based approaches under a single non-adversarial framework.
arXiv (2022-07-05T20:09:03Z)
Dense Unsupervised Learning for Video Segmentation
We present a novel approach to unsupervised learning for video object segmentation (VOS). Unlike previous work, our formulation allows learning dense feature representations directly in a fully convolutional regime. Our approach exceeds the segmentation accuracy of previous work despite using significantly less training data and compute power.
arXiv (2021-11-11T15:15:11Z)
Distributional Reinforcement Learning via Moment Matching
We formulate a method that learns a finite set of statistics from each return distribution via neural networks. Our method can be interpreted as implicitly matching all orders of moments between a return distribution and its Bellman target. Experiments on the suite of Atari games show that our method outperforms the standard distributional RL baselines.
arXiv (2020-07-24T05:18:17Z)
Beyond the Mean-Field: Structured Deep Gaussian Processes Improve the Predictive Uncertainties
We propose a novel variational family that allows for retaining covariances between latent processes while achieving fast convergence. We provide an efficient implementation of our new approach and apply it to several benchmark datasets. It yields excellent results and strikes a better balance between accuracy and calibrated uncertainty estimates than its state-of-the-art alternatives.
arXiv (2020-05-22T11:10:59Z)
A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms
We present a distributional approach to theoretical analyses of reinforcement learning algorithms for constant step-sizes. We show that value-based methods such as TD($\lambda$) and $Q$-Learning have update rules which are contractive in the space of distributions of functions.
arXiv (2020-03-27T05:13:29Z)
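Two of the distributional RL entries above rest on specific losses: quantile and expectile regression (dual expectile-quantile regression) and matching a predicted return distribution to its Bellman target (moment matching). The sketch below shows only the textbook forms: the pinball loss, whose minimiser is the τ-quantile; the asymmetric squared loss, whose minimiser is the τ-expectile (the mean at τ = 0.5); and a biased estimate of the squared maximum mean discrepancy (MMD) with a single Gaussian kernel. The kernel, bandwidth, sample values and helper names are assumptions for illustration; neither paper's exact training objective is reproduced here.

```python
import numpy as np


def quantile_loss(u, tau):
    # Pinball loss: tau * u for u >= 0, (tau - 1) * u for u < 0
    return u * (tau - (u < 0).astype(float))


def expectile_loss(u, tau):
    # Asymmetric squared loss: |tau - 1{u < 0}| * u**2
    return np.abs(tau - (u < 0).astype(float)) * u ** 2


def fit_statistic(samples, tau, loss, grid):
    # Brute-force minimiser of the empirical risk over candidate values
    risks = [loss(samples - theta, tau).mean() for theta in grid]
    return float(grid[int(np.argmin(risks))])


def gaussian_kernel(a, b, bandwidth=1.0):
    # Pairwise Gaussian kernel matrix between two 1-D particle sets
    d = a[:, None] - b[None, :]
    return np.exp(-(d ** 2) / (2.0 * bandwidth ** 2))


def mmd_squared(x, y, bandwidth=1.0):
    # Biased (V-statistic) estimate of the squared MMD between particle sets
    return float(
        gaussian_kernel(x, x, bandwidth).mean()
        + gaussian_kernel(y, y, bandwidth).mean()
        - 2.0 * gaussian_kernel(x, y, bandwidth).mean()
    )


samples = np.array([0.0, 0.0, 0.0, 10.0])
grid = np.linspace(-1.0, 11.0, 1201)
median = fit_statistic(samples, 0.5, quantile_loss, grid)  # the median, 0.0
mean = fit_statistic(samples, 0.5, expectile_loss, grid)   # the mean, 2.5

# Hypothetical predicted particles and Bellman target r + gamma * next particles
predicted = np.array([0.0, 1.0, 2.0])
target = 1.0 + 0.9 * np.array([0.5, 1.5, 2.5])
mmd_loss = mmd_squared(predicted, target)
```

The skewed sample makes the contrast concrete: the 0.5-quantile ignores the outlier at 10 while the 0.5-expectile is pulled to the mean, which is the kind of difference between the two statistics that the dual approach relies on.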
