A Differential Perspective on Distributional Reinforcement Learning
- URL: http://arxiv.org/abs/2506.03333v1
- Date: Tue, 03 Jun 2025 19:26:25 GMT
- Title: A Differential Perspective on Distributional Reinforcement Learning
- Authors: Juan Sebastian Rojas, Chi-Guhn Lee
- Abstract summary: We extend distributional reinforcement learning to the average-reward setting, where an agent aims to optimize the reward received per time-step. In particular, we utilize a quantile-based approach to develop the first set of algorithms that can successfully learn and/or optimize the long-run per-step reward distribution.
- Score: 7.028778922533688
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To date, distributional reinforcement learning (distributional RL) methods have exclusively focused on the discounted setting, where an agent aims to optimize a potentially-discounted sum of rewards over time. In this work, we extend distributional RL to the average-reward setting, where an agent aims to optimize the reward received per time-step. In particular, we utilize a quantile-based approach to develop the first set of algorithms that can successfully learn and/or optimize the long-run per-step reward distribution, as well as the differential return distribution of an average-reward MDP. We derive proven-convergent tabular algorithms for both prediction and control, as well as a broader family of algorithms that have appealing scaling properties. Empirically, we find that these algorithms consistently yield competitive performance when compared to their non-distributional equivalents, while also capturing rich information about the long-run reward and return distributions.
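To make the prediction-side idea concrete, here is a minimal tabular sketch of a quantile-based differential TD update. It is an assumption-laden illustration rather than the authors' exact algorithm: N quantile atoms per state approximate the differential return distribution, a scalar r_bar tracks the long-run average reward, and the step sizes alpha, eta and quantile levels tau are assumed hyperparameters.

```python
import numpy as np

# Hedged sketch (not the paper's exact algorithm): tabular quantile-based
# differential TD prediction. Each state keeps N quantile estimates of the
# differential return distribution; r_bar estimates the average reward.
N = 32
num_states = 10
z = np.zeros((num_states, N))            # quantile atoms per state
r_bar = 0.0                              # average-reward estimate
tau = (2 * np.arange(N) + 1) / (2 * N)   # quantile midpoint levels
alpha, eta = 0.05, 0.01                  # assumed step sizes

def differential_quantile_td(s, r, s_next):
    """One update after observing transition (s, r, s_next); no discounting."""
    global r_bar
    # Differential targets: reward minus average reward plus next-state atoms.
    targets = r - r_bar + z[s_next]
    # Quantile-regression step: move each atom toward the sampled targets.
    for i in range(N):
        indicator = (targets < z[s, i]).astype(float)
        z[s, i] += alpha * np.mean(tau[i] - indicator)
    # Update the average-reward estimate from the mean TD error.
    r_bar += eta * (r + z[s_next].mean() - z[s].mean() - r_bar)
```

The mean of the atoms plays the role of the differential value function, so setting N = 1 recovers a standard non-distributional differential TD update.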
Related papers
- Optimizing Return Distributions with Distributional Dynamic Programming [38.11199286025947]
We introduce distributional dynamic programming (DP) methods for optimizing statistical functionals of the return distribution. To go beyond expected utilities, we combine distributional DP with stock augmentation, a technique previously introduced for classic DP in the context of risk-sensitive RL. We describe a number of applications outlining how to use distributional DP to solve different stock-augmented return distribution optimization problems.
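For context, the basic distributional-DP backup such methods build on can be sketched with a fixed categorical support (the C51-style projection). Stock augmentation and statistical-functional objectives are beyond this illustration; the support range and discount factor below are assumed.

```python
import numpy as np

# Generic categorical distributional-DP backup (C51-style projection), shown
# only as background for the methods described above; stock augmentation is
# not modeled here. Support range and discount factor are assumed.
z_support = np.linspace(-10.0, 10.0, 51)   # fixed return support
gamma = 0.99

def categorical_backup(p_next, r):
    """Shift/scale the next-state distribution by (r, gamma) and project it
    back onto z_support. p_next: probabilities over z_support; r: reward."""
    p_new = np.zeros_like(p_next)
    step = z_support[1] - z_support[0]
    for z_j, prob in zip(z_support, p_next):
        tz = np.clip(r + gamma * z_j, z_support[0], z_support[-1])
        b = (tz - z_support[0]) / step
        lo, hi = int(np.floor(b)), int(np.ceil(b))
        if hi == lo:
            p_new[lo] += prob
        else:
            p_new[lo] += prob * (hi - b)   # split mass between neighbors
            p_new[hi] += prob * (b - lo)
    return p_new
```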
arXiv Detail & Related papers (2025-01-22T17:20:43Z)
- Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer [52.09480867526656]
We identify the source of misalignment as a form of distributional shift and uncertainty in learning human preferences. To mitigate overoptimization, we first propose a theoretical algorithm that chooses the best policy for an adversarially chosen reward model. Using the equivalence between reward models and the corresponding optimal policy, the algorithm features a simple objective that combines a preference optimization loss and a supervised learning loss.
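The combined objective this summary describes can be illustrated with a DPO-style preference loss plus a supervised fine-tuning (SFT) negative log-likelihood acting as the regularizer. This is a generic sketch under assumed inputs (per-example log-probabilities) and assumed hyperparameters beta and lam, not the paper's exact formulation.

```python
import numpy as np

# Hedged sketch: preference-optimization loss plus an SFT loss as a
# regularizer. Inputs are per-example log-probabilities under the current
# policy and a frozen reference policy; beta and lam are assumed.
def combined_loss(logp_chosen, logp_rejected,
                  logp_ref_chosen, logp_ref_rejected,
                  logp_sft, beta=0.1, lam=1.0):
    # Implicit-reward margin between the chosen and rejected responses.
    margin = beta * ((logp_chosen - logp_ref_chosen)
                     - (logp_rejected - logp_ref_rejected))
    pref_loss = np.logaddexp(0.0, -margin)   # -log sigmoid(margin), stable
    sft_loss = -logp_sft                     # supervised NLL on chosen data
    return pref_loss + lam * sft_loss
```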
arXiv Detail & Related papers (2024-05-26T05:38:50Z)
- Distributed Markov Chain Monte Carlo Sampling based on the Alternating Direction Method of Multipliers [143.6249073384419]
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers.
We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art.
In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
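As background for the alternating-direction structure, here is a generic consensus-ADMM skeleton for distributed least squares. The paper adapts this optimization template to sampling, which this sketch does not attempt; the penalty rho and iteration count are assumed.

```python
import numpy as np

# Generic consensus ADMM for distributed least squares (background only; the
# paper builds a *sampling* scheme on this template). rho is assumed.
def consensus_admm(A_list, b_list, rho=1.0, iters=100):
    """Worker i holds (A_i, b_i); all workers must agree on a common z."""
    n, d = len(A_list), A_list[0].shape[1]
    x = np.zeros((n, d))   # local primal variables
    u = np.zeros((n, d))   # scaled dual variables
    z = np.zeros(d)        # consensus variable
    for _ in range(iters):
        for i in range(n):
            # Local step: argmin ||A_i x - b_i||^2 + (rho/2)||x - z + u_i||^2
            lhs = 2.0 * A_list[i].T @ A_list[i] + rho * np.eye(d)
            rhs = 2.0 * A_list[i].T @ b_list[i] + rho * (z - u[i])
            x[i] = np.linalg.solve(lhs, rhs)
        z = (x + u).mean(axis=0)   # averaging (consensus) step
        u += x - z                 # dual ascent step
    return z
```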
arXiv Detail & Related papers (2024-01-29T02:08:40Z)
- Reinforcement learning with non-ergodic reward increments: robustness via ergodicity transformations [8.44491527275706]
Application areas for reinforcement learning include autonomous driving, precision agriculture, and finance. However, the focus of RL is typically on the expected value of the return. We develop an algorithm that lets RL agents optimize the long-term performance of individual trajectories.
arXiv Detail & Related papers (2023-10-17T15:13:33Z)
- Stochastic Unrolled Federated Learning [85.6993263983062]
We introduce Stochastic UnRolled Federated learning (SURF), a method that expands algorithm unrolling to federated learning.
Our proposed method tackles two challenges of this expansion, namely the need to feed whole datasets to the unrolled optimizers and the decentralized nature of federated learning.
arXiv Detail & Related papers (2023-05-24T17:26:22Z)
- Truncating Trajectories in Monte Carlo Reinforcement Learning [48.97155920826079]
In Reinforcement Learning (RL), an agent acts in an unknown environment to maximize the expected cumulative discounted sum of an external reward signal.
We propose an a-priori budget allocation strategy that leads to the collection of trajectories of different lengths.
We show that an appropriate truncation of the trajectories can succeed in improving performance.
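One way to picture an a-priori budget allocation over trajectory lengths: with a fixed interaction budget, draw many short trajectories and fewer long ones, for instance with a geometric truncation schedule tied to the discount factor. The schedule below is an illustrative assumption, not the paper's exact allocation rule.

```python
import numpy as np

# Illustrative (assumed) schedule: spend a fixed interaction budget on
# trajectories of varying length, truncating with tail probability gamma**t
# so that short trajectories dominate. Not the paper's exact allocation rule.
def truncated_lengths(budget, gamma, horizon, rng=np.random.default_rng(0)):
    lengths = []
    while budget >= 1:
        t = 1 + rng.geometric(1.0 - gamma)   # random truncation point
        t = min(t, horizon, budget)          # respect horizon and budget
        lengths.append(t)
        budget -= t
    return lengths

print(truncated_lengths(budget=200, gamma=0.9, horizon=50))
```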
arXiv Detail & Related papers (2023-05-07T19:41:57Z)
- How Does Return Distribution in Distributional Reinforcement Learning Help Optimization? [10.149055921090572]
We investigate the optimization advantages of distributional RL within the Neural Fitted Z-Iteration (Neural FZI) framework.
We show that distributional RL has desirable smoothness characteristics and hence enjoys stable gradients.
Our research findings illuminate how the return distribution in distributional RL algorithms helps the optimization.
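The smoothness and stable-gradient claim refers to losses of the kind used in distributional value fitting; a common example is the quantile Huber loss, sketched below in NumPy. This is a generic illustration, not the paper's Neural FZI formalism; kappa is the assumed Huber threshold.

```python
import numpy as np

# Generic quantile Huber loss from quantile-based distributional RL; its
# smoothness near zero error is the kind of property behind the stable-
# gradient argument. theta: (N,) atoms; targets: (M,) samples; tau: (N,).
def quantile_huber_loss(theta, targets, tau, kappa=1.0):
    u = targets[None, :] - theta[:, None]            # pairwise TD errors
    huber = np.where(np.abs(u) <= kappa,
                     0.5 * u ** 2,
                     kappa * (np.abs(u) - 0.5 * kappa))
    weight = np.abs(tau[:, None] - (u < 0.0).astype(float))
    return (weight * huber / kappa).mean()
```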
arXiv Detail & Related papers (2022-09-29T02:18:31Z)
- Conjugated Discrete Distributions for Distributional Reinforcement Learning [0.0]
We show that one of the most successful methods may not yield an optimal policy when the underlying process is non-deterministic.
We argue that distributional reinforcement learning lends itself to remedy this situation completely.
arXiv Detail & Related papers (2021-12-14T14:14:49Z)
- Distributional Reinforcement Learning for Multi-Dimensional Reward Functions [91.88969237680669]
We introduce Multi-Dimensional Distributional DQN (MD3QN) to model the joint return distribution from multiple reward sources.
As a by-product of joint distribution modeling, MD3QN can capture the randomness in returns for each source of reward.
In experiments, our method accurately models the joint return distribution in environments with richly correlated reward functions.
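A hedged sketch of the joint backup such a method performs: return particles live in R^K (one dimension per reward source) and the Bellman target shifts every particle by the full reward vector; predicted and target particle sets can then be matched with an MMD loss, shown here with a Gaussian kernel. The discount and bandwidth are assumed values.

```python
import numpy as np

gamma = 0.99  # assumed discount

def joint_bellman_target(particles_next, reward_vec):
    """particles_next: (M, K) joint-return samples; reward_vec: (K,)."""
    return reward_vec[None, :] + gamma * particles_next

def mmd2(x, y, bandwidth=1.0):
    """Squared MMD between particle sets x, y of shape (M, K), using a
    Gaussian kernel with an assumed bandwidth."""
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * bandwidth ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()
```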
arXiv Detail & Related papers (2021-10-26T11:24:23Z)
- The Benefits of Being Categorical Distributional: Uncertainty-aware Regularized Exploration in Reinforcement Learning [17.64056793687686]
We find that the potential superiority of distributional RL can be attributed to a derived distribution-matching entropy regularization. Our study offers a new exploration-based perspective to explain the intrinsic benefits of adopting distributional learning in RL.
arXiv Detail & Related papers (2021-10-07T03:14:46Z)
- Bayesian Distributional Policy Gradients [2.28438857884398]
Distributional Reinforcement Learning maintains the entire probability distribution of the reward-to-go, i.e. the return.
Bayesian Distributional Policy Gradients (BDPG) uses adversarial training in joint-contrastive learning to estimate a variational posterior from the returns.
arXiv Detail & Related papers (2021-03-20T23:42:50Z)
- A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms [67.67377846416106]
We present a distributional approach to theoretical analyses of reinforcement learning algorithms for constant step-sizes.
We show that value-based methods such as TD($\lambda$) and $Q$-Learning have update rules which are contractive in the space of distributions of functions.
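To see what "update rules as operators on distributions" means, consider constant step-size tabular TD(0) written as a random operator on the value function; the analysis then tracks the law of $V_t$ under repeated application of this operator. The notation below is generic, not the paper's.

```latex
% Generic illustration: one TD(0) update with constant step size \alpha,
% viewed as a random operator acting on the value function V_t.
V_{t+1}(s_t) = V_t(s_t) + \alpha \bigl( r_t + \gamma\, V_t(s_{t+1}) - V_t(s_t) \bigr)
```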
arXiv Detail & Related papers (2020-03-27T05:13:29Z)