Discount Factor as a Regularizer in Reinforcement Learning
- URL: http://arxiv.org/abs/2007.02040v1
- Date: Sat, 4 Jul 2020 08:10:09 GMT
- Title: Discount Factor as a Regularizer in Reinforcement Learning
- Authors: Ron Amit, Ron Meir, Kamil Ciosek
- Abstract summary: It is known that applying RL algorithms with a lower discount factor can act as a regularizer, improving performance in the limited data regime.
We show an explicit equivalence between using a reduced discount factor and adding an explicit regularization term to the algorithm's loss.
Motivated by the equivalence, we empirically study this technique compared to standard $L_2$ regularization.
- Score: 23.56942940879309
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Specifying a Reinforcement Learning (RL) task involves choosing a suitable
planning horizon, which is typically modeled by a discount factor. It is known
that applying RL algorithms with a lower discount factor can act as a
regularizer, improving performance in the limited data regime. Yet the exact
nature of this regularizer has not been investigated. In this work, we fill in
this gap. For several Temporal-Difference (TD) learning methods, we show an
explicit equivalence between using a reduced discount factor and adding an
explicit regularization term to the algorithm's loss. Motivated by the
equivalence, we empirically study this technique compared to standard $L_2$
regularization by extensive experiments in discrete and continuous domains,
using tabular and functional representations. Our experiments suggest the
regularization effectiveness is strongly related to properties of the available
data, such as size, distribution, and mixing rate.
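For intuition, here is a minimal tabular sketch of the claimed effect, assuming a small random MDP; the step size, discount values, and the particular shrinkage form below are illustrative choices, not the paper's exact construction. It contrasts TD(0) run with a reduced discount factor against TD(0) run with the original discount plus an explicit L2-style shrinkage of the value estimates; both pull the learned values toward zero relative to the unregularized solution.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states = 8                                          # small random MDP (assumed)
P = rng.dirichlet(np.ones(n_states), size=n_states)   # row-stochastic transitions
r = rng.normal(size=n_states)                         # reward for each state
gamma, gamma_low, eta, alpha = 0.99, 0.9, 0.05, 0.1   # illustrative settings

def td0(gamma_eff, l2=0.0, steps=100_000):
    """Tabular TD(0); the optional l2 term shrinks the updated value toward 0."""
    V, s = np.zeros(n_states), 0
    for _ in range(steps):
        s_next = rng.choice(n_states, p=P[s])
        td_err = r[s] + gamma_eff * V[s_next] - V[s]
        V[s] += alpha * (td_err - l2 * V[s])
        s = s_next
    return V

V_true = np.linalg.solve(np.eye(n_states) - gamma * P, r)  # exact V under gamma
print("||V_true||              :", np.linalg.norm(V_true))
print("||TD, reduced discount||:", np.linalg.norm(td0(gamma_low)))
print("||TD, discount + L2||   :", np.linalg.norm(td0(gamma, l2=eta)))
```

In both variants the fixed point is biased toward smaller values, which is the regularization effect the paper makes precise for several TD methods.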
Related papers
- Policy Gradient with Active Importance Sampling [55.112959067035916]
Policy gradient (PG) methods significantly benefit from importance sampling (IS), enabling the effective reuse of previously collected samples.
However, IS is typically employed in RL as a passive tool for re-weighting historical samples.
We look for the best behavioral policy from which to collect samples to reduce the policy gradient variance.
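As a reference point for the passive use of IS that this paper moves beyond, the following is a hedged sketch of a standard importance-weighted REINFORCE update on a two-armed bandit; the fixed behavior policy, reward model, and step size are illustrative assumptions, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(1)
true_means = np.array([0.3, 0.7])   # two-armed bandit rewards (assumed)
behavior = np.array([0.5, 0.5])     # fixed, passive behavior policy (assumed)
theta = np.zeros(2)                 # parameters of the softmax target policy

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for _ in range(5000):
    p = softmax(theta)
    a = rng.choice(2, p=behavior)             # data comes from the behavior policy
    reward = rng.normal(true_means[a], 0.1)
    w = p[a] / behavior[a]                    # importance weight pi/b
    grad_log_pi = -p.copy()
    grad_log_pi[a] += 1.0                     # grad of log softmax at action a
    theta += 0.05 * w * reward * grad_log_pi  # IS-weighted REINFORCE step
print("learned target policy:", softmax(theta))  # should favor the better arm
```

The paper's contribution is to choose the behavior policy actively so that the variance of this kind of estimator is reduced, rather than fixing it in advance as above.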
arXiv Detail & Related papers (2024-05-09T09:08:09Z)
- Estimating the Hessian Matrix of Ranking Objectives for Stochastic Learning to Rank with Gradient Boosted Trees [63.18324983384337]
We introduce the first stochastic learning-to-rank method for Gradient Boosted Decision Trees (GBDTs).
Our main contribution is a novel estimator for the second-order derivatives, i.e., the Hessian matrix.
We incorporate our estimator into the existing PL-Rank framework, which was originally designed for first-order derivatives only.
arXiv Detail & Related papers (2024-04-18T13:53:32Z)
- Contrastive Learning with Orthonormal Anchors (CLOA) [0.0]
This study focuses on addressing the instability issues prevalent in contrastive learning, specifically examining the InfoNCE loss function and its derivatives.
We reveal that these loss functions exhibit a restrictive behavior, leading to a convergence phenomenon in which embeddings tend to merge into a single point.
This "over-fusion" effect detrimentally affects classification accuracy in subsequent supervised-learning tasks.
arXiv Detail & Related papers (2024-03-27T15:48:16Z) - Learning Repeatable Speech Embeddings Using An Intra-class Correlation
Regularizer [16.716653844774374]
We evaluate the repeatability of embeddings using the intra-class correlation coefficient (ICC).
We propose a novel regularizer, the ICC regularizer, as a complementary component for contrastive losses to guide deep neural networks to produce embeddings with higher repeatability.
We implement the ICC regularizer and apply it to three speech tasks: speaker verification, voice style conversion, and a clinical application for detecting dysphonic voice.
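A hedged numpy sketch of the core quantity; the paper's exact ICC variant, grouping, and how the penalty is weighted against the contrastive loss are assumptions here. It computes a one-way random-effects ICC per embedding dimension and returns 1 - ICC, so minimizing the penalty rewards embeddings that cluster tightly within a class relative to between classes.

```python
import numpy as np

def icc_penalty(emb, labels):
    """1 - mean one-way random-effects ICC across embedding dimensions.
    Lower penalty = higher repeatability of same-class embeddings."""
    classes = np.unique(labels)
    k = len(classes)
    n = min((labels == c).sum() for c in classes)  # assumes roughly balanced groups
    groups = np.stack([emb[labels == c][:n] for c in classes])  # (k, n, d)
    grand = groups.mean(axis=(0, 1))
    means = groups.mean(axis=1)                                  # (k, d)
    ms_between = n * ((means - grand) ** 2).sum(0) / (k - 1)
    ms_within = ((groups - means[:, None]) ** 2).sum((0, 1)) / (k * (n - 1))
    icc = (ms_between - ms_within) / (ms_between + (n - 1) * ms_within + 1e-8)
    return 1.0 - icc.mean()

# Toy check with 4 classes and 8 repeats each (illustrative data):
rng = np.random.default_rng(2)
labels = np.repeat(np.arange(4), 8)
tight = rng.normal(labels[:, None], 0.1, size=(32, 16))  # repeatable embeddings
loose = rng.normal(0.0, 1.0, size=(32, 16))              # no class structure
print(icc_penalty(tight, labels))  # near 0
print(icc_penalty(loose, labels))  # near 1
```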
arXiv Detail & Related papers (2023-10-25T23:21:46Z) - Provably Efficient Learning in Partially Observable Contextual Bandit [4.910658441596583]
We show how causal bounds can be applied to improve classical bandit algorithms.
This research has the potential to enhance the performance of contextual bandit agents in real-world applications.
arXiv Detail & Related papers (2023-08-07T13:24:50Z) - Supervised Contrastive Learning with Heterogeneous Similarity for
Distribution Shifts [3.7819322027528113]
We propose a new regularization based on supervised contrastive learning that prevents such overfitting and trains models whose performance does not degrade under distribution shifts.
Experiments on benchmark datasets that emulate distribution shifts, including subpopulation shift and domain generalization, demonstrate the advantage of the proposed method.
arXiv Detail & Related papers (2023-04-07T01:45:09Z) - Anti-Exploration by Random Network Distillation [63.04360288089277]
We show that a naive choice of conditioning for Random Network Distillation (RND) is not discriminative enough to be used as an uncertainty estimator.
We show that this limitation can be avoided with conditioning based on Feature-wise Linear Modulation (FiLM).
We evaluate it on the D4RL benchmark, showing that it is capable of achieving performance comparable to ensemble-based methods and outperforming ensemble-free approaches by a wide margin.
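For readers unfamiliar with FiLM, here is a generic sketch of the conditioning mechanism in numpy; the shapes, the linear conditioner, and its use inside an RND network are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(3)

class FiLM:
    """Feature-wise Linear Modulation: h -> gamma(z) * h + beta(z),
    with gamma and beta produced by linear maps of the conditioner z."""
    def __init__(self, z_dim, h_dim):
        self.W_gamma = rng.normal(0, 0.1, (z_dim, h_dim))
        self.W_beta = rng.normal(0, 0.1, (z_dim, h_dim))
    def __call__(self, h, z):
        gamma = 1.0 + z @ self.W_gamma  # per-feature scale, near identity at init
        beta = z @ self.W_beta          # per-feature shift
        return gamma * h + beta

film = FiLM(z_dim=4, h_dim=8)
h = rng.normal(size=(2, 8))  # e.g., state features for a batch of 2
z = rng.normal(size=(2, 4))  # conditioner, e.g., an action embedding
print(film(h, z).shape)      # (2, 8)
```

The point of FiLM here is that the conditioner modulates every feature multiplicatively and additively, a more discriminative form of conditioning than naive alternatives such as concatenation.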
arXiv Detail & Related papers (2023-01-31T13:18:33Z) - Revisiting Consistency Regularization for Semi-Supervised Learning [80.28461584135967]
We propose an improved consistency regularization framework built on a simple yet effective technique, FeatDistLoss.
Experimental results show that our model defines a new state of the art for various datasets and settings.
arXiv Detail & Related papers (2021-12-10T20:46:13Z) - False Correlation Reduction for Offline Reinforcement Learning [115.11954432080749]
We propose falSe COrrelation REduction (SCORE) for offline RL, a practically effective and theoretically provable algorithm.
We empirically show that SCORE achieves state-of-the-art performance with a 3.1x acceleration on various tasks in a standard benchmark (D4RL).
arXiv Detail & Related papers (2021-10-24T15:34:03Z) - Taylor Expansion of Discount Factors [56.46324239692532]
In practical reinforcement learning (RL), the discount factor used for estimating value functions often differs from that used for defining the evaluation objective.
In this work, we study the effect that this discrepancy of discount factors has during learning, and discover a family of objectives that interpolate value functions of two distinct discount factors.
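A small numerical check of the interpolation idea, assuming a tabular MDP with state-dependent rewards; this illustrates the underlying series, not the paper's algorithm. Since $V_\gamma = (I - \gamma P)^{-1} r$, the value function at a larger discount $\gamma'$ expands as $V_{\gamma'} = \sum_{k \ge 0} [(\gamma' - \gamma)(I - \gamma P)^{-1} P]^k V_\gamma$, so truncating the series at order $K$ interpolates between the value functions of the two discount factors.

```python
import numpy as np

rng = np.random.default_rng(4)
n, gamma, gamma_hi = 6, 0.6, 0.9        # illustrative discount pair
P = rng.dirichlet(np.ones(n), size=n)   # random row-stochastic transitions
r = rng.normal(size=n)

V_lo = np.linalg.solve(np.eye(n) - gamma * P, r)     # value function under gamma
V_hi = np.linalg.solve(np.eye(n) - gamma_hi * P, r)  # target: value under gamma'
M = (gamma_hi - gamma) * np.linalg.solve(np.eye(n) - gamma * P, P)

approx, term = np.zeros(n), V_lo.copy()
for order in range(8):
    approx = approx + term                       # order-K partial sum
    term = M @ term
    print(order, np.linalg.norm(approx - V_hi))  # error shrinks as K grows
```

The series converges here because $|\gamma' - \gamma| / (1 - \gamma) < 1$.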
arXiv Detail & Related papers (2021-06-11T05:02:17Z)
- Density Fixing: Simple yet Effective Regularization Method based on the Class Prior [2.3859169601259347]
We propose a framework of regularization methods, called density-fixing, that can be used for both supervised and semi-supervised learning.
Our proposed regularization method improves generalization performance by forcing the model to approximate the class prior distribution, i.e., the classes' frequencies of occurrence.
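A hedged sketch of one way such a penalty could look; the KL direction, batch averaging, and the prior values below are assumptions rather than the paper's exact formulation. It compares the batch-average predicted class distribution to a known class prior.

```python
import numpy as np

def density_fixing_penalty(probs, prior):
    """KL(prior || batch-average prediction): adding this to the task loss
    pushes the model's aggregate predictions toward the class prior."""
    avg = probs.mean(axis=0) + 1e-8
    return float((prior * np.log(prior / avg)).sum())

prior = np.array([0.7, 0.2, 0.1])            # assumed class frequencies
uniform_preds = np.full((32, 3), 1 / 3)      # predictions ignoring the prior
matched_preds = np.tile(prior, (32, 1))      # predictions matching the prior
print(density_fixing_penalty(uniform_preds, prior))  # clearly positive
print(density_fixing_penalty(matched_preds, prior))  # approximately 0
```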
arXiv Detail & Related papers (2020-07-08T04:58:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.