Stabilizing Q Learning Via Soft Mellowmax Operator
- URL: http://arxiv.org/abs/2012.09456v2
- Date: Fri, 18 Dec 2020 02:21:44 GMT
- Title: Stabilizing Q Learning Via Soft Mellowmax Operator
- Authors: Yaozhong Gan, Zhe Zhang, Xiaoyang Tan
- Abstract summary: Mellowmax is a recently proposed differentiable and non-expansive softmax operator that allows convergent behavior in learning and planning.
We show that our SM2 operator can be applied to challenging multi-agent reinforcement learning scenarios, leading to stable value function approximation and state-of-the-art performance.
- Score: 12.208344427928466
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning complicated value functions in a high-dimensional state
space by function approximation is a challenging task, partially because the
max-operator used in temporal difference updates can theoretically cause
instability for most linear or non-linear approximation schemes. Mellowmax is a
recently proposed differentiable and non-expansive softmax operator that allows
convergent behavior in learning and planning. Unfortunately, the performance
bound for the fixed point it converges to remains unclear, and in practice its
parameter is sensitive to various domains and has to be tuned case by case.
Finally, the Mellowmax operator may suffer from oversmoothing, as it ignores
the probability of each action being taken when aggregating them. In this
paper, we address all of the above issues with an enhanced Mellowmax operator,
named SM2 (Soft Mellowmax). In particular, the proposed operator is reliable,
easy to implement, and has a provable performance guarantee, while preserving
all the advantages of Mellowmax. Furthermore, we show that our SM2 operator can
be applied to challenging multi-agent reinforcement learning scenarios, leading
to stable value function approximation and state-of-the-art performance.
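
The abstract does not spell out SM2's closed form, so the sketch below implements only the standard Mellowmax operator (Asadi & Littman, 2017) that SM2 enhances; the parameter name `omega` follows that paper, and the use of `logsumexp` for numerical stability is an implementation assumption, not a detail from this source.

```python
import numpy as np
from scipy.special import logsumexp

def mellowmax(q_values, omega=5.0):
    """Standard Mellowmax: mm_w(x) = log((1/n) * sum_i exp(w * x_i)) / w.
    Computed via logsumexp to avoid overflow for large omega * x."""
    q = np.asarray(q_values, dtype=np.float64)
    n = q.shape[-1]
    return (logsumexp(omega * q, axis=-1) - np.log(n)) / omega

# Behaves like the mean for small omega and like the max for large omega,
# which is why the abstract notes the parameter must be tuned per domain.
q = np.array([1.0, 2.0, 3.0])
print(mellowmax(q, omega=0.01))   # ~2.01 (near the mean)
print(mellowmax(q, omega=100.0))  # ~2.99 (near the max)
```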
Related papers
- MultiMax: Sparse and Multi-Modal Attention Learning [60.49318008131978]
SoftMax is a ubiquitous ingredient of modern machine learning algorithms.
We show that sparsity can be achieved by a family of SoftMax variants, but they often require an alternative loss function and do not preserve multi-modality.
We propose MultiMax, which adaptively modulates the output distribution according to the input entry range.
arXiv Detail & Related papers (2024-06-03T10:51:43Z)
- Revisiting Logistic-softmax Likelihood in Bayesian Meta-Learning for Few-Shot Classification [4.813254903898101]
Logistic-softmax is often employed as an alternative to the softmax likelihood in multi-class Gaussian process classification.
We revisit and redesign the logistic-softmax likelihood, which enables control of the a priori confidence level through a temperature parameter.
Our approach yields well-calibrated uncertainty estimates and achieves comparable or superior results on standard benchmark datasets.
arXiv Detail & Related papers (2023-10-16T13:20:13Z)
- r-softmax: Generalized Softmax with Controllable Sparsity Rate [11.39524236962986]
We propose r-softmax, a modification of the softmax that outputs sparse probability distributions with a controllable sparsity rate.
We show on several multi-label datasets that r-softmax outperforms other sparse alternatives to softmax and is highly competitive with the original softmax.
arXiv Detail & Related papers (2023-04-11T14:28:29Z)
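
The summary above does not give r-softmax's closed form, so no attempt is made to reproduce it here; as a hedged illustration of the "sparse alternatives to softmax" it is compared against, here is a minimal numpy sketch of sparsemax (Martins & Astudillo, 2016), a standard sparse variant that projects logits onto the probability simplex and can output exact zeros.

```python
import numpy as np

def sparsemax(z):
    """Sparsemax: Euclidean projection of logits z onto the probability
    simplex. Unlike softmax, the result can contain exact zeros."""
    z = np.asarray(z, dtype=np.float64)
    z_sorted = np.sort(z)[::-1]           # logits in decreasing order
    k = np.arange(1, z.size + 1)
    cumsum = np.cumsum(z_sorted)
    support = 1 + k * z_sorted > cumsum   # coordinates kept in the support
    k_z = k[support][-1]                  # support size
    tau = (cumsum[k_z - 1] - 1.0) / k_z   # shared threshold
    return np.maximum(z - tau, 0.0)

print(sparsemax(np.array([1.0, 0.8, -1.0])))  # [0.6 0.4 0. ], a sparse distribution
```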
- Convex Bounds on the Softmax Function with Applications to Robustness Verification [69.09991317119679]
The softmax function is a ubiquitous component at the output of neural networks and increasingly in intermediate layers as well.
This paper provides convex lower bounds and concave upper bounds on the softmax function, which are compatible with convex optimization formulations for characterizing neural networks and other ML models.
arXiv Detail & Related papers (2023-03-03T05:07:02Z)
- Spectral Aware Softmax for Visible-Infrared Person Re-Identification [123.69049942659285]
Visible-infrared person re-identification (VI-ReID) aims to match specific pedestrian images from different modalities.
Existing methods still follow the softmax loss training paradigm, which is widely used in single-modality classification tasks.
We propose the spectral-aware softmax (SA-Softmax) loss, which can fully explore the embedding space with the modality information.
arXiv Detail & Related papers (2023-02-03T02:57:18Z)
- Learning to Optimize with Stochastic Dominance Constraints [103.26714928625582]
In this paper, we develop a simple yet efficient approach for the problem of comparing uncertain quantities.
We recast inner optimization in the Lagrangian as a learning problem for surrogate approximation, which bypasses apparent intractability.
The proposed light-SD demonstrates superior performance on several representative problems ranging from finance to supply chain management.
arXiv Detail & Related papers (2022-11-14T21:54:31Z)
- Minimax Optimization with Smooth Algorithmic Adversaries [59.47122537182611]
We propose a new algorithm for the min-player against smooth algorithms deployed by an adversary.
Our algorithm is guaranteed to make monotonic progress (having no limit cycles) and to find an appropriate number of gradient ascent steps.
arXiv Detail & Related papers (2021-06-02T22:03:36Z)
- Exploring Alternatives to Softmax Function [0.5924831288313849]
We investigate Taylor softmax, SM-softmax, and our proposed SM-Taylor softmax as alternatives to the softmax function.
Our experiments for the image classification task on different datasets reveal that there is always a configuration of the SM-Taylor softmax function that outperforms the normal softmax function.
arXiv Detail & Related papers (2020-11-23T16:50:18Z)
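
The SM-softmax and SM-Taylor variants are defined in the paper above; the sketch below covers only the plain Taylor softmax, which replaces exp in softmax with its truncated Taylor polynomial (1 + x + x^2/2 at second order, which is strictly positive for all real x, so the normalized output is a valid distribution). The `order` argument is an illustrative generalization, not a parameter quoted from the paper.

```python
import math
import numpy as np

def taylor_softmax(z, order=2):
    """Taylor softmax: softmax with exp replaced by its order-n Taylor
    polynomial. Even orders keep the polynomial strictly positive, so the
    normalized output is a valid probability distribution."""
    assert order % 2 == 0, "odd-order truncations of exp can go negative"
    z = np.asarray(z, dtype=np.float64)
    p = sum(z**n / math.factorial(n) for n in range(order + 1))
    return p / p.sum(axis=-1, keepdims=True)

print(taylor_softmax(np.array([1.0, 2.0, 3.0])))  # less peaked than softmax
```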
- Optimal Approximation -- Smoothness Tradeoffs for Soft-Max Functions [73.33961743410876]
A soft-max function has two main efficiency measures: approximation and smoothness.
We identify the optimal approximation-smoothness tradeoffs for different measures of approximation and smoothness.
This leads to novel soft-max functions, each of which is optimal for a different application.
arXiv Detail & Related papers (2020-10-22T05:19:58Z)
- Softmax Deep Double Deterministic Policy Gradients [37.23518654230526]
We propose to use the Boltzmann softmax operator for value function estimation in continuous control.
We also design two new algorithms, Softmax Deep Deterministic Policy Gradients (SD2) and Softmax Deep Double Deterministic Policy Gradients (SD3), by building the softmax operator upon single and double estimators.
arXiv Detail & Related papers (2020-10-19T02:52:00Z)
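
SD2 and SD3 build on the Boltzmann softmax operator named in the summary above; here is a minimal numpy sketch under its standard definition (the inverse-temperature name `beta` is an assumption, not a detail from the paper).

```python
import numpy as np
from scipy.special import softmax

def boltzmann_softmax(q_values, beta=1.0):
    """Boltzmann softmax operator: the expectation of the Q-values under the
    softmax(beta * Q) distribution. Tends to the mean as beta -> 0 and to
    the max as beta -> inf."""
    q = np.asarray(q_values, dtype=np.float64)
    weights = softmax(beta * q, axis=-1)
    return np.sum(weights * q, axis=-1)

q = np.array([1.0, 2.0, 3.0])
print(boltzmann_softmax(q, beta=0.0))   # 2.0 (uniform weights -> mean)
print(boltzmann_softmax(q, beta=10.0))  # ~3.0 (concentrates on the max)
```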