Maximum Entropy Reinforcement Learning with Mixture Policies
- URL: http://arxiv.org/abs/2103.10176v1
- Date: Thu, 18 Mar 2021 11:23:39 GMT
- Title: Maximum Entropy Reinforcement Learning with Mixture Policies
- Authors: Nir Baram, Guy Tennenholtz, Shie Mannor
- Abstract summary: We construct a tractable approximation of the mixture entropy for use in MaxEnt algorithms.
We show that it is closely related to the sum of marginal entropies.
We extend Soft Actor-Critic (SAC) to the mixture policy case and evaluate it on a series of continuous control tasks.
- Score: 54.291331971813364
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mixture models are an expressive hypothesis class that can approximate a rich
set of policies. However, using mixture policies in the Maximum Entropy
(MaxEnt) framework is not straightforward. The entropy of a mixture model is
not equal to the sum of its components, nor does it have a closed-form
expression in most cases. Using such policies in MaxEnt algorithms, therefore,
requires constructing a tractable approximation of the mixture entropy. In this
paper, we derive a simple, low-variance mixture-entropy estimator. We show that
it is closely related to the sum of marginal entropies. Equipped with our
entropy estimator, we extend Soft Actor-Critic (SAC) to the mixture policy
case and evaluate it on a series of continuous control tasks.
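To make the quantities in the abstract concrete, here is a minimal numerical sketch (a toy illustration of the underlying entropy bounds, not the paper's estimator): it Monte Carlo-estimates the entropy of a one-dimensional Gaussian mixture and checks it against the weighted sum of marginal entropies, which brackets the true value via $\sum_k w_k H_k \le H(\pi) \le \sum_k w_k H_k + H(w)$.
```python
# Toy sketch: the entropy of a Gaussian mixture has no closed form, but it
# is bracketed by the weighted sum of marginal entropies:
#   sum_k w_k H_k  <=  H(mixture)  <=  sum_k w_k H_k + H(w).
# Below, a Monte Carlo estimate of H(mixture) is checked against the bounds.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
w = np.array([0.5, 0.3, 0.2])        # mixture weights (hypothetical values)
mu = np.array([-2.0, 0.0, 3.0])      # component means
sigma = np.array([0.5, 1.0, 0.8])    # component standard deviations

def mixture_logpdf(x):
    # log pi(x) = log sum_k w_k N(x; mu_k, sigma_k), via log-sum-exp
    comp = np.stack([np.log(wk) + norm.logpdf(x, m, s)
                     for wk, m, s in zip(w, mu, sigma)])
    return np.logaddexp.reduce(comp, axis=0)

# Monte Carlo estimate of H(pi) = -E_{x ~ pi}[log pi(x)]
k = rng.choice(len(w), size=100_000, p=w)   # sample component indices
x = rng.normal(mu[k], sigma[k])             # sample actions from components
h_mc = -mixture_logpdf(x).mean()

# Closed-form marginal (per-component) Gaussian entropies and weight entropy
h_marginals = np.sum(w * 0.5 * np.log(2 * np.pi * np.e * sigma**2))
h_weights = -np.sum(w * np.log(w))

print(f"Monte Carlo H(mixture): {h_mc:.4f}")
print(f"bounds: [{h_marginals:.4f}, {h_marginals + h_weights:.4f}]")
```
For well-separated components the mixture entropy approaches the upper bound, which suggests why an estimator built around the sum of marginal entropies can be both simple and accurate while avoiding the log-sum-exp of the full mixture density.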
Related papers
- Fast Semisupervised Unmixing Using Nonconvex Optimization [80.11512905623417]
We introduce a novel convex model for semisupervised, library-based unmixing.
We demonstrate the efficacy of alternating optimization methods for sparse unmixing.
arXiv Detail & Related papers (2024-01-23T10:07:41Z) - Optimal Algorithms for Stochastic Complementary Composite Minimization [55.26935605535377]
Inspired by regularization techniques in statistics and machine learning, we study complementary composite minimization.
We provide novel excess risk bounds, both in expectation and with high probability.
Our algorithms are nearly optimal, which we prove via novel lower complexity bounds for this class of problems.
arXiv Detail & Related papers (2022-11-03T12:40:24Z) - Faster One-Sample Stochastic Conditional Gradient Method for Composite Convex Minimization [61.26619639722804]
We propose a conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms.
The proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques.
arXiv Detail & Related papers (2022-02-26T19:10:48Z) - Beyond Exact Gradients: Convergence of Stochastic Soft-Max Policy Gradient Methods with Entropy Regularization [20.651913793555163]
We revisit the classical entropy regularized policy gradient methods with the soft-max policy parametrization.
We establish a global optimality convergence result and a sample complexity of $\widetilde{\mathcal{O}}(\frac{1}{\epsilon^2})$ for the proposed algorithm (a toy sketch of entropy-regularized soft-max policy gradient appears after this list).
arXiv Detail & Related papers (2021-10-19T17:21:09Z) - Fitting large mixture models using stochastic component selection [0.0]
We propose a combination of the expectation-maximization and Metropolis-Hastings algorithms to evaluate only a small number of components.
The Markov chain of component assignments is sequentially generated across the algorithm's iterations.
We put emphasis on generality of our method, equipping it with the ability to train both shallow and deep mixture models.
arXiv Detail & Related papers (2021-10-10T12:39:53Z) - Clustering a Mixture of Gaussians with Unknown Covariance [4.821312633849745]
We derive a Max-Cut integer program based on maximum likelihood estimation.
We develop an efficient spectral algorithm that attains the optimal rate but requires a quadratic sample size.
We generalize the Max-Cut program to a $k$-means program that handles multi-component mixtures with possibly unequal weights.
arXiv Detail & Related papers (2021-10-04T17:59:20Z) - Mean-Square Analysis with An Application to Optimal Dimension Dependence of Langevin Monte Carlo [60.785586069299356]
This work provides a general framework for the non-asymptotic analysis of sampling error in the 2-Wasserstein distance.
Our theoretical analysis is further validated by numerical experiments.
arXiv Detail & Related papers (2021-09-08T18:00:05Z) - Spectral clustering under degree heterogeneity: a case for the random walk Laplacian [83.79286663107845]
This paper shows that graph spectral embedding using the random walk Laplacian produces vector representations which are completely corrected for node degree.
In the special case of a degree-corrected block model, the embedding concentrates about K distinct points, representing communities.
arXiv Detail & Related papers (2021-05-03T16:36:27Z) - Rigid and Articulated Point Registration with Expectation Conditional Maximization [20.096170794358315]
We introduce an innovative EM-like algorithm, namely the Expectation Conditional Maximization for Point Registration (ECMPR) algorithm.
We analyse in detail the associated consequences in terms of estimation of the registration parameters.
We extend rigid registration to articulated registration.
arXiv Detail & Related papers (2020-12-09T17:36:11Z) - Self-regularizing Property of Nonparametric Maximum Likelihood Estimator in Mixture Models [39.27013036481509]
We introduce the nonparametric maximum likelihood estimator (NPMLE) for general Gaussian mixtures.
We show that, with high probability, the NPMLE based on a sample of size $n$ has $O(\log n)$ atoms (mass points).
Notably, any such mixture is statistically indistinguishable from a finite one with $O(\log n)$ components.
arXiv Detail & Related papers (2020-08-19T03:39:13Z)
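As noted in the entropy-regularization entry above, here is a toy sketch of entropy-regularized soft-max policy gradient (my own illustration, not the algorithm from any listed paper): exact gradient ascent on a one-step bandit objective $J(\theta) = \pi_\theta \cdot r + \tau H(\pi_\theta)$, whose maximizer is the soft-max distribution $\pi^\ast \propto \exp(r/\tau)$ rather than a deterministic arg-max.
```python
# Toy sketch: entropy-regularized soft-max policy gradient on a one-step
# bandit (illustrative only; rewards are assumed known here, so the gradient
# is exact rather than sampled as in the stochastic setting).
import numpy as np

r = np.array([1.0, 0.5, 0.2])    # per-action rewards (hypothetical values)
tau = 0.1                        # entropy temperature
theta = np.zeros(3)              # soft-max logits

def softmax(z):
    z = z - z.max()              # stabilize the exponentials
    e = np.exp(z)
    return e / e.sum()

for _ in range(5000):
    pi = softmax(theta)
    g = r - tau * np.log(pi)     # "soft" per-action payoff
    # dJ/dtheta_b = pi_b * (g_b - pi.g), using d pi_a/d theta_b = pi_a(1[a=b] - pi_b)
    grad = pi * (g - pi @ g)
    theta += 0.5 * grad          # plain gradient ascent

print("learned policy:", np.round(softmax(theta), 3))
# The regularized optimum is pi* proportional to exp(r / tau), a stochastic policy:
print("soft-max target:", np.round(softmax(r / tau), 3))
```
With $\tau > 0$ the learned policy stays stochastic, which is the same mechanism the main paper's MaxEnt objective relies on for exploration.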