Learning Unnormalized Statistical Models via Compositional Optimization
- URL: http://arxiv.org/abs/2306.07485v1
- Date: Tue, 13 Jun 2023 01:18:16 GMT
- Title: Learning Unnormalized Statistical Models via Compositional Optimization
- Authors: Wei Jiang, Jiayu Qin, Lingyu Wu, Changyou Chen, Tianbao Yang, Lijun
Zhang
- Abstract summary: Noise-contrastive estimation(NCE) has been proposed by formulating the objective as the logistic loss of the real data and the artificial noise.
In this paper, we study a direct approach for optimizing the negative log-likelihood of unnormalized models.
- Score: 73.30514599338407
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning unnormalized statistical models (e.g., energy-based models) is
computationally challenging due to the complexity of handling the partition
function. To eschew this complexity, noise-contrastive estimation~(NCE) has
been proposed by formulating the objective as the logistic loss of the real
data and the artificial noise. However, as found in previous works, NCE may
perform poorly in many tasks due to its flat loss landscape and slow
convergence. In this paper, we study a direct approach for optimizing the
negative log-likelihood of unnormalized models from the perspective of
compositional optimization. To tackle the partition function, a noise
distribution is introduced such that the log partition function can be written
as a compositional function whose inner function can be estimated with
stochastic samples. Hence, the objective can be optimized by stochastic
compositional optimization algorithms. Despite being a simple method, we
demonstrate that it is more favorable than NCE by (1) establishing a fast
convergence rate and quantifying its dependence on the noise distribution
through the variance of stochastic estimators; (2) developing better results
for one-dimensional Gaussian mean estimation by showing our objective has a
much more favorable loss landscape and hence our method enjoys faster convergence;
(3) demonstrating better performance on multiple applications, including
density estimation, out-of-distribution detection, and real image generation.
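The compositional reformulation described above can be sketched in code. The following is a minimal toy version for the paper's one-dimensional Gaussian mean-estimation setting, not the authors' exact algorithm: a noise distribution q rewrites the log partition function as log E_{y~q}[exp(f_theta(y))/q(y)], and a moving average u tracks the inner expectation, in the spirit of stochastic compositional gradient methods. All constants (noise scale, step sizes, batch size) are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Unnormalized 1D Gaussian model: f_theta(x) = -(x - theta)^2 / 2, so
# p_theta(x) ∝ exp(f_theta(x)); we treat the partition function as unknown.
def f(x, theta):
    return -0.5 * (x - theta) ** 2

def grad_f_theta(x, theta):  # derivative of f with respect to theta
    return x - theta

# Noise distribution q = N(0, s^2), used to rewrite the log partition:
#   log Z(theta) = log E_{y~q}[ exp(f(y, theta)) / q(y) ]   (compositional form)
s = 2.0
def q_pdf(y):
    return np.exp(-0.5 * (y / s) ** 2) / (s * np.sqrt(2 * np.pi))

true_mean = 1.5
data = rng.normal(true_mean, 1.0, size=2000)

theta, u = 0.0, 1.0          # u estimates the inner expectation E_q[exp(f)/q]
lr, beta, B = 0.05, 0.5, 64  # illustrative hyperparameters
for t in range(2000):
    x = rng.choice(data, size=B)       # data minibatch
    y = rng.normal(0.0, s, size=B)     # noise minibatch from q
    w = np.exp(f(y, theta)) / q_pdf(y)
    u = (1 - beta) * u + beta * w.mean()  # moving-average inner estimate
    # grad of NLL = -E_data[df/dtheta] + (1/u) * E_q[ w * df/dtheta ]
    g = -grad_f_theta(x, theta).mean() + (w * grad_f_theta(y, theta)).mean() / u
    theta -= lr * g
```

With these settings, theta drifts toward the data mean; the variance of the importance weights w (governed by the choice of q) is exactly the quantity the convergence analysis in the abstract ties the rate to.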
Related papers
- On diffusion-based generative models and their error bounds: The log-concave case with full convergence estimates [5.13323375365494]
We provide theoretical guarantees for the convergence behaviour of diffusion-based generative models under the assumption of strongly log-concave data distributions.
We demonstrate the power of our approach via a motivating example: sampling from a Gaussian distribution with unknown mean.
This approach yields the best known convergence rate for our sampling algorithm.
arXiv Detail & Related papers (2023-11-22T18:40:45Z) - Efficient Model-Free Exploration in Low-Rank MDPs [76.87340323826945]
Low-Rank Markov Decision Processes offer a simple, yet expressive framework for RL with function approximation.
Existing algorithms are either (1) computationally intractable, or (2) reliant upon restrictive statistical assumptions.
We propose the first provably sample-efficient algorithm for exploration in Low-Rank MDPs.
arXiv Detail & Related papers (2023-07-08T15:41:48Z) - Provable benefits of score matching [30.317535687908755]
We give the first example of a natural exponential family of distributions such that score matching loss is computationally efficient to optimize.
We show that designing a zeroth-order or first-order oracle for optimizing the likelihood loss is NP-hard.
Minimizing the score matching loss is both computationally and statistically efficient, with complexity in the ambient dimension.
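To illustrate why score matching can be easy to optimize, here is a toy sketch (my own example, not the paper's exponential-family construction): for a unit-variance Gaussian N(theta, 1), the integrated-by-parts score matching objective reduces to a smooth quadratic in theta, which plain gradient descent minimizes without ever touching the partition function.

```python
import numpy as np

rng = np.random.default_rng(2)

# For p_theta = N(theta, 1), the score is s_theta(x) = theta - x and the
# integrated-by-parts score matching loss is
#   J(theta) = E[ 0.5 * s_theta(x)^2 + d/dx s_theta(x) ]
#            = E[ 0.5 * (theta - x)^2 ] - 1,
# a smooth quadratic minimized at theta = E[x].
x = rng.normal(3.0, 1.0, size=5000)

def sm_loss_grad(theta):
    # gradient of 0.5 * (theta - x)^2 averaged over the sample
    return (theta - x).mean()

theta = 0.0
for _ in range(200):
    theta -= 0.1 * sm_loss_grad(theta)
```

The recovered theta is the sample mean, found by first-order optimization alone.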
arXiv Detail & Related papers (2023-06-03T03:42:30Z) - Efficient distributed representations beyond negative sampling [4.5687771576879594]
This article describes an efficient method to learn distributed representations, also known as embeddings.
We show that the softmax normalization constants can be estimated in linear time, allowing us to design an efficient optimization strategy.
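One generic way to estimate a softmax normalization constant in time linear in the number of drawn samples, rather than the vocabulary size, is plain Monte Carlo subsampling. This is a hedged sketch of that general idea, not necessarily the estimator used in the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Estimate Z = sum_j exp(s_j) over a large vocabulary by uniform subsampling.
V = 100_000
scores = rng.normal(0.0, 1.0, size=V)     # stand-in for embedding scores

Z_exact = np.exp(scores).sum()            # exact constant, O(V)

m = 5_000                                 # sample size << V
idx = rng.integers(0, V, size=m)
Z_hat = V * np.exp(scores[idx]).mean()    # unbiased estimate, O(m)

rel_err = abs(Z_hat - Z_exact) / Z_exact
```

The estimator is unbiased, and its relative error shrinks as 1/sqrt(m), which is what makes sampling-based alternatives to full softmax normalization attractive for embedding training.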
arXiv Detail & Related papers (2023-03-30T15:48:26Z) - Score-based Diffusion Models in Function Space [140.792362459734]
Diffusion models have recently emerged as a powerful framework for generative modeling.
We introduce a mathematically rigorous framework called Denoising Diffusion Operators (DDOs) for training diffusion models in function space.
We show that the corresponding discretized algorithm generates accurate samples at a fixed cost independent of the data resolution.
arXiv Detail & Related papers (2023-02-14T23:50:53Z) - Statistical Efficiency of Score Matching: The View from Isoperimetry [96.65637602827942]
We show a tight connection between statistical efficiency of score matching and the isoperimetric properties of the distribution being estimated.
We formalize these results both in the infinite-sample regime and in the finite-sample regime.
arXiv Detail & Related papers (2022-10-03T06:09:01Z) - A Stochastic Newton Algorithm for Distributed Convex Optimization [62.20732134991661]
We analyze a Newton algorithm for homogeneous distributed convex optimization, where each machine can calculate gradients of the same population objective.
We show that our method can reduce the number and frequency of required communication rounds compared to existing methods without hurting performance.
arXiv Detail & Related papers (2021-10-07T17:51:10Z) - Near-Optimal High Probability Complexity Bounds for Non-Smooth
Stochastic Optimization with Heavy-Tailed Noise [63.304196997102494]
It is essential to theoretically guarantee that algorithms provide small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity bounds whose dependence on the confidence level is either negative-power or logarithmic.
We propose novel stepsize rules for two gradient methods with clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z) - A Nonconvex Framework for Structured Dynamic Covariance Recovery [24.471814126358556]
We propose a flexible yet interpretable model for high-dimensional data with time-varying second order statistics.
Motivated by the literature, we couple a structured factorization with smoothness of the temporal statistics.
We show that our approach outperforms existing baselines.
arXiv Detail & Related papers (2020-11-11T07:09:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences.