Bi-level Score Matching for Learning Energy-based Latent Variable Models
- URL: http://arxiv.org/abs/2010.07856v2
- Date: Fri, 16 Oct 2020 07:33:06 GMT
- Title: Bi-level Score Matching for Learning Energy-based Latent Variable Models
- Authors: Fan Bao, Chongxuan Li, Kun Xu, Hang Su, Jun Zhu, Bo Zhang
- Abstract summary: Score matching (SM) provides a compelling approach to learn energy-based models (EBMs) by avoiding the calculation of partition function.
This paper presents a bi-level score matching (BiSM) method to learn EBLVMs with general structures.
We show that BiSM is comparable to the widely adopted contrastive divergence and SM methods when they are applicable.
- Score: 46.7000048886801
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Score matching (SM) provides a compelling approach to learn energy-based
models (EBMs) by avoiding the calculation of partition function. However, it
remains largely open to learn energy-based latent variable models (EBLVMs),
except some special cases. This paper presents a bi-level score matching (BiSM)
method to learn EBLVMs with general structures by reformulating SM as a
bi-level optimization problem. The higher level introduces a variational
posterior of the latent variables and optimizes a modified SM objective, and
the lower level optimizes the variational posterior to fit the true posterior.
To solve BiSM efficiently, we develop a stochastic optimization algorithm with
gradient unrolling. Theoretically, we analyze the consistency of BiSM and the
convergence of the stochastic algorithm. Empirically, we show the promise of
BiSM in Gaussian restricted Boltzmann machines and highly nonstructural EBLVMs
parameterized by deep convolutional neural networks. BiSM is comparable to the
widely adopted contrastive divergence and SM methods when they are applicable;
and can learn complex EBLVMs with intractable posteriors to generate natural
images.
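The bi-level structure described in the abstract (an outer modified score-matching objective over the model parameters, an inner fit of a variational posterior, and gradient unrolling to couple the two) can be illustrated with a minimal sketch. Everything below is an illustrative assumption rather than the paper's actual model or code: the toy bilinear energy, the Gaussian variational posterior with mean `v @ phi`, the denoising-score-matching surrogate, and all hyperparameters are placeholders chosen only to make the two levels and the unrolled inner updates concrete.

```python
# Minimal, hedged sketch of a BiSM-style bi-level loop (toy model, not the paper's code).
import torch

torch.manual_seed(0)
d_v, d_h = 2, 2

# Upper-level parameters theta of a toy bilinear joint energy E_theta(v, h).
theta = torch.randn(d_v, d_h, requires_grad=True)

def log_joint(v, h, theta):
    # Unnormalized log p~_theta(v, h) = -E_theta(v, h) for the toy energy.
    return -(0.5 * (v ** 2).sum(-1) + 0.5 * (h ** 2).sum(-1) - (v @ theta * h).sum(-1))

def sample_q(v, phi):
    # Reparameterized sample from the variational posterior q_phi(h | v) = N(v @ phi, I).
    mean = v @ phi
    return mean + torch.randn_like(mean)

def log_q(h, v, phi):
    # log q_phi(h | v), up to an additive constant.
    return -0.5 * ((h - v @ phi) ** 2).sum(-1)

def inner_loss(v, phi, theta):
    # Lower level: fit q_phi to the true posterior via a Monte-Carlo KL(q || p) surrogate.
    h = sample_q(v, phi)
    return (log_q(h, v, phi) - log_joint(v, h, theta)).mean()

def outer_loss(v, phi, theta, sigma=0.1):
    # Upper level: a denoising-score-matching surrogate in which the marginal score is
    # approximated by grad_v [log p~_theta(v, h) - log q_phi(h | v)] with h ~ q_phi(h | v).
    v_noisy = (v + sigma * torch.randn_like(v)).requires_grad_(True)
    h = sample_q(v_noisy, phi)
    surrogate = (log_joint(v_noisy, h, theta) - log_q(h, v_noisy, phi)).sum()
    score = torch.autograd.grad(surrogate, v_noisy, create_graph=True)[0]
    target = -(v_noisy - v) / sigma ** 2          # DSM regression target
    return ((score - target) ** 2).sum(-1).mean()

opt_theta = torch.optim.Adam([theta], lr=1e-3)
phi = torch.zeros(d_v, d_h)                       # lower-level parameters
v_data = torch.randn(128, d_v)                    # stand-in for real data

for step in range(200):
    # Gradient unrolling: a few inner steps on phi, kept inside the autograd graph.
    phi_k = phi.clone().requires_grad_(True)
    for _ in range(3):
        g = torch.autograd.grad(inner_loss(v_data, phi_k, theta), phi_k, create_graph=True)[0]
        phi_k = phi_k - 0.1 * g
    # Differentiate the modified SM objective through the unrolled inner steps.
    loss = outer_loss(v_data, phi_k, theta)
    opt_theta.zero_grad()
    loss.backward()
    opt_theta.step()
    phi = phi_k.detach()                          # warm-start the next unroll
```

Keeping the inner updates in the autograd graph (`create_graph=True`) is what lets `loss.backward()` differentiate the outer objective through the lower-level optimization; detaching or truncating those steps would recover only a cruder first-order approximation of the hypergradient.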
Related papers
- Joint Learning of Energy-based Models and their Partition Function [19.174145933837927]
Energy-based models (EBMs) offer a flexible framework for parameterizing probability distributions using neural networks.
We propose a novel formulation for approximately learning EBMs in combinatorially-large discrete spaces.
We show that our approach naturally extends to the broader family of Fenchel-Young losses.
arXiv Detail & Related papers (2025-01-30T17:46:17Z) - A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning.
These problems are often formalized as Bi-Level optimizations (BLO).
We introduce a novel perspective by turning a given BLO problem into a stochastic optimization, where the inner loss function becomes a smooth distribution, and the outer loss becomes an expected loss over the inner distribution.
arXiv Detail & Related papers (2024-10-14T12:10:06Z) - Hyperparameter Estimation for Sparse Bayesian Learning Models [1.0172874946490507]
Sparse Bayesian Learning (SBL) models are extensively used in signal processing and machine learning for promoting sparsity through hierarchical priors.
This paper presents a framework for the improvement of SBL models for various objective functions.
A novel algorithm is introduced showing enhanced efficiency, especially under low signal-to-noise ratios.
arXiv Detail & Related papers (2024-01-04T21:24:01Z) - Optimal Algorithms for Stochastic Bilevel Optimization under Relaxed
Smoothness Conditions [9.518010235273785]
We present a novel fully single-loop, Hessian-inversion-free algorithmic framework for bilevel optimization.
We show that, with a slight modification, our approach can handle a more general multi-objective robust bilevel optimization problem.
arXiv Detail & Related papers (2023-06-21T07:32:29Z) - Bi-level Doubly Variational Learning for Energy-based Latent Variable
Models [46.75117861209482]
Energy-based latent variable models (EBLVMs) are more expressive than conventional energy-based models.
We propose Bi-level doubly variational learning (BiDVL) to facilitate learning EBLVMs.
Our model achieves impressive image generation performance over related works.
arXiv Detail & Related papers (2022-03-24T04:13:38Z) - ES-Based Jacobian Enables Faster Bilevel Optimization [53.675623215542515]
Bilevel optimization (BO) has arisen as a powerful tool for solving many modern machine learning problems.
Existing gradient-based methods require second-order derivative approximations via Jacobian- or/and Hessian-vector computations.
We propose a novel BO algorithm, which adopts an Evolution Strategies (ES) based method to approximate the response Jacobian matrix in the hypergradient of BO (a toy sketch of this idea appears after this list).
arXiv Detail & Related papers (2021-10-13T19:36:50Z) - Value-Function-based Sequential Minimization for Bi-level Optimization [52.39882976848064]
Gradient-based Bi-Level Optimization (BLO) methods have been widely applied to handle modern learning tasks.
There are almost no gradient-based methods able to solve BLO in challenging scenarios, such as BLO with functional constraints and pessimistic BLO.
We provide Bi-level Value-Function-based Sequential Minimization (BVFSM) to address the above issues.
arXiv Detail & Related papers (2021-10-11T03:13:39Z) - Meta-Learning with Neural Tangent Kernels [58.06951624702086]
We propose the first meta-learning paradigm in the Reproducing Kernel Hilbert Space (RKHS) induced by the meta-model's Neural Tangent Kernel (NTK).
Within this paradigm, we introduce two meta-learning algorithms, which no longer need a sub-optimal iterative inner-loop adaptation as in the MAML framework.
We achieve this goal by 1) replacing the adaptation with a fast-adaptive regularizer in the RKHS; and 2) solving the adaptation analytically based on the NTK theory.
arXiv Detail & Related papers (2021-02-07T20:53:23Z) - A Generic First-Order Algorithmic Framework for Bi-Level Programming
Beyond Lower-Level Singleton [49.23948907229656]
Bi-level Descent Aggregation (BDA) is a flexible and modularized algorithmic framework for generic bi-level optimization.
We derive a new methodology to prove the convergence of BDA without the LLS condition.
Our investigations also demonstrate that BDA is indeed compatible with a variety of particular first-order computation modules.
arXiv Detail & Related papers (2020-06-07T05:18:50Z)
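As a concrete complement to the ES-based Jacobian entry above, the following toy sketch shows the zeroth-order idea it alludes to: estimating the response-Jacobian-transpose-vector product J(x)^T v, with J(x) = d y*(x) / d x, from evaluations of the lower-level solution alone, so that no Hessian- or Jacobian-vector products of the inner objective are needed. The quadratic inner problem, the sample budget, and all names are illustrative assumptions, not that paper's implementation.

```python
# Hedged toy sketch of an Evolution-Strategies estimate of J(x)^T v, where J(x) = d y*(x)/dx.
import numpy as np

rng = np.random.default_rng(0)
d_x, d_y = 4, 3
A = rng.normal(size=(d_y, d_x))

def inner_solution(x):
    # Lower-level argmin of f(x, y) = 0.5 * ||y - A x||^2, available in closed form here;
    # in practice this would be the output of a few inner optimization steps.
    return A @ x

def es_jacobian_T_vec(x, v, sigma=1e-2, n_samples=256):
    # Since y*(x + sigma u) - y*(x) ~= sigma J(x) u for small sigma,
    # E[u * ((y*(x + sigma u) - y*(x)) . v)] / sigma ~= J(x)^T v for u ~ N(0, I).
    y0 = inner_solution(x)
    est = np.zeros_like(x)
    for _ in range(n_samples):
        u = rng.normal(size=x.shape)
        est += u * ((inner_solution(x + sigma * u) - y0) @ v)
    return est / (sigma * n_samples)

x = rng.normal(size=d_x)
v = rng.normal(size=d_y)              # e.g. the upper-level gradient with respect to y
approx = es_jacobian_T_vec(x, v)
exact = A.T @ v                       # analytic J(x)^T v for this linear toy problem
print("relative error:", np.linalg.norm(approx - exact) / np.linalg.norm(exact))
```

In a bilevel hypergradient, this J(x)^T v term is exactly the piece that otherwise requires implicit differentiation or back-propagation through the inner solver, which is why replacing it with a black-box estimate can avoid second-order computation.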
This list is automatically generated from the titles and abstracts of the papers in this site.