Joint Learning of Energy-based Models and their Partition Function
- URL: http://arxiv.org/abs/2501.18528v1
- Date: Thu, 30 Jan 2025 17:46:17 GMT
- Title: Joint Learning of Energy-based Models and their Partition Function
- Authors: Michael E. Sander, Vincent Roulet, Tianlin Liu, Mathieu Blondel
- Abstract summary: Energy-based models (EBMs) offer a flexible framework for parameterizing probability distributions using neural networks.
We propose a novel formulation for approximately learning EBMs in combinatorially-large discrete spaces.
We show that our approach naturally extends to the broader family of Fenchel-Young losses.
- Score: 19.174145933837927
- Abstract: Energy-based models (EBMs) offer a flexible framework for parameterizing probability distributions using neural networks. However, learning EBMs by exact maximum likelihood estimation (MLE) is generally intractable, due to the need to compute the partition function (normalization constant). In this paper, we propose a novel formulation for approximately learning probabilistic EBMs in combinatorially-large discrete spaces, such as sets or permutations. Our key idea is to jointly learn both an energy model and its log-partition, both parameterized as a neural network. Our approach not only provides a novel tractable objective criterion to learn EBMs by stochastic gradient descent (without relying on MCMC), but also a novel means to estimate the log-partition function on unseen data points. On the theoretical side, we show that our approach recovers the optimal MLE solution when optimizing in the space of continuous functions. Furthermore, we show that our approach naturally extends to the broader family of Fenchel-Young losses, allowing us to obtain the first tractable method for optimizing the sparsemax loss in combinatorially-large spaces. We demonstrate our approach on multilabel classification and label ranking.
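For intuition, below is a minimal sketch of one way such a joint objective can be set up: an energy network and a log-partition network are trained together by SGD, using the classical bound log Z = min_tau (tau + e^(-tau) Z - 1) with Z(x) estimated by sampling from the combinatorial space. All names (EnergyNet, LogPartitionNet, joint_nll_bound, sample_y) and the specific bound are illustrative assumptions, not necessarily the paper's actual formulation.

```python
# Hypothetical sketch (not the paper's code): jointly learn an energy network
# and a log-partition network by SGD, without MCMC. Uses the identity
#   log Z = min_tau [ tau + exp(-tau) * Z - 1 ],
# with Z(x) estimated by uniform sampling over the combinatorial space.
import math
import torch
import torch.nn as nn

class EnergyNet(nn.Module):
    """Scores a (context x, structure y) pair; higher score = lower energy."""
    def __init__(self, dim_x, dim_y, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim_x + dim_y, hidden),
                                 nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, x, y):
        return self.net(torch.cat([x, y], dim=-1)).squeeze(-1)

class LogPartitionNet(nn.Module):
    """Predicts tau(x), the model's estimate of the log-partition at x."""
    def __init__(self, dim_x, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim_x, hidden),
                                 nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, x):
        return self.net(x).squeeze(-1)

def joint_nll_bound(score, tau, x, y_true, sample_y, log_space_size, n_samples=16):
    """Tractable upper bound on the negative log-likelihood, tight when
    tau(x) = log Z(x); minimized jointly over both networks by SGD."""
    s_pos = score(x, y_true)                        # score of observed structure
    t = tau(x)                                      # learned log-partition estimate
    ys = sample_y(x, n_samples)                     # uniform samples from the space
    s_neg = torch.stack([score(x, y) for y in ys])  # (n_samples, batch)
    # log of the importance-sampled estimate of Z(x): log|Y| + log mean exp(s_neg)
    log_z_hat = log_space_size + torch.logsumexp(s_neg, dim=0) - math.log(n_samples)
    return (-s_pos + t + torch.exp(log_z_hat - t) - 1.0).mean()
```

A single optimizer, e.g. torch.optim.Adam over the parameters of both networks, minimizes this bound; for multilabel classification with k labels, log_space_size = k * log(2). Once trained, the tau network can be queried on unseen inputs to estimate their log-partition, matching the role described in the abstract.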
Related papers
- A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning.
These problems are often formalized as Bi-Level optimizations (BLO)
We introduce a novel perspective by turning a given BLO problem into a stochastic optimization, where the inner loss function becomes a smooth distribution and the outer loss becomes an expected loss over the inner distribution (see the sketch after this list).
arXiv Detail & Related papers (2024-10-14T12:10:06Z) - Strongly Isomorphic Neural Optimal Transport Across Incomparable Spaces [7.535219325248997]
We present a novel neural formulation of the Gromov-Monge problem rooted in one of its fundamental properties.
We operationalize this property by decomposing the learnable OT map into two components.
Our framework provides a promising approach to learn OT maps across diverse spaces.
arXiv Detail & Related papers (2024-07-20T18:27:11Z) - Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration [87.53543137162488]
We propose an easy-to-implement online reinforcement learning (online RL) framework called MEX.
MEX integrates estimation and planning components while automatically balancing exploration and exploitation.
It can outperform baselines by a stable margin in various MuJoCo environments with sparse rewards.
arXiv Detail & Related papers (2023-05-29T17:25:26Z) - D4FT: A Deep Learning Approach to Kohn-Sham Density Functional Theory [79.50644650795012]
We propose a deep learning approach to solve Kohn-Sham Density Functional Theory (KS-DFT)
We prove that such an approach has the same expressivity as the SCF method, yet reduces the computational complexity.
In addition, we show that our approach enables us to explore more complex neural-based wave functions.
arXiv Detail & Related papers (2023-03-01T10:38:10Z) - Improvements to Supervised EM Learning of Shared Kernel Models by Feature Space Partitioning [0.0]
This paper addresses the lack of rigour in the derivation of the EM training algorithm and the computational complexity of the technique.
We first present a detailed derivation of EM for the Gaussian shared kernel model PRBF classifier.
To reduce complexity of the resulting SKEM algorithm, we partition the feature space into $R$ non-overlapping subsets of variables.
arXiv Detail & Related papers (2022-05-31T09:18:58Z) - Pseudo-Spherical Contrastive Divergence [119.28384561517292]
We propose pseudo-spherical contrastive divergence (PS-CD) to generalize maximum likelihood learning of energy-based models.
PS-CD avoids the intractable partition function and provides a generalized family of learning objectives.
arXiv Detail & Related papers (2021-11-01T09:17:15Z) - Information Theoretic Structured Generative Modeling [13.117829542251188]
A novel generative model framework called the structured generative model (SGM) is proposed that makes straightforward optimization possible.
The implementation employs a single neural network driven by an orthonormal input to a single white noise source adapted to learn an infinite Gaussian mixture model.
Preliminary results show that SGM significantly improves MINE estimation in terms of data efficiency and variance, improves on conventional and variational Gaussian mixture models, and is also useful for training adversarial networks.
arXiv Detail & Related papers (2021-10-12T07:44:18Z) - Implicit MLE: Backpropagating Through Discrete Exponential Family Distributions [24.389388509299543]
Implicit Maximum Likelihood Estimation is a framework for end-to-end learning of models combining discrete exponential family distributions and differentiable neural components.
We show that I-MLE is competitive with and often outperforms existing approaches which rely on problem-specific relaxations.
arXiv Detail & Related papers (2021-06-03T12:42:21Z) - Efficient semidefinite-programming-based inference for binary and multi-class MRFs [83.09715052229782]
We propose an efficient method for computing the partition function or MAP estimate in a pairwise MRF.
We extend semidefinite relaxations from the typical binary MRF to the full multi-class setting, and develop a compact semidefinite relaxation that can again be solved efficiently with the same solver.
arXiv Detail & Related papers (2020-12-04T15:36:29Z) - Training Deep Energy-Based Models with f-Divergence Minimization [113.97274898282343]
Deep energy-based models (EBMs) are very flexible in distribution parametrization but computationally challenging.
We propose a general variational framework termed f-EBM to train EBMs using any desired f-divergence.
Experimental results demonstrate the superiority of f-EBM over contrastive divergence, as well as the benefits of training EBMs using f-divergences other than KL.
arXiv Detail & Related papers (2020-03-06T23:11:13Z)
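As a side note on the first related paper above (the stochastic approach to bi-level optimization), here is a minimal sketch of the relaxation its summary describes: the inner argmin is replaced by a smooth Gibbs-like distribution over candidate inner solutions, and the outer loss becomes an expectation under that distribution. The function names, the finite candidate set, and the temperature are assumptions for illustration, not that paper's algorithm.

```python
# Hypothetical sketch (not that paper's algorithm): relax a bi-level problem
#   min_lam f(w*(lam), lam)   s.t.   w*(lam) = argmin_w g(w, lam)
# by replacing the inner argmin with a smooth Gibbs-like distribution
#   p_lam(w) proportional to exp(-g(w, lam) / temperature),
# so the outer objective becomes an expectation under p_lam.
import torch

def relaxed_outer_loss(f, g, lam, candidates, temperature=0.1):
    """Expected outer loss over a finite set of candidate inner solutions.

    f(w, lam): outer loss, g(w, lam): inner loss; `candidates` is an iterable
    of candidate inner solutions w (e.g. drawn around the current inner iterate).
    """
    inner = torch.stack([g(w, lam) for w in candidates])   # inner losses
    weights = torch.softmax(-inner / temperature, dim=0)   # Gibbs weights
    outer = torch.stack([f(w, lam) for w in candidates])   # outer losses
    return (weights * outer).sum()  # differentiable w.r.t. lam through f and g
```

As the temperature goes to zero, the Gibbs weights concentrate on the inner minimizer and the original bi-level objective is recovered; gradients with respect to the outer variables flow through both the outer losses and the Gibbs weights.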