Implicit MLE: Backpropagating Through Discrete Exponential Family
Distributions
- URL: http://arxiv.org/abs/2106.01798v1
- Date: Thu, 3 Jun 2021 12:42:21 GMT
- Title: Implicit MLE: Backpropagating Through Discrete Exponential Family
Distributions
- Authors: Mathias Niepert and Pasquale Minervini and Luca Franceschi
- Abstract summary: Implicit Maximum Likelihood Estimation is a framework for end-to-end learning of models combining discrete exponential family distributions and differentiable neural components.
We show that I-MLE is competitive with and often outperforms existing approaches which rely on problem-specific relaxations.
- Score: 24.389388509299543
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Integrating discrete probability distributions and combinatorial optimization
problems into neural networks has numerous applications but poses several
challenges. We propose Implicit Maximum Likelihood Estimation (I-MLE), a
framework for end-to-end learning of models combining discrete exponential
family distributions and differentiable neural components. I-MLE is widely
applicable: it only requires the ability to compute the most probable states
and does not rely on smooth relaxations. The framework encompasses several
approaches, such as perturbation-based implicit differentiation and recent
methods to differentiate through black-box combinatorial solvers. We introduce
a novel class of noise distributions for approximating marginals via
perturb-and-MAP. Moreover, we show that I-MLE simplifies to maximum likelihood
estimation when used in some recently studied learning settings that involve
combinatorial solvers. Experiments on several datasets suggest that I-MLE is
competitive with and often outperforms existing approaches which rely on
problem-specific relaxations.
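
The gradient estimate the abstract describes (perturb the parameters, call a MAP solver, and compare against the MAP state of a loss-improved target distribution) can be sketched as follows. This is a minimal illustration, not the authors' implementation. The following are assumptions made for the example: the k-subset (top-k) distribution, the Gumbel noise (the paper introduces its own class of noise distributions, which is not reproduced here), the step size `lam`, and the names `map_top_k` and `imle_gradient`. The estimate is returned without any rescaling by `lam`.

```python
import numpy as np

def map_top_k(theta, k):
    """Most probable state of a k-subset distribution: indicator of the k largest scores."""
    z = np.zeros_like(theta, dtype=float)
    z[np.argsort(theta)[-k:]] = 1.0
    return z

def imle_gradient(theta, loss_grad, k, lam=1.0, n_samples=10, rng=None):
    """Perturb-and-MAP estimate of dL/dtheta in the spirit of I-MLE (illustrative sketch).

    theta     : unnormalized scores of the discrete distribution
    loss_grad : callable returning dL/dz at a discrete state z (hypothetical interface)
    lam       : step size used to build the target parameters (assumed hyperparameter)
    """
    rng = np.random.default_rng() if rng is None else rng
    grad = np.zeros_like(theta, dtype=float)
    for _ in range(n_samples):
        # Assumption: Gumbel noise stands in for the paper's noise distributions.
        eps = rng.gumbel(size=theta.shape)
        z = map_top_k(theta + eps, k)                # MAP state under perturbed theta
        theta_target = theta - lam * loss_grad(z)    # move scores against the loss gradient
        z_target = map_top_k(theta_target + eps, k)  # MAP state of the target, same noise
        grad += z - z_target                         # difference of the two MAP states
    return grad / n_samples

# Usage: push a 2-subset distribution over 5 items toward a fixed target subset.
z_star = np.array([1.0, 1.0, 0.0, 0.0, 0.0])
theta = np.zeros(5)
g = imle_gradient(theta, lambda z: 2.0 * (z - z_star), k=2,
                  rng=np.random.default_rng(0))
theta = theta - 0.5 * g  # one plain gradient step on the scores
```

Only calls to the MAP routine are needed, so `map_top_k` could be swapped for any black-box combinatorial solver that returns the most probable state; no smooth relaxation of the discrete variable is involved.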
Related papers
- Efficient, Multimodal, and Derivative-Free Bayesian Inference With Fisher-Rao Gradient Flows [10.153270126742369]
We study efficient approximate sampling for probability distributions known up to normalization constants.
We specifically focus on a problem class arising in Bayesian inference for large-scale inverse problems in science and engineering applications.
(2024-06-25T04:07:22Z)
- Proximal Interacting Particle Langevin Algorithms [0.0]
We introduce Proximal Interacting Particle Langevin Algorithms (PIPLA) for inference and learning in latent variable models.
We propose several variants within the novel proximal IPLA family, tailored to the problem of estimating parameters in a non-differentiable statistical model.
Our theory and experiments together show that the PIPLA family can be the de facto choice for parameter estimation in non-differentiable latent variable models.
(2024-06-20T13:16:41Z)
- Optimal Multi-Distribution Learning [88.3008613028333]
Multi-distribution learning seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions.
We propose a novel algorithm that yields an $\varepsilon$-optimal randomized hypothesis with a sample complexity on the order of $(d+k)/\varepsilon^2$.
(2023-12-08T16:06:29Z)
- Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
(2023-11-08T00:10:21Z)
- Optimizing Solution-Samplers for Combinatorial Problems: The Landscape of Policy-Gradient Methods [52.0617030129699]
We introduce a novel theoretical framework for analyzing the effectiveness of Deep Neural Networks and Reinforcement Learning methods on combinatorial problems.
Our main contribution holds for a broad class of problems including Max- and Min-Cut, Max-$k$-CSP, Maximum-Weight-Bipartite-Matching, and the Traveling Salesman Problem.
As a byproduct of our analysis we introduce a novel regularization process over vanilla gradient descent and provide theoretical and experimental evidence that it helps address vanishing-gradient issues and escape bad stationary points.
(2023-10-08T23:39:38Z)
- Amortizing intractable inference in large language models [56.92471123778389]
We use amortized Bayesian inference to sample from intractable posterior distributions.
We empirically demonstrate that this distribution-matching paradigm of LLM fine-tuning can serve as an effective alternative to maximum-likelihood training.
As an important application, we interpret chain-of-thought reasoning as a latent variable modeling problem.
(2023-10-06T16:36:08Z)
- Self-Learning Symmetric Multi-view Probabilistic Clustering [50.32622999323709]
Multi-view Clustering (MVC) has achieved significant progress, with many efforts dedicated to learning knowledge from multiple views.
Most existing methods are either not applicable or require additional steps for incomplete MVC.
We propose a novel unified framework for incomplete and complete MVC named self-learning symmetric multi-view probabilistic clustering.
(2023-05-12T08:27:03Z)
- Generalizing Multimodal Variational Methods to Sets [35.69942798534849]
This paper presents a novel variational method on sets called the Set Multimodal VAE (SMVAE) for learning a multimodal latent space.
By modeling the joint-modality posterior distribution directly, the proposed SMVAE learns to exchange information between multiple modalities and compensate for the drawbacks caused by factorization.
(2022-12-19T23:50:19Z)
- Efficient semidefinite-programming-based inference for binary and multi-class MRFs [83.09715052229782]
We propose an efficient method for computing the partition function or MAP estimate in a pairwise MRF.
We extend semidefinite relaxations from the typical binary MRF to the full multi-class setting, and develop a compact semidefinite relaxation that can again be solved efficiently using the solver.
(2020-12-04T15:36:29Z)
- Neural Mixture Distributional Regression [0.9023847175654603]
We present a holistic framework to estimate finite mixtures of distributional regressions defined by flexible additive predictors.
Our framework is able to handle a large number of mixtures of potentially different distributions in high-dimensional settings.
(2020-10-14T09:00:16Z)
- Stacking for Non-mixing Bayesian Computations: The Curse and Blessing of Multimodal Posteriors [8.11978827493967]
We propose an approach using parallel runs of MCMC, variational, or mode-based inference to hit as many modes as possible.
We present theoretical consistency with an example where the stacked inference process approximates the true data.
We demonstrate practical implementation in several model families.
(2020-06-22T15:26:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.