Implicit MLE: Backpropagating Through Discrete Exponential Family
Distributions
- URL: http://arxiv.org/abs/2106.01798v1
- Date: Thu, 3 Jun 2021 12:42:21 GMT
- Authors: Mathias Niepert and Pasquale Minervini and Luca Franceschi
- Abstract summary: Implicit Maximum Likelihood Estimation is a framework for end-to-end learning of models combining discrete exponential family distributions and differentiable neural components.
We show that I-MLE is competitive with and often outperforms existing approaches which rely on problem-specific relaxations.
- Score: 24.389388509299543
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Integrating discrete probability distributions and combinatorial optimization
problems into neural networks has numerous applications but poses several
challenges. We propose Implicit Maximum Likelihood Estimation (I-MLE), a
framework for end-to-end learning of models combining discrete exponential
family distributions and differentiable neural components. I-MLE is widely
applicable: it only requires the ability to compute the most probable states
and does not rely on smooth relaxations. The framework encompasses several
approaches, such as perturbation-based implicit differentiation and recent
methods to differentiate through black-box combinatorial solvers. We introduce
a novel class of noise distributions for approximating marginals via
perturb-and-MAP. Moreover, we show that I-MLE simplifies to maximum likelihood
estimation when used in some recently studied learning settings that involve
combinatorial solvers. Experiments on several datasets suggest that I-MLE is
competitive with and often outperforms existing approaches which rely on
problem-specific relaxations.
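
The abstract describes I-MLE as needing only a routine that returns the most probable state, combined with perturb-and-MAP sampling. Below is a minimal NumPy sketch of that idea, not the authors' implementation. It makes several assumptions that are not stated in this summary: the discrete distribution is a k-subset distribution whose MAP state is a top-k indicator vector, the perturbation is plain Gumbel noise (the paper's own noise class is not reproduced here), and the gradient estimate is the difference between the MAP state of the perturbed parameters and the MAP state of a target obtained by one step of size `lam` against the downstream gradient. The names `map_top_k`, `imle_gradient`, `grad_z_fn`, and `lam` are hypothetical.

```python
import numpy as np


def map_top_k(theta, k):
    """Most probable state of a k-subset distribution (assumed setting):
    the indicator vector of the k largest scores."""
    z = np.zeros_like(theta, dtype=float)
    z[np.argsort(-theta, kind="stable")[:k]] = 1.0
    return z


def imle_gradient(theta, grad_z_fn, k, lam=1.0, rng=None):
    """Sketch of an I-MLE-style gradient estimate for the scores theta.

    grad_z_fn(z) returns the gradient of the downstream loss with respect
    to the discrete state z. Gumbel noise is an assumption of this sketch.
    """
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.gumbel(size=theta.shape)

    # Forward pass: perturb-and-MAP sample of the discrete state.
    z = map_top_k(theta + eps, k)

    # Target parameters: move the scores against the downstream gradient.
    theta_target = theta - lam * grad_z_fn(z)

    # Reuse the same noise so the two MAP states are directly comparable.
    z_target = map_top_k(theta_target + eps, k)

    # Gradient estimate: difference of the two MAP states.
    return z, z - z_target


if __name__ == "__main__":
    theta = np.array([0.5, -1.0, 2.0, 0.1, 1.5])
    z_star = np.array([1.0, 1.0, 0.0, 0.0, 0.0])  # hypothetical desired state
    z, grad = imle_gradient(theta, lambda s: 2.0 * (s - z_star), k=2,
                            rng=np.random.default_rng(0))
    print("sampled state:", z)
    print("gradient estimate:", grad)
```

Only the `map_top_k` call depends on the problem structure, so in this sketch a different combinatorial solver could be swapped in without touching the rest. The step size `lam` controls how far the target moves from the current parameters; with a very large value the target state collapses to the state preferred by the downstream loss.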
Related papers
- Maximum likelihood inference for high-dimensional problems with multiaffine variable relations [2.4578723416255754]
In this paper, we consider inference problems where the variables are related by multiaffine expressions.
We propose a novel Alternating and Iteratively-Reweighted Least Squares (AIRLS) algorithm, and prove its convergence for problems with Generalized Normal Distributions.
arXiv Detail & Related papers (2024-09-05T13:07:31Z) - Efficient, Multimodal, and Derivative-Free Bayesian Inference With Fisher-Rao Gradient Flows [10.153270126742369]
We study efficient approximate sampling for probability distributions known up to normalization constants.
We specifically focus on a problem class arising in Bayesian inference for large-scale inverse problems in science and engineering applications.
arXiv Detail & Related papers (2024-06-25T04:07:22Z) - Proximal Interacting Particle Langevin Algorithms [0.0]
We introduce Proximal Interacting Particle Langevin Algorithms (PIPLA) for inference and learning in latent variable models.
We propose several variants within the novel proximal IPLA family, tailored to the problem of estimating parameters in a non-differentiable statistical model.
Our theory and experiments together show that the PIPLA family can be the de facto choice for parameter estimation in non-differentiable latent variable models.
arXiv Detail & Related papers (2024-06-20T13:16:41Z) - Optimal Multi-Distribution Learning [88.3008613028333]
Multi-distribution learning seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions.
We propose a novel algorithm that yields an $\varepsilon$-optimal randomized hypothesis with a sample complexity on the order of $(d+k)/\varepsilon^2$.
arXiv Detail & Related papers (2023-12-08T16:06:29Z) - Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
arXiv Detail & Related papers (2023-11-08T00:10:21Z) - Optimizing Solution-Samplers for Combinatorial Problems: The Landscape
of Policy-Gradient Methods [52.0617030129699]
We introduce a novel theoretical framework for analyzing the effectiveness of Deep Neural Networks and Reinforcement Learning methods.
Our main contribution holds for a broad class of problems including Max- and Min-Cut, Max-$k$-CSP, Maximum-Weight-Bipartite-Matching, and the Traveling Salesman Problem.
As a byproduct of our analysis we introduce a novel regularization process over vanilla gradient descent and provide theoretical and experimental evidence that it helps address vanishing-gradient issues and escape bad stationary points.
arXiv Detail & Related papers (2023-10-08T23:39:38Z) - Amortizing intractable inference in large language models [56.92471123778389]
We use amortized Bayesian inference to sample from intractable posterior distributions.
We empirically demonstrate that this distribution-matching paradigm of LLM fine-tuning can serve as an effective alternative to maximum-likelihood training.
As an important application, we interpret chain-of-thought reasoning as a latent variable modeling problem.
arXiv Detail & Related papers (2023-10-06T16:36:08Z) - Generalizing Multimodal Variational Methods to Sets [35.69942798534849]
This paper presents a novel variational method on sets called the Set Multimodal VAE (SMVAE) for learning a multimodal latent space.
By modeling the joint-modality posterior distribution directly, the proposed SMVAE learns to exchange information between multiple modalities and compensate for the drawbacks caused by factorization.
arXiv Detail & Related papers (2022-12-19T23:50:19Z) - Efficient semidefinite-programming-based inference for binary and
multi-class MRFs [83.09715052229782]
We propose an efficient method for computing the partition function or MAP estimate in a pairwise MRF.
We extend semidefinite relaxations from the typical binary MRF to the full multi-class setting, and develop a compact semidefinite relaxation that can again be solved efficiently using the solver.
arXiv Detail & Related papers (2020-12-04T15:36:29Z) - Neural Mixture Distributional Regression [0.9023847175654603]
We present a holistic framework to estimate finite mixtures of distributional regressions defined by flexible additive predictors.
Our framework is able to handle a large number of mixtures of potentially different distributions in high-dimensional settings.
arXiv Detail & Related papers (2020-10-14T09:00:16Z) - Modal Regression based Structured Low-rank Matrix Recovery for
Multi-view Learning [70.57193072829288]
Low-rank Multi-view Subspace Learning has shown great potential in cross-view classification in recent years.
Existing LMvSL-based methods cannot handle view discrepancy and discriminancy well at the same time.
We propose Structured Low-rank Matrix Recovery (SLMR), a unique method of effectively removing view discrepancy and improving discriminancy.
arXiv Detail & Related papers (2020-03-22T03:57:38Z)