Related papers: Implicit MLE: Backpropagating Through Discrete Exponential Family Distributions

Implicit MLE: Backpropagating Through Discrete Exponential Family Distributions

URL: http://arxiv.org/abs/2106.01798v1
Date: Thu, 3 Jun 2021 12:42:21 GMT
Title: Implicit MLE: Backpropagating Through Discrete Exponential Family Distributions
Authors: Mathias Niepert and Pasquale Minervini and Luca Franceschi
Abstract summary: Implicit Maximum Likelihood Estimation is a framework for end-to-end learning of models combining discrete exponential family distributions and differentiable neural components. We show that I-MLE is competitive with and often outperforms existing approaches which rely on problem-specific relaxations.
Score: 24.389388509299543
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Integrating discrete probability distributions and combinatorial optimization problems into neural networks has numerous applications but poses several challenges. We propose Implicit Maximum Likelihood Estimation (I-MLE), a framework for end-to-end learning of models combining discrete exponential family distributions and differentiable neural components. I-MLE is widely applicable: it only requires the ability to compute the most probable states; and does not rely on smooth relaxations. The framework encompasses several approaches, such as perturbation-based implicit differentiation and recent methods to differentiate through black-box combinatorial solvers. We introduce a novel class of noise distributions for approximating marginals via perturb-and-MAP. Moreover, we show that I-MLE simplifies to maximum likelihood estimation when used in some recently studied learning settings that involve combinatorial solvers. Experiments on several datasets suggest that I-MLE is competitive with and often outperforms existing approaches which rely on problem-specific relaxations.

Related papers

Steering Large Agent Populations using Mean-Field Schrodinger Bridges with Gaussian Mixture Models [13.03355083378673]
Mean-Field Schrodinger Bridge (MFSB) problem is an optimization problem aiming to find the minimum effort control policy. In the context of multiagent control, the objective is to control the configuration of a swarm of identical, interacting cooperative agents.
arXiv Detail & Related papers (2025-03-31T04:01:04Z)
Joint Learning of Energy-based Models and their Partition Function [19.174145933837927]
Energy-based models (EBMs) offer a flexible framework for parameterizing probability distributions using neural networks. We propose a novel formulation for approximately learning EBMs inly-large discrete spaces. We show that our approach naturally extends to the broader family of Fenchel-Young losses.
arXiv Detail & Related papers (2025-01-30T17:46:17Z)
Adaptive Sampled Softmax with Inverted Multi-Index: Methods, Theory and Applications [79.53938312089308]
The MIDX-Sampler is a novel adaptive sampling strategy based on an inverted multi-index approach. Our method is backed by rigorous theoretical analysis, addressing key concerns such as sampling bias, gradient bias, convergence rates, and generalization error bounds.
arXiv Detail & Related papers (2025-01-15T04:09:21Z)
Go With the Flow: Fast Diffusion for Gaussian Mixture Models [16.07896640031724]
Schrodinger Bridges (SBs) are diffusion processes that steer in finite time, a given initial distribution to another final one while minimizing a suitable cost functional.<n>We propose an analytic parametrization of a set of feasible policies for solving low dimensional problems.<n>We showcase the potential of this approach in low-to-image problems such as image-to-image translation in the latent space of an autoencoder, learning of cellular dynamics using multi-marginal momentum SB problems and various other examples.
arXiv Detail & Related papers (2024-12-12T08:40:22Z)
Maximum likelihood inference for high-dimensional problems with multiaffine variable relations [2.4578723416255754]
In this paper, we consider inference problems where the variables are related by multiaffine expressions. We propose a novel Alternating and Iteratively-Reweighted Least Squares (AIRLS) algorithm, and prove its convergence for problems with Generalized Normal Distributions.
arXiv Detail & Related papers (2024-09-05T13:07:31Z)
Efficient, Multimodal, and Derivative-Free Bayesian Inference With Fisher-Rao Gradient Flows [10.153270126742369]
We study efficient approximate sampling for probability distributions known up to normalization constants. We specifically focus on a problem class arising in Bayesian inference for large-scale inverse problems in science and engineering applications.
arXiv Detail & Related papers (2024-06-25T04:07:22Z)
Proximal Interacting Particle Langevin Algorithms [0.0]
We introduce Proximal Interacting Particle Langevin Algorithms (PIPLA) for inference and learning in latent variable models. We propose several variants within the novel proximal IPLA family, tailored to the problem of estimating parameters in a non-differentiable statistical model. Our theory and experiments together show that PIPLA family can be the de facto choice for parameter estimation problems in latent variable models for non-differentiable models.
arXiv Detail & Related papers (2024-06-20T13:16:41Z)
Optimal Multi-Distribution Learning [88.3008613028333]
Multi-distribution learning seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions. We propose a novel algorithm that yields an varepsilon-optimal randomized hypothesis with a sample complexity on the order of (d+k)/varepsilon2.
arXiv Detail & Related papers (2023-12-08T16:06:29Z)
Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences. Our method is especially suitable for problems with well-specified likelihoods. We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
arXiv Detail & Related papers (2023-11-08T00:10:21Z)
Optimizing Solution-Samplers for Combinatorial Problems: The Landscape of Policy-Gradient Methods [52.0617030129699]
We introduce a novel theoretical framework for analyzing the effectiveness of DeepMatching Networks and Reinforcement Learning methods. Our main contribution holds for a broad class of problems including Max-and Min-Cut, Max-$k$-Bipartite-Bi, Maximum-Weight-Bipartite-Bi, and Traveling Salesman Problem. As a byproduct of our analysis we introduce a novel regularization process over vanilla descent and provide theoretical and experimental evidence that it helps address vanishing-gradient issues and escape bad stationary points.
arXiv Detail & Related papers (2023-10-08T23:39:38Z)
Amortizing intractable inference in large language models [56.92471123778389]
We use amortized Bayesian inference to sample from intractable posterior distributions. We empirically demonstrate that this distribution-matching paradigm of LLM fine-tuning can serve as an effective alternative to maximum-likelihood training. As an important application, we interpret chain-of-thought reasoning as a latent variable modeling problem.
arXiv Detail & Related papers (2023-10-06T16:36:08Z)
Generalizing Multimodal Variational Methods to Sets [35.69942798534849]
This paper presents a novel variational method on sets called the Set Multimodal VAE (SMVAE) for learning a multimodal latent space. By modeling the joint-modality posterior distribution directly, the proposed SMVAE learns to exchange information between multiple modalities and compensate for the drawbacks caused by factorization.
arXiv Detail & Related papers (2022-12-19T23:50:19Z)
Efficient semidefinite-programming-based inference for binary and multi-class MRFs [83.09715052229782]
We propose an efficient method for computing the partition function or MAP estimate in a pairwise MRF. We extend semidefinite relaxations from the typical binary MRF to the full multi-class setting, and develop a compact semidefinite relaxation that can again be solved efficiently using the solver.
arXiv Detail & Related papers (2020-12-04T15:36:29Z)
Neural Mixture Distributional Regression [0.9023847175654603]
We present a holistic framework to estimate finite mixtures of distributional regressions defined by flexible additive predictors. Our framework is able to handle a large number of mixtures of potentially different distributions in high-dimensional settings.
arXiv Detail & Related papers (2020-10-14T09:00:16Z)
Modal Regression based Structured Low-rank Matrix Recovery for Multi-view Learning [70.57193072829288]
Low-rank Multi-view Subspace Learning has shown great potential in cross-view classification in recent years. Existing LMvSL based methods are incapable of well handling view discrepancy and discriminancy simultaneously. We propose Structured Low-rank Matrix Recovery (SLMR), a unique method of effectively removing view discrepancy and improving discriminancy.
arXiv Detail & Related papers (2020-03-22T03:57:38Z)

This list is automatically generated from the titles and abstracts of the papers in this site.