Learning to Solve the Constrained Most Probable Explanation Task in Probabilistic Graphical Models
- URL: http://arxiv.org/abs/2404.11606v1
- Date: Wed, 17 Apr 2024 17:55:17 GMT
- Title: Learning to Solve the Constrained Most Probable Explanation Task in Probabilistic Graphical Models
- Authors: Shivvrat Arya, Tahrima Rahman, Vibhav Gogate
- Abstract summary: We train a deep neural network that learns to output near-optimal solutions to the constrained most-probable explanation (CMPE) problem.
We analyze the properties of our proposed method and experimentally demonstrate its efficacy on several benchmark problems.
- Score: 10.603378323312809
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a self-supervised learning approach for solving the following constrained optimization task in log-linear models or Markov networks. Let $f$ and $g$ be two log-linear models defined over the sets $\mathbf{X}$ and $\mathbf{Y}$ of random variables respectively. Given an assignment $\mathbf{x}$ to all variables in $\mathbf{X}$ (evidence) and a real number $q$, the constrained most-probable explanation (CMPE) task seeks to find an assignment $\mathbf{y}$ to all variables in $\mathbf{Y}$ such that $f(\mathbf{x}, \mathbf{y})$ is maximized and $g(\mathbf{x}, \mathbf{y})\leq q$. In our proposed self-supervised approach, given assignments $\mathbf{x}$ to $\mathbf{X}$ (data), we train a deep neural network that learns to output near-optimal solutions to the CMPE problem without requiring access to any pre-computed solutions. The key idea in our approach is to use first principles and approximate inference methods for CMPE to derive novel loss functions that seek to push infeasible solutions towards feasible ones and feasible solutions towards optimal ones. We analyze the properties of our proposed method and experimentally demonstrate its efficacy on several benchmark problems.
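To make the training signal concrete, here is a minimal, hypothetical PyTorch sketch of such a self-supervised loss. The stand-in weights f_w and g_w, the network architecture, and the hinge penalty with weight lam are illustrative assumptions; the paper derives its loss functions from first principles and approximate inference for CMPE, not from this particular penalty form.
```python
# Hypothetical sketch of self-supervised CMPE training -- NOT the authors' exact loss.
import torch
import torch.nn as nn

torch.manual_seed(0)
n_x, n_y, q, lam = 8, 6, 1.0, 10.0   # illustrative sizes and constants (assumptions)

f_w = torch.randn(n_x + n_y)         # stand-in weights for log-linear model f
g_w = torch.randn(n_x + n_y)         # stand-in weights for log-linear model g

# net maps evidence x to a relaxed assignment y in [0, 1]^{n_y}
net = nn.Sequential(nn.Linear(n_x, 32), nn.ReLU(),
                    nn.Linear(32, n_y), nn.Sigmoid())

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(100):
    x = torch.bernoulli(torch.full((64, n_x), 0.5))   # self-supervised: only x is sampled
    y = net(x)
    xy = torch.cat([x, y], dim=1)
    f_val = xy @ f_w                                  # objective f(x, y) to maximize
    g_val = xy @ g_w                                  # constrained quantity g(x, y)
    # Hinge penalty pushes infeasible outputs (g > q) toward feasibility;
    # the -f term pushes feasible outputs toward higher-scoring assignments.
    loss = (-f_val + lam * torch.relu(g_val - q)).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```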
Related papers
- Provably learning a multi-head attention layer [55.2904547651831]
The multi-head attention layer is one of the key components of the transformer architecture that sets it apart from traditional feed-forward models.
In this work, we initiate the study of provably learning a multi-head attention layer from random examples.
We prove computational lower bounds showing that, in the worst case, exponential dependence on $m$ (the number of heads) is unavoidable.
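For reference, here is a minimal sketch of the multi-head attention computation studied in this paper, written from the standard transformer definition rather than the paper's notation; the shapes and the number of heads m = 3 below are illustrative assumptions.
```python
# Standard multi-head attention forward pass (illustrative shapes).
import torch

def multi_head_attention(X, heads, Wo):
    # X: (seq, d); heads: list of m (Wq, Wk, Wv) projection triples; Wo: (m*dh, d)
    outs = []
    for Wq, Wk, Wv in heads:
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        A = torch.softmax(Q @ K.T / Q.shape[-1] ** 0.5, dim=-1)  # attention weights
        outs.append(A @ V)                                       # per-head output
    return torch.cat(outs, dim=-1) @ Wo                          # combine the m heads

d, dh, m = 16, 4, 3
X = torch.randn(5, d)
heads = [(torch.randn(d, dh), torch.randn(d, dh), torch.randn(d, dh)) for _ in range(m)]
out = multi_head_attention(X, heads, torch.randn(m * dh, d))     # shape (5, d)
```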
arXiv Detail & Related papers (2024-02-06T15:39:09Z)
- An Oblivious Stochastic Composite Optimization Algorithm for Eigenvalue Optimization Problems [76.2042837251496]
We introduce two oblivious mirror descent algorithms based on a complementary composite setting.
Remarkably, both algorithms work without prior knowledge of the Lipschitz constant or smoothness of the objective function.
We show how to scale our framework and demonstrate the efficiency and robustness of our methods on large-scale semidefinite programs.
arXiv Detail & Related papers (2023-06-30T08:34:29Z)
- Fast $(1+\varepsilon)$-Approximation Algorithms for Binary Matrix Factorization [54.29685789885059]
We introduce efficient $(1+\varepsilon)$-approximation algorithms for the binary matrix factorization (BMF) problem.
The goal is to approximate $\mathbf{A}$ as a product of low-rank factors.
Our techniques generalize to other common variants of the BMF problem.
arXiv Detail & Related papers (2023-06-02T18:55:27Z)
- Randomized Block-Coordinate Optimistic Gradient Algorithms for Root-Finding Problems [8.0153031008486]
We develop two new algorithms to approximate a solution of nonlinear equations in large-scale settings.
We apply our methods to a class of large-scale finite-sum inclusions, which covers prominent applications in machine learning.
arXiv Detail & Related papers (2023-01-08T21:46:27Z)
- Learning a Single Neuron with Adversarial Label Noise via Gradient Descent [50.659479930171585]
We study a function of the form $\mathbf{x} \mapsto \sigma(\mathbf{w} \cdot \mathbf{x})$ for monotone activations $\sigma$.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C \cdot \mathrm{OPT} + \epsilon$ with high probability.
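A minimal sketch of this setup under assumed specifics (ReLU as the monotone activation, the squared loss as $F$, and 5% corrupted labels), illustrating plain gradient descent rather than the paper's exact algorithm or analysis:
```python
# Gradient descent on a single neuron with some adversarially corrupted labels.
import torch

torch.manual_seed(0)
n, d = 2000, 10
w_star = torch.randn(d)
X = torch.randn(n, d)
y = torch.relu(X @ w_star)                     # sigma = ReLU, a monotone activation
y[: n // 20] = torch.randn(n // 20).abs() * 5  # corrupt 5% of labels (adversarial noise)

w = torch.zeros(d, requires_grad=True)
opt = torch.optim.SGD([w], lr=0.05)
for _ in range(300):
    loss = ((torch.relu(X @ w) - y) ** 2).mean()  # F(w), the squared loss
    opt.zero_grad(); loss.backward(); opt.step()
```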
arXiv Detail & Related papers (2022-06-17T17:55:43Z)
- Decentralized Stochastic Variance Reduced Extragradient Method [25.21457349137344]
This paper studies decentralized convex-concave minimax optimization problems of the form $\min_{x} \max_{y} f(x, y) \triangleq \frac{1}{m} \sum_{i=1}^{m} f_i(x, y)$.
arXiv Detail & Related papers (2022-02-01T16:06:20Z)
- The Fine-Grained Hardness of Sparse Linear Regression [12.83354999540079]
We show that there are no better-than-brute-force algorithms for the problem.
We also show the impossibility of better-than-brute-force algorithms when the prediction error is measured.
arXiv Detail & Related papers (2021-06-06T14:19:43Z)
- Nonparametric Learning of Two-Layer ReLU Residual Units [22.870658194212744]
We describe an algorithm that learns two-layer residual units with rectified linear unit (ReLU) activation.
We design layer-wise objectives as functionals whose analytic minimizers express the exact ground-truth network in terms of its parameters and nonlinearities.
We prove the statistical strong consistency of our algorithm and experimentally demonstrate its robustness and sample efficiency.
arXiv Detail & Related papers (2020-08-17T22:11:26Z)
- Maximizing Determinants under Matroid Constraints [69.25768526213689]
Given vectors $v_i$ and a matroid $M$, we study the problem of finding a basis $S$ of $M$ such that $\det\left(\sum_{i \in S} v_i v_i^\top\right)$ is maximized.
This problem appears in a diverse set of areas such as experimental design, fair allocation of goods, network design, and machine learning.
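As a toy illustration of the objective (not the paper's algorithm), the sketch below evaluates $\det(\sum_{i \in S} v_i v_i^\top)$ and greedily grows $S$ under an assumed cardinality constraint, i.e., the special case of a uniform matroid:
```python
# Greedy determinant maximization under a uniform matroid (illustrative only).
import torch

def det_objective(V, S):
    # det(sum_{i in S} v_i v_i^T) for rows of V indexed by S
    M = sum(torch.outer(V[i], V[i]) for i in S)
    return torch.det(M)

torch.manual_seed(0)
V = torch.randn(10, 3)                 # toy vectors v_1, ..., v_10 in R^3
S, M = [], 1e-6 * torch.eye(3)         # tiny ridge keeps early determinants nonzero
for _ in range(3):                     # cardinality-3 (uniform matroid) constraint
    best = max((i for i in range(10) if i not in S),
               key=lambda i: torch.det(M + torch.outer(V[i], V[i])).item())
    S.append(best)
    M = M + torch.outer(V[best], V[best])
print(S, det_objective(V, S).item())
```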
arXiv Detail & Related papers (2020-04-16T19:16:38Z)
- On Gradient Descent Ascent for Nonconvex-Concave Minimax Problems [86.92205445270427]
We consider nonconvex-concave minimax problems of the form $\min_{\mathbf{x}} \max_{\mathbf{y}} f(\mathbf{x}, \mathbf{y})$ and study how to solve them efficiently.
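A minimal sketch of two-timescale gradient descent ascent on a toy nonconvex-concave objective; the objective and step sizes are assumptions for illustration, not the paper's setting:
```python
# Gradient descent ascent: descend on x, ascend on y (two timescales).
import torch

x = torch.zeros(2, requires_grad=True)
y = torch.zeros(2, requires_grad=True)
eta_x, eta_y = 0.01, 0.1    # assumed step sizes; y updates on the faster timescale

def f(x, y):
    # toy objective: nonconvex in x, strongly concave in y
    return torch.sin(x).sum() + (x * y).sum() - 0.5 * (y ** 2).sum()

for _ in range(500):
    gx, gy = torch.autograd.grad(f(x, y), (x, y))
    with torch.no_grad():
        x -= eta_x * gx   # descent step on x
        y += eta_y * gy   # ascent step on y
```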
arXiv Detail & Related papers (2019-06-02T03:03:45Z)