An Asymptotically Optimal Coordinate Descent Algorithm for Learning Bayesian Networks from Gaussian Models
- URL: http://arxiv.org/abs/2408.11977v2
- Date: Tue, 17 Sep 2024 18:14:39 GMT
- Title: An Asymptotically Optimal Coordinate Descent Algorithm for Learning Bayesian Networks from Gaussian Models
- Authors: Tong Xu, Simge Küçükyavuz, Ali Shojaie, Armeen Taeb,
- Abstract summary: We study the problem of learning networks from continuous observational data, generated according to a linear Gaussian structural equation model.
We propose a new coordinate descent algorithm that converges to the optimal objective value of the $ell$penalized maximum likelihood.
- Score: 6.54203362045253
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper studies the problem of learning Bayesian networks from continuous observational data, generated according to a linear Gaussian structural equation model. We consider an $\ell_0$-penalized maximum likelihood estimator for this problem which is known to have favorable statistical properties but is computationally challenging to solve, especially for medium-sized Bayesian networks. We propose a new coordinate descent algorithm to approximate this estimator and prove several remarkable properties of our procedure: the algorithm converges to a coordinate-wise minimum, and despite the non-convexity of the loss function, as the sample size tends to infinity, the objective value of the coordinate descent solution converges to the optimal objective value of the $\ell_0$-penalized maximum likelihood estimator. Finite-sample statistical consistency guarantees are also established. To the best of our knowledge, our proposal is the first coordinate descent procedure endowed with optimality and statistical guarantees in the context of learning Bayesian networks. Numerical experiments on synthetic and real data demonstrate that our coordinate descent method can obtain near-optimal solutions while being scalable.
Related papers
- Eliminating Ratio Bias for Gradient-based Simulated Parameter Estimation [0.7673339435080445]
This article addresses the challenge of parameter calibration in models where the likelihood function is not analytically available.
We propose a gradient-based simulated parameter estimation framework, leveraging a multi-time scale that tackles the issue of ratio bias in both maximum likelihood estimation and posterior density estimation problems.
arXiv Detail & Related papers (2024-11-20T02:46:15Z) - Trust-Region Sequential Quadratic Programming for Stochastic Optimization with Random Models [57.52124921268249]
We propose a Trust Sequential Quadratic Programming method to find both first and second-order stationary points.
To converge to first-order stationary points, our method computes a gradient step in each iteration defined by minimizing a approximation of the objective subject.
To converge to second-order stationary points, our method additionally computes an eigen step to explore the negative curvature the reduced Hessian matrix.
arXiv Detail & Related papers (2024-09-24T04:39:47Z) - Optimal convex $M$-estimation via score matching [6.115859302936817]
We construct a data-driven convex loss function with respect to which empirical risk minimisation yields optimal variance in the downstream estimation of the regression coefficients.
Our semiparametric approach targets the best decreasing approximation of the derivative of the derivative of the log-density of the noise distribution.
arXiv Detail & Related papers (2024-03-25T12:23:19Z) - Promises and Pitfalls of the Linearized Laplace in Bayesian Optimization [73.80101701431103]
The linearized-Laplace approximation (LLA) has been shown to be effective and efficient in constructing Bayesian neural networks.
We study the usefulness of the LLA in Bayesian optimization and highlight its strong performance and flexibility.
arXiv Detail & Related papers (2023-04-17T14:23:43Z) - Composite Optimization Algorithms for Sigmoid Networks [3.160070867400839]
We propose the composite optimization algorithms based on the linearized proximal algorithms and the alternating direction of multipliers.
Numerical experiments on Frank's function fitting show that the proposed algorithms perform satisfactorily robustly.
arXiv Detail & Related papers (2023-03-01T15:30:29Z) - Fully Stochastic Trust-Region Sequential Quadratic Programming for
Equality-Constrained Optimization Problems [62.83783246648714]
We propose a sequential quadratic programming algorithm (TR-StoSQP) to solve nonlinear optimization problems with objectives and deterministic equality constraints.
The algorithm adaptively selects the trust-region radius and, compared to the existing line-search StoSQP schemes, allows us to utilize indefinite Hessian matrices.
arXiv Detail & Related papers (2022-11-29T05:52:17Z) - Stochastic Mirror Descent for Large-Scale Sparse Recovery [13.500750042707407]
We discuss an application of quadratic Approximation to statistical estimation of high-dimensional sparse parameters.
We show that the proposed algorithm attains the optimal convergence of the estimation error under weak assumptions on the regressor distribution.
arXiv Detail & Related papers (2022-10-23T23:23:23Z) - Outlier-Robust Sparse Estimation via Non-Convex Optimization [73.18654719887205]
We explore the connection between high-dimensional statistics and non-robust optimization in the presence of sparsity constraints.
We develop novel and simple optimization formulations for these problems.
As a corollary, we obtain that any first-order method that efficiently converges to station yields an efficient algorithm for these tasks.
arXiv Detail & Related papers (2021-09-23T17:38:24Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work is on the iteration of zero-th-order (ZO) optimization which does not require first-order information.
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity as well as as function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - Amortized Conditional Normalized Maximum Likelihood: Reliable Out of
Distribution Uncertainty Estimation [99.92568326314667]
We propose the amortized conditional normalized maximum likelihood (ACNML) method as a scalable general-purpose approach for uncertainty estimation.
Our algorithm builds on the conditional normalized maximum likelihood (CNML) coding scheme, which has minimax optimal properties according to the minimum description length principle.
We demonstrate that ACNML compares favorably to a number of prior techniques for uncertainty estimation in terms of calibration on out-of-distribution inputs.
arXiv Detail & Related papers (2020-11-05T08:04:34Z) - A Feasible Level Proximal Point Method for Nonconvex Sparse Constrained
Optimization [25.73397307080647]
We present a new model of a general convex or non objective machine machine objectives.
We propose an algorithm that solves a constraint with gradually relaxed point levels of each subproblem.
We demonstrate the effectiveness of our new numerical scale problems.
arXiv Detail & Related papers (2020-10-23T05:24:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.