CoLiDE: Concomitant Linear DAG Estimation
- URL: http://arxiv.org/abs/2310.02895v2
- Date: Wed, 13 Mar 2024 14:56:11 GMT
- Title: CoLiDE: Concomitant Linear DAG Estimation
- Authors: Seyed Saman Saboksayr, Gonzalo Mateos, Mariano Tepper
- Abstract summary: We deal with the problem of learning directed acyclic graph (DAG) structure from observational data adhering to a linear structural equation model.
We propose a new convex score function for sparsity-aware learning of linear DAGs.
- Score: 12.415463205960156
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We deal with the combinatorial problem of learning directed acyclic graph
(DAG) structure from observational data adhering to a linear structural
equation model (SEM). Leveraging advances in differentiable, nonconvex
characterizations of acyclicity, recent efforts have advocated a continuous
constrained optimization paradigm to efficiently explore the space of DAGs.
Most existing methods employ lasso-type score functions to guide this search,
which (i) require expensive penalty parameter retuning when the
$\textit{unknown}$ SEM noise variances change across problem instances; and
(ii) implicitly rely on limiting homoscedasticity assumptions. In this work, we
propose a new convex score function for sparsity-aware learning of linear DAGs,
which incorporates concomitant estimation of scale and thus effectively
decouples the sparsity parameter from the exogenous noise levels.
Regularization via a smooth, nonconvex acyclicity penalty term yields CoLiDE
($\textbf{Co}$ncomitant $\textbf{Li}$near $\textbf{D}$AG
$\textbf{E}$stimation), a regression-based criterion amenable to efficient
gradient computation and closed-form estimation of noise variances in
heteroscedastic scenarios. Our algorithm outperforms state-of-the-art methods
without incurring added complexity, especially when the DAGs are larger and the
noise level profile is heterogeneous. We also find CoLiDE exhibits enhanced
stability manifested via reduced standard deviations in several domain-specific
metrics, underscoring the robustness of our novel linear DAG estimator.
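To make the concomitant idea concrete, a minimal equal-variance sketch of such a score is $\frac{1}{2\sigma n}\|\mathbf{X}-\mathbf{X}\mathbf{W}\|_F^2 + \frac{d\sigma}{2} + \lambda\|\mathbf{W}\|_1$, minimized jointly over the weighted adjacency matrix $\mathbf{W}$ and the noise scale $\sigma$. The NumPy snippet below illustrates this score, the closed-form $\sigma$ update it admits, and a NOTEARS-style trace-exponential acyclicity term used here only as a generic stand-in for a smooth, nonconvex acyclicity penalty; the exact functional form, heteroscedastic extension, and optimizer in CoLiDE may differ, so treat this as an illustrative assumption rather than the authors' implementation.

```python
import numpy as np
from scipy.linalg import expm


def concomitant_score(X, W, sigma, lam=0.1):
    """Equal-variance, CoLiDE-style concomitant score (sketch):
    data-fit term scaled by sigma, plus an l1 sparsity penalty."""
    n, d = X.shape
    residual = X - X @ W
    fit = np.sum(residual ** 2) / (2.0 * sigma * n) + d * sigma / 2.0
    return fit + lam * np.abs(W).sum()


def sigma_closed_form(X, W):
    """Closed-form minimizer of the fit term in sigma for fixed W
    (equal-variance case): sigma = ||X - XW||_F / sqrt(n d)."""
    n, d = X.shape
    residual = X - X @ W
    return np.sqrt(np.sum(residual ** 2) / (n * d))


def acyclicity_penalty(W):
    """NOTEARS-style smooth nonconvex acyclicity term (stand-in):
    h(W) = tr(exp(W * W)) - d, which is zero iff W encodes a DAG."""
    d = W.shape[0]
    return np.trace(expm(W * W)) - d


# Toy usage on a 3-node chain DAG with unit-variance exogenous noise.
rng = np.random.default_rng(0)
W_true = np.array([[0.0, 1.5, 0.0],
                   [0.0, 0.0, -0.8],
                   [0.0, 0.0, 0.0]])
n, d = 500, 3
Z = rng.normal(size=(n, d))
X = Z @ np.linalg.inv(np.eye(d) - W_true)  # linear SEM: X = X W_true + Z

sigma_hat = sigma_closed_form(X, W_true)
print("sigma_hat:", sigma_hat)
print("score:", concomitant_score(X, W_true, sigma_hat))
print("acyclicity:", acyclicity_penalty(W_true))  # ~0 for a DAG
```

For this sketch, profiling out $\sigma$ gives a square-root-lasso-type criterion $\sqrt{d/n}\,\|\mathbf{X}-\mathbf{X}\mathbf{W}\|_F + \lambda\|\mathbf{W}\|_1$, which is one way to see why the sparsity weight need not be retuned when the exogenous noise level changes.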
Related papers
- Gradient Normalization with(out) Clipping Ensures Convergence of Nonconvex SGD under Heavy-Tailed Noise with Improved Results [60.92029979853314]
This paper investigates gradient normalization without clipping (NSGDC) and its variance-reduction variant (NSGDC-VR).
We present significant improvements in the theoretical results for both algorithms.
arXiv Detail & Related papers (2024-10-21T22:40:42Z) - Non-negative Weighted DAG Structure Learning [12.139158398361868]
We address the problem of learning the true DAG from nodal observations.
We propose a DAG recovery algorithm based on the method of multipliers that is guaranteed to return a global minimizer.
arXiv Detail & Related papers (2024-09-12T09:41:29Z) - Kernel-Based Differentiable Learning of Non-Parametric Directed Acyclic Graphical Models [17.52142371968811]
Causal discovery amounts to learning a directed acyclic graph (DAG) that encodes a causal model.
Recent research has sought to bypass the search by reformulating causal discovery as a continuous optimization problem.
arXiv Detail & Related papers (2024-08-20T16:09:40Z) - Distributionally Robust Optimization with Bias and Variance Reduction [9.341215359733601]
We show that Prospect, a gradient-based algorithm, enjoys linear convergence for smooth regularized losses.
We also show that Prospect can converge 2-3$\times$ faster than stochastic gradient-based baselines.
arXiv Detail & Related papers (2023-10-21T00:03:54Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - dotears: Scalable, consistent DAG estimation using observational and
interventional data [1.220743263007369]
Causal gene regulatory networks can be represented by directed acyclic graph (DAG)
We present $\texttt{dotears}$ [doo-tairs], a continuous optimization framework to infer a single causal structure.
We show that $\texttt{dotears}$ is a provably consistent estimator of the true DAG under mild assumptions.
arXiv Detail & Related papers (2023-05-30T17:03:39Z) - On the Sparse DAG Structure Learning Based on Adaptive Lasso [39.31370830038554]
We develop a data-driven DAG structure learning method without a predefined threshold, called adaptive NOTEARS [30].
We show that adaptive NOTEARS enjoys the oracle properties under some specific conditions. Furthermore, simulation results validate the effectiveness of our method, without setting any gap of edges around zero.
arXiv Detail & Related papers (2022-09-07T05:47:59Z) - Clipped Stochastic Methods for Variational Inequalities with
Heavy-Tailed Noise [64.85879194013407]
We prove the first high-probability results with logarithmic dependence on the confidence level for clipped stochastic methods solving monotone and structured non-monotone VIPs.
Our results match the best-known ones in the light-tails case and are novel for structured non-monotone problems.
In addition, we numerically validate that the gradient noise of many practical formulations is heavy-tailed and show that clipping improves the performance of SEG/SGDA.
arXiv Detail & Related papers (2022-06-02T15:21:55Z) - BCD Nets: Scalable Variational Approaches for Bayesian Causal Discovery [97.79015388276483]
A structural equation model (SEM) is an effective framework to reason over causal relationships represented via a directed acyclic graph (DAG)
Recent advances enabled effective maximum-likelihood point estimation of DAGs from observational data.
We propose BCD Nets, a variational framework for estimating a distribution over DAGs characterizing a linear-Gaussian SEM.
arXiv Detail & Related papers (2021-12-06T03:35:21Z) - Differentiable Annealed Importance Sampling and the Perils of Gradient
Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z) - Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are central in preventing overfitting empirically.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z)