Truncated Matrix Power Iteration for Differentiable DAG Learning
- URL: http://arxiv.org/abs/2208.14571v1
- Date: Tue, 30 Aug 2022 23:56:12 GMT
- Title: Truncated Matrix Power Iteration for Differentiable DAG Learning
- Authors: Zhen Zhang, Ignavier Ng, Dong Gong, Yuhang Liu, Ehsan M Abbasnejad,
Mingming Gong, Kun Zhang, Javen Qinfeng Shi
- Abstract summary: We propose a novel DAG learning method with an efficient truncated matrix power iteration to approximate geometric series-based DAG constraints.
Empirically, our DAG learning method outperforms previous state-of-the-art methods in various settings.
- Score: 47.69479930501961
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recovering underlying directed acyclic graph (DAG) structures from
observational data is highly challenging due to the combinatorial nature of the
DAG-constrained optimization problem. Recently, DAG learning has been cast as a
continuous optimization problem by characterizing the DAG constraint as a
smooth equality constraint, generally based on polynomials over adjacency
matrices. Existing methods place very small coefficients on high-order
polynomial terms for stabilization, arguing that large coefficients on the
higher-order terms are harmful due to numerical explosion. On the contrary, we
discover that large coefficients on higher-order terms are beneficial for DAG
learning when the spectral radii of the adjacency matrices are small, and that
larger coefficients for higher-order terms can approximate the DAG constraints
much better than small ones. Based on this, we propose a novel DAG learning
method with an efficient truncated matrix power iteration to approximate
geometric series-based DAG constraints. Empirically, our DAG learning method
outperforms previous state-of-the-art methods in various settings, often by a
factor of 3 or more in terms of structural Hamming distance.
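A minimal NumPy sketch of the geometric-series constraint described above, h(W) = tr(sum_{k=1}^{K} (W∘W)^k), evaluated with O(log K) matrix products via a doubling recurrence. The doubling scheme, names, and truncation order are illustrative rather than the authors' exact algorithm, and as the abstract notes, the series is only well behaved when the spectral radius of W∘W is small.

```python
import numpy as np

def truncated_geometric_h(W, num_doublings=5):
    """Truncated geometric-series acyclicity penalty (illustrative).

    Approximates h(W) = tr(sum_{k=1}^{K} M^k) with M = W*W (Hadamard)
    and K = 2**num_doublings, using O(log K) matrix products:
        S_{2K} = S_K + M^K S_K,   M^{2K} = (M^K)^2.
    h(W) = 0 exactly when the graph of W is acyclic (for K >= #nodes).
    """
    M = W * W            # non-negative surrogate adjacency matrix
    S = M.copy()         # running truncated sum, S_1 = M
    P = M.copy()         # running power, P_1 = M^1
    for _ in range(num_doublings):
        S = S + P @ S    # extend the sum from order K to order 2K
        P = P @ P        # square the tracked power
    return np.trace(S)

# A 2-cycle gives a positive penalty; a DAG gives zero.
cyc = np.array([[0.0, 0.5], [0.5, 0.0]])
dag = np.array([[0.0, 0.5], [0.0, 0.0]])
print(truncated_geometric_h(cyc))  # > 0: cycle detected
print(truncated_geometric_h(dag))  # 0.0: acyclic
```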
Related papers
- Non-negative Weighted DAG Structure Learning [12.139158398361868]
We address the problem of learning the true DAGs from nodal observations.
We propose a DAG recovery algorithm that is guaranteed to return a DAG.
arXiv Detail & Related papers (2024-09-12T09:41:29Z)
- Recovering Linear Causal Models with Latent Variables via Cholesky Factorization of Covariance Matrix [21.698480201955213]
We propose a DAG structure recovery algorithm based on the Cholesky factorization of the covariance matrix of the observed data.
On synthetic and real-world datasets, the algorithm is significantly faster than previous methods and achieves state-of-the-art performance.
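As a minimal sketch of the core identity (not the paper's full algorithm, which also handles latent variables): for a linear SEM x = Bx + n with variables already in topological order and independent noise, the covariance factorizes as Sigma = (I - B)^{-1} D (I - B)^{-T}, so B can be read directly off the Cholesky factor. All names below are illustrative.

```python
import numpy as np

def recover_B_from_cov(Sigma):
    """Read the SEM weights off the Cholesky factor of the covariance.

    With strictly lower-triangular B (topological order) the Cholesky
    factor is L = (I - B)^{-1} D^{1/2}, hence B = I - D^{1/2} L^{-1},
    where D^{1/2} = diag(L) because (I - B)^{-1} has unit diagonal.
    """
    L = np.linalg.cholesky(Sigma)
    D_half = np.diag(np.diag(L))       # square roots of noise variances
    return np.eye(len(Sigma)) - D_half @ np.linalg.inv(L)

# Example: chain x0 -> x1 -> x2 with unit-variance noise (D = I).
B_true = np.array([[0.0, 0.0, 0.0],
                   [0.8, 0.0, 0.0],
                   [0.0, 0.5, 0.0]])
M = np.linalg.inv(np.eye(3) - B_true)
Sigma = M @ M.T                        # population covariance
print(np.round(recover_B_from_cov(Sigma), 6))  # recovers B_true
```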
arXiv Detail & Related papers (2023-11-01T17:27:49Z)
- From graphs to DAGs: a low-complexity model and a scalable algorithm [0.0]
This paper presents a low-complexity model, called LoRAM for Low-Rank Additive Model, which combines low-rank matrix factorization with a sparsification mechanism for the continuous optimization of DAGs.
The proposed approach reduces the complexity from cubic to quadratic while handling the same DAG characteristic function as NOTEARS.
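Where such a complexity reduction can come from is easiest to see for polynomial traces of a low-rank matrix: by the cyclic property of the trace, tr((U V^T)^k) = tr((V^T U)^k), so the power is taken of an r x r core matrix instead of a d x d one. This sketch only illustrates that trick; LoRAM's actual computation additionally handles the sparsification mechanism and differs in detail.

```python
import numpy as np

def trace_power_lowrank(U, V, k):
    """tr((U V^T)^k) via the r x r core C = V^T U.

    Costs O(d r^2) to form C plus O(k r^3) for its powers, versus
    O(k d^3) if the dense d x d product U V^T were powered directly.
    """
    C = V.T @ U
    return np.trace(np.linalg.matrix_power(C, k))

rng = np.random.default_rng(0)
d, r = 1000, 20
U = rng.normal(size=(d, r)) / d
V = rng.normal(size=(d, r)) / d
print(trace_power_lowrank(U, V, 3))
```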
arXiv Detail & Related papers (2022-04-10T10:22:56Z)
- High-Dimensional Sparse Bayesian Learning without Covariance Matrices [66.60078365202867]
We introduce a new inference scheme that avoids explicit construction of the covariance matrix.
Our approach couples a little-known diagonal estimation result from numerical linear algebra with the conjugate gradient algorithm.
In several simulations, our method scales better than existing approaches in computation time and memory.
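A generic sketch of the covariance-free idea (the paper's scheme is specialized to sparse Bayesian learning; probe counts and tolerances below are illustrative): estimate diag(A^{-1}) by Hutchinson-style probing, averaging z * (A^{-1} z) over Rademacher probes z, and apply A^{-1} with conjugate gradients so A is only ever touched through matrix-vector products.

```python
import numpy as np
from scipy.sparse.linalg import cg, LinearOperator

def estimate_inverse_diagonal(matvec, n, num_probes=100, seed=0):
    """Estimate diag(A^{-1}) without forming A or A^{-1} explicitly.

    matvec: function v -> A @ v for a symmetric positive definite A.
    Accuracy improves as num_probes grows (Monte Carlo averaging).
    """
    rng = np.random.default_rng(seed)
    op = LinearOperator((n, n), matvec=matvec)
    acc = np.zeros(n)
    for _ in range(num_probes):
        z = rng.choice([-1.0, 1.0], size=n)   # Rademacher probe
        x, _ = cg(op, z, atol=1e-10)          # x = A^{-1} z via CG
        acc += z * x                          # accumulate z * (A^{-1} z)
    return acc / num_probes

# Sanity check against the exact inverse diagonal on a small SPD matrix.
n = 50
G = np.random.default_rng(1).normal(size=(n, n))
A = G @ G.T + n * np.eye(n)
est = estimate_inverse_diagonal(lambda v: A @ v, n, num_probes=300)
print(np.max(np.abs(est - np.diag(np.linalg.inv(A)))))  # small error
```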
arXiv Detail & Related papers (2022-02-25T16:35:26Z)
- Learning Large DAGs by Combining Continuous Optimization and Feedback Arc Set Heuristics [0.3553493344868413]
We propose two scalable methods for learning DAGs in the linear structural equation case.
Our methods learn the DAG by alternating between an unconstrained gradient-descent step that optimizes the objective function and a feedback arc set heuristic that restores acyclicity.
Thanks to this decoupling, our methods scale up beyond thousands of variables.
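A toy sketch of that alternation (the paper's two methods use more refined feedback-arc-set heuristics and objectives; the greedy ordering and all constants below are illustrative): take an unconstrained gradient step on a least-squares objective, then drop every edge that points backward in a greedy node ordering. The dropped backward edges form a feedback arc set, so the result is always a DAG.

```python
import numpy as np

def greedy_ordering(W):
    """Place next the node receiving the least weight from unplaced
    nodes (a crude feedback-arc-set-style heuristic)."""
    A = np.abs(W)
    remaining, order = list(range(len(A))), []
    while remaining:
        v = min(remaining, key=lambda i: A[np.ix_(remaining, [i])].sum())
        order.append(v)
        remaining.remove(v)
    return order

def project_to_dag(W):
    """Keep only edges going forward in the ordering; backward edges
    form a feedback arc set, so the surviving graph is acyclic."""
    pos = {v: k for k, v in enumerate(greedy_ordering(W))}
    d = len(W)
    mask = np.array([[pos[i] < pos[j] for j in range(d)]
                     for i in range(d)])
    return W * mask

def fit_alternating(X, steps=300, lr=0.1):
    """Alternate a gradient step on 0.5/n * ||X - XW||_F^2 with the
    DAG projection above (W[i, j] is the weight of edge i -> j)."""
    n, d = X.shape
    W = np.zeros((d, d))
    for _ in range(steps):
        R = X - X @ W
        W += lr * (X.T @ R) / n   # unconstrained descent step
        W = project_to_dag(W)     # restore acyclicity
    return W
```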
arXiv Detail & Related papers (2021-07-01T16:10:21Z)
- DAGs with No Curl: An Efficient DAG Structure Learning Approach [62.885572432958504]
Recently, directed acyclic graph (DAG) structure learning has been formulated as a constrained continuous optimization problem with continuous acyclicity constraints.
We propose a novel learning framework to model and learn the weighted adjacency matrices in the DAG space directly.
We show that our method provides comparable accuracy but better efficiency than baseline DAG structure learning methods on both linear and generalized structural equation models.
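One way such a direct DAG-space parametrization can work: gate each edge i -> j by a scalar node potential so edges only run "uphill"; any directed path then strictly increases the potential and can never close a cycle, so no acyclicity constraint is needed during optimization. The map below is an illustrative potential-based construction, not necessarily the paper's exact Hodge-decomposition-based one.

```python
import numpy as np

def dag_from_potential(p, U):
    """Weighted adjacency that is acyclic by construction.

    Edge i -> j survives only if p[j] > p[i]; directed paths strictly
    increase p, so cycles are impossible. p and U are free parameters
    that could be optimized by gradient descent.
    """
    diff = p[None, :] - p[:, None]     # diff[i, j] = p[j] - p[i]
    return U * np.maximum(diff, 0.0)   # ReLU keeps only ascending edges

rng = np.random.default_rng(0)
W = dag_from_potential(rng.normal(size=5), rng.normal(size=(5, 5)))
# Every nonzero W[i, j] has p[j] > p[i], so W encodes a DAG.
```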
arXiv Detail & Related papers (2021-06-14T07:11:36Z)
- Solving weakly supervised regression problem using low-rank manifold regularization [77.34726150561087]
We solve a weakly supervised regression problem.
Under "weakly" we understand that for some training points the labels are known, for some unknown, and for others uncertain due to the presence of random noise or other reasons such as lack of resources.
In the numerical section, we applied the suggested method to artificial and real datasets using Monte-Carlo modeling.
arXiv Detail & Related papers (2021-04-13T23:21:01Z)
- Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are empirically central to preventing overfitting.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We highlight a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD and that of ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z)
- On the Role of Sparsity and DAG Constraints for Learning Linear DAGs [16.97675762810828]
We study the role of sparsity and DAG constraints for learning DAG models in the linear Gaussian and non-Gaussian cases.
We propose a likelihood-based score function and show that applying only soft sparsity and DAG constraints suffices to learn a DAG equivalent to the ground-truth DAG.
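A sketch of such a score with only soft penalties, in the spirit of the paper (the exact likelihood variant, which here assumes equal noise variances, and all penalty weights are illustrative):

```python
import numpy as np
from scipy.linalg import expm

def soft_dag_score(W, X, lam1=0.02, lam2=5.0):
    """Negative Gaussian log-likelihood plus soft L1 and DAG penalties.

    Neither penalty is enforced as a hard constraint; the claim is that
    minimizing a score of this shape already recovers a DAG equivalent
    to the ground truth under suitable assumptions.
    """
    n, d = X.shape
    R = X - X @ W
    _, logdet = np.linalg.slogdet(np.eye(d) - W)
    neg_loglik = 0.5 * d * np.log((R ** 2).sum() / n) - logdet
    h = np.trace(expm(W * W)) - d      # 0 exactly when W is acyclic
    return neg_loglik + lam1 * np.abs(W).sum() + lam2 * h
```

In practice a score like this would be minimized with a generic first-order optimizer over W, with edges read off by thresholding small weights.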
arXiv Detail & Related papers (2020-06-17T23:43:23Z)
- Improved guarantees and a multiple-descent curve for Column Subset Selection and the Nyström method [76.73096213472897]
We develop techniques which exploit spectral properties of the data matrix to obtain improved approximation guarantees.
Our approach leads to significantly better bounds for datasets with known rates of singular value decay.
We show that both our improved bounds and the multiple-descent curve can be observed on real datasets simply by varying the RBF parameter.
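A small sketch of that style of experiment (uniform column sampling and all sizes below are illustrative; the paper analyzes more refined column-selection schemes): build the standard Nyström approximation K ≈ C W⁺ Cᵀ from a sampled column subset and track the approximation error as the RBF parameter varies.

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def nystrom(X, idx, gamma):
    """Nystrom approximation K ~ C W^+ C^T from the columns in idx."""
    C = rbf_kernel(X, X[idx], gamma)   # n x m sampled columns
    W = C[idx, :]                      # m x m intersection block
    return C @ np.linalg.pinv(W) @ C.T

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
idx = rng.choice(200, size=20, replace=False)
for gamma in (0.01, 0.1, 1.0):   # varying gamma changes effective rank
    K = rbf_kernel(X, X, gamma)
    err = np.linalg.norm(K - nystrom(X, idx, gamma)) / np.linalg.norm(K)
    print(gamma, round(err, 4))
```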
arXiv Detail & Related papers (2020-02-21T00:43:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.