On the Role of Entropy-based Loss for Learning Causal Structures with
Continuous Optimization
- URL: http://arxiv.org/abs/2106.02835v4
- Date: Mon, 30 Oct 2023 02:21:21 GMT
- Title: On the Role of Entropy-based Loss for Learning Causal Structures with
Continuous Optimization
- Authors: Weilin Chen, Jie Qiao, Ruichu Cai, Zhifeng Hao
- Abstract summary: A method with a non-combinatorial directed acyclicity constraint, called NOTEARS, formulates the causal structure learning problem as a continuous optimization problem using a least-square loss.
We show that the violation of the Gaussian noise assumption will hinder the causal direction identification.
We propose a more general entropy-based loss that is theoretically consistent with the likelihood score under any noise distribution.
- Score: 27.613220411996025
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Causal discovery from observational data is an important but challenging task
in many scientific fields. Recently, NOTEARS, a method with a
non-combinatorial directed acyclicity constraint, formulates the causal
structure learning problem as a continuous optimization problem using a
least-square loss. Though the least-square loss function is well justified
under the standard Gaussian noise assumption, it is limited if the assumption
does not hold. In this work,
we theoretically show that violating the Gaussian noise assumption hinders
causal direction identification, making the causal orientation fully
determined by the causal strength and the noise variances in the linear case,
and by strong non-Gaussian noise in the nonlinear case.
Consequently, we propose a more general entropy-based loss that is
theoretically consistent with the likelihood score under any noise
distribution. We run extensive empirical evaluations on both synthetic data and
real-world data to validate the effectiveness of the proposed method and show
that our method achieves the best performance in the Structural Hamming
Distance, False Discovery Rate, and True Positive Rate metrics.
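To make the role of the loss concrete, the following is a minimal sketch of the quantities involved: the NOTEARS acyclicity constraint, the least-square score, and an entropy-based score on the residuals of a linear SEM X = XW + N. This is a sketch under assumptions, not the authors' implementation; the histogram entropy estimator and all function names are illustrative.

```python
# Minimal sketch, assuming a linear SEM X = X @ W + N with an (n, d) data
# matrix X. Not the authors' code; the entropy estimator is illustrative.
import numpy as np
from scipy.linalg import expm

def notears_acyclicity(W):
    # NOTEARS constraint h(W) = tr(exp(W * W)) - d; zero iff W is acyclic.
    d = W.shape[0]
    return np.trace(expm(W * W)) - d

def least_square_loss(X, W):
    # Standard NOTEARS score: (1 / 2n) * ||X - X @ W||_F^2. Matches the
    # likelihood only under standard (equal-variance) Gaussian noise.
    n = X.shape[0]
    return 0.5 / n * np.sum((X - X @ W) ** 2)

def entropy_loss(X, W, bins=30):
    # Entropy-based score: sum of plug-in entropies of the residuals,
    # which stays consistent with the likelihood score under any noise
    # distribution.
    R = X - X @ W
    total = 0.0
    for j in range(R.shape[1]):
        p, edges = np.histogram(R[:, j], bins=bins, density=True)
        width = edges[1] - edges[0]
        p = p[p > 0]
        total += -np.sum(p * np.log(p)) * width  # plug-in differential entropy
    return total
```

Either score would be minimized subject to notears_acyclicity(W) = 0 (in practice via an augmented Lagrangian); the entropy-based score is what keeps the objective aligned with the likelihood when the noise is non-Gaussian.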
Related papers
- Causal Discovery with Score Matching on Additive Models with Arbitrary
Noise [37.13308785728276]
Causal discovery methods are intrinsically constrained by the set of assumptions needed to ensure structure identifiability.
In this paper we show the shortcomings of inference under the Gaussianity hypothesis, analyzing the risk of edge inversion when the noise terms violate Gaussianity.
We propose a novel method for inferring the topological ordering of the variables in the causal graph, from data generated according to an additive non-linear model with a generic noise distribution.
This leads to NoGAM, a causal discovery algorithm with a minimal set of assumptions and state-of-the-art performance, experimentally benchmarked on synthetic data.
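As a rough illustration of the ordering-based recipe (a sketch under assumptions, not NoGAM itself): identify a leaf of the causal graph, remove it, and recurse. `leaf_criterion` below is a hypothetical stand-in for the paper's score-matching-based leaf test.

```python
# Hedged sketch of ordering-based causal discovery. `leaf_criterion` is a
# placeholder: NoGAM identifies leaves via score matching, not reproduced here.
import numpy as np

def leaf_criterion(X):
    # Placeholder statistic per column; NOT the paper's criterion. A real
    # implementation would estimate the score of the data distribution and
    # test which variable behaves like a sink.
    return np.var(X, axis=0)

def topological_order(X):
    remaining = list(range(X.shape[1]))
    order = []
    while remaining:
        stats = leaf_criterion(X[:, remaining])
        leaf = remaining[int(np.argmin(stats))]
        order.append(leaf)   # sinks are found first
        remaining.remove(leaf)
    return order[::-1]       # reverse so sources come first
```

Once an ordering is fixed, the graph is typically finished by pruning spurious edges with standard variable selection.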
arXiv Detail & Related papers (2023-04-06T17:50:46Z)
- Optimizing the Noise in Self-Supervised Learning: from Importance Sampling
to Noise-Contrastive Estimation [80.07065346699005]
It is widely assumed that the optimal noise distribution should be made equal to the data distribution, as in Generative Adversarial Networks (GANs).
We turn to Noise-Contrastive Estimation which grounds this self-supervised task as an estimation problem of an energy-based model of the data.
We soberly conclude that the optimal noise may be hard to sample from, and the gain in efficiency can be modest compared to choosing the noise distribution equal to the data's.
arXiv Detail & Related papers (2023-01-23T19:57:58Z)
- Robust Matrix Completion with Heavy-tailed Noise [0.5837881923712392]
This paper studies low-rank matrix completion in the presence of heavy-tailed and possibly asymmetric noise.
We adopt an adaptive Huber loss to accommodate heavy-tailed noise, which is robust against large and possibly asymmetric errors.
We prove that under merely a second moment condition on the error, the Euclidean error falls geometrically fast until achieving a minimax-optimal statistical estimation error.
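For intuition, here is a minimal sketch of a Huber loss with an adaptive robustification parameter tau; the particular tau schedule shown is a common choice in the adaptive-Huber literature and is an assumption here, not necessarily the paper's exact calibration.

```python
# Minimal sketch of an adaptive Huber loss: quadratic for small residuals,
# linear for large ones, with tau grown with the sample size n.
import numpy as np

def huber_loss(residuals, tau):
    # Quadratic region keeps efficiency on well-behaved errors; the linear
    # region caps the influence of heavy-tailed or asymmetric outliers.
    r = np.abs(residuals)
    quad = 0.5 * r ** 2
    lin = tau * r - 0.5 * tau ** 2
    return np.where(r <= tau, quad, lin).sum()

# Illustrative calibration (an assumption): tau ~ sigma * sqrt(n / log n)
# balances bias against robustness under a second-moment condition.
n, sigma = 1000, 1.0
tau = sigma * np.sqrt(n / np.log(n))
```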
arXiv Detail & Related papers (2022-06-09T04:48:48Z)
- The Optimal Noise in Noise-Contrastive Learning Is Not What You Think [80.07065346699005]
We show that deviating from this assumption (optimal noise equal to the data distribution) can actually lead to better statistical estimators.
In particular, the optimal noise distribution is different from the data's and even from a different family.
arXiv Detail & Related papers (2022-03-02T13:59:20Z)
- Partial Identification with Noisy Covariates: A Robust Optimization
Approach [94.10051154390237]
Causal inference from observational datasets often relies on measuring and adjusting for covariates.
We show that this robust optimization approach can extend a wide range of causal adjustment methods to perform partial identification.
Across synthetic and real datasets, we find that this approach provides ATE bounds with a higher coverage probability than existing methods.
arXiv Detail & Related papers (2022-02-22T04:24:26Z)
- Optimizing Information-theoretical Generalization Bounds via Anisotropic
Noise in SGLD [73.55632827932101]
We optimize the information-theoretical generalization bound by manipulating the noise structure in SGLD.
We prove that, under a constraint guaranteeing low empirical risk, the optimal noise covariance is the square root of the expected gradient covariance.
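A minimal sketch of one SGLD step with this anisotropic noise (the sqrt(2 * lr) scaling and the way the gradient covariance is estimated are illustrative assumptions):

```python
# Hedged sketch: SGLD update whose injected noise has covariance equal to
# the matrix square root of an estimated gradient covariance, per the
# result summarized above.
import numpy as np
from scipy.linalg import sqrtm

def sgld_step(theta, grad, grad_cov_est, lr=1e-3):
    # Sigma = C^{1/2}, where C estimates the expected gradient covariance.
    sigma = np.real(sqrtm(grad_cov_est))
    noise = np.random.multivariate_normal(np.zeros_like(theta), sigma)
    return theta - lr * grad + np.sqrt(2.0 * lr) * noise
```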
arXiv Detail & Related papers (2021-10-26T15:02:27Z)
- False Correlation Reduction for Offline Reinforcement Learning [115.11954432080749]
We propose falSe COrrelation REduction (SCORE) for offline RL, a practically effective and theoretically provable algorithm.
We empirically show that SCORE achieves state-of-the-art performance with 3.1x acceleration on various tasks in a standard benchmark (D4RL).
arXiv Detail & Related papers (2021-10-24T15:34:03Z)
- Analyzing and Improving the Optimization Landscape of Noise-Contrastive
Estimation [50.85788484752612]
Noise-contrastive estimation (NCE) is a statistically consistent method for learning unnormalized probabilistic models.
It has been empirically observed that the choice of the noise distribution is crucial for NCE's performance.
In this work, we formally pinpoint reasons for NCE's poor performance when an inappropriate noise distribution is used.
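For reference, a minimal sketch of the NCE objective itself (illustrative names; equal numbers of data and noise samples assumed): the unnormalized model is fit by logistic discrimination of data against noise, so the choice of noise distribution directly shapes the optimization landscape.

```python
# Hedged sketch of the NCE loss. `log_model` should include a learnable
# log normalizing constant; both callables are assumed vectorized.
import numpy as np

def nce_loss(log_model, log_noise, x_data, x_noise):
    # Classifier logit is the log-density ratio log p_model(u) - log q(u).
    z_data = log_model(x_data) - log_noise(x_data)
    z_noise = log_model(x_noise) - log_noise(x_noise)
    # Binary cross-entropy: data labeled 1, noise labeled 0.
    loss_data = np.logaddexp(0.0, -z_data).mean()   # -log sigmoid(z)
    loss_noise = np.logaddexp(0.0, z_noise).mean()  # -log(1 - sigmoid(z))
    return loss_data + loss_noise
```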
arXiv Detail & Related papers (2021-10-21T16:57:45Z)
- Noisy Gradient Descent Converges to Flat Minima for Nonconvex Matrix
Factorization [36.182992409810446]
This paper investigates the importance of noise in nonconvex optimization problems.
We show that noisy gradient descent converges to a flat global minimum, with an implicit bias determined by the injected noise.
arXiv Detail & Related papers (2021-02-24T17:50:17Z)
- Causal Inference Using Linear Time-Varying Filters with Additive Noise [18.35147325731821]
Causal inference using the restricted structural causal model framework hinges largely on the asymmetry between cause and effect induced by the data-generating mechanisms.
We propose to break the symmetry by exploiting the nonstationarity of the data.
Our main theoretical result shows that the causal direction is identifiable in generic cases when cause and effect are connected via a time-varying filter.
arXiv Detail & Related papers (2020-12-23T23:35:58Z)