CASTLE: Regularization via Auxiliary Causal Graph Discovery
- URL: http://arxiv.org/abs/2009.13180v1
- Date: Mon, 28 Sep 2020 09:49:38 GMT
- Title: CASTLE: Regularization via Auxiliary Causal Graph Discovery
- Authors: Trent Kyono, Yao Zhang, Mihaela van der Schaar
- Abstract summary: We introduce Causal Structure Learning (CASTLE) regularization and propose to regularize a neural network by jointly learning the causal relationships between variables.
CASTLE efficiently reconstructs only the features in the causal DAG that have a causal neighbor, whereas reconstruction-based regularizers suboptimally reconstruct all input features.
- Score: 89.74800176981842
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Regularization improves generalization of supervised models to out-of-sample
data. Prior works have shown that prediction in the causal direction (effect
from cause) results in lower testing error than the anti-causal direction.
However, existing regularization methods are agnostic of causality. We
introduce Causal Structure Learning (CASTLE) regularization and propose to
regularize a neural network by jointly learning the causal relationships
between variables. CASTLE learns the causal directed acyclic graph (DAG) as
an adjacency matrix embedded in the neural network's input layers, thereby
facilitating the discovery of optimal predictors. Furthermore, CASTLE
efficiently reconstructs only the features in the causal DAG that have a causal
neighbor, whereas reconstruction-based regularizers suboptimally reconstruct
all input features. We provide a theoretical generalization bound for our
approach and conduct experiments on a plethora of synthetic and real publicly
available datasets demonstrating that CASTLE consistently leads to better
out-of-sample predictions as compared to other popular benchmark regularizers.
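The mechanics the abstract describes (a DAG adjacency matrix learned jointly with prediction, an acyclicity constraint, and reconstruction of only those features that have a causal neighbor) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `acyclicity_penalty` uses the NOTEARS-style constraint h(W) = tr(e^{W∘W}) − d, and `castle_style_loss` is a hypothetical helper that assumes a linear structural equation model in place of CASTLE's neural network.

```python
import numpy as np
from scipy.linalg import expm

def acyclicity_penalty(W):
    """NOTEARS-style penalty h(W) = tr(exp(W ∘ W)) - d.

    Equals zero iff the weighted adjacency matrix W encodes a DAG;
    it grows as W accumulates directed cycles.
    """
    d = W.shape[0]
    return np.trace(expm(W * W)) - d  # W * W is elementwise

def castle_style_loss(X, W, lam=1.0):
    """Illustrative CASTLE-style objective (hypothetical helper).

    Reconstructs only the features that have at least one causal
    neighbor in W, then adds the acyclicity penalty. A full CASTLE
    model would also include the supervised prediction loss.
    """
    # Linear SEM reconstruction: each feature predicted from its parents.
    X_hat = X @ W
    # A feature has a causal neighbor if its row or column of W is nonzero.
    has_neighbor = (np.abs(W).sum(axis=0) + np.abs(W).sum(axis=1)) > 0
    recon = np.mean((X[:, has_neighbor] - X_hat[:, has_neighbor]) ** 2)
    return recon + lam * acyclicity_penalty(W)
```

Note how an isolated feature (all-zero row and column in W) is simply excluded from the reconstruction term, which is the efficiency the abstract contrasts with regularizers that reconstruct every input feature.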
Related papers
- Induced Covariance for Causal Discovery in Linear Sparse Structures [55.2480439325792]
Causal models seek to unravel the cause-effect relationships among variables from observed data.
This paper introduces a novel causal discovery algorithm designed for settings in which variables exhibit linearly sparse relationships.
arXiv Detail & Related papers (2024-10-02T04:01:38Z)
- Sample, estimate, aggregate: A recipe for causal discovery foundation models [28.116832159265964]
We train a supervised model that learns to predict a larger causal graph from the outputs of classical causal discovery algorithms run over subsets of variables.
Our approach is enabled by the observation that typical errors in the outputs of classical methods remain comparable across datasets.
Experiments on real and synthetic data demonstrate that this model maintains high accuracy in the face of misspecification or distribution shift.
arXiv Detail & Related papers (2024-02-02T21:57:58Z)
- A PAC-Bayesian Perspective on the Interpolating Information Criterion [54.548058449535155]
We show how a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence performance in the interpolating regime.
We quantify how the test error for overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by, e.g., the combination of model and parameter-initialization scheme.
arXiv Detail & Related papers (2023-11-13T01:48:08Z)
- dotears: Scalable, consistent DAG estimation using observational and interventional data [1.220743263007369]
Causal gene regulatory networks can be represented by directed acyclic graphs (DAGs).
We present dotears [doo-tairs], a continuous optimization framework to infer a single causal structure.
We show that dotears is a provably consistent estimator of the true DAG under mild assumptions.
arXiv Detail & Related papers (2023-05-30T17:03:39Z)
- Spot The Odd One Out: Regularized Complete Cycle Consistent Anomaly Detector GAN [4.5123329001179275]
This study presents an adversarial method for anomaly detection in real-world applications, leveraging the power of generative adversarial neural networks (GANs).
Previous methods suffer from high variance in class-wise accuracy, which makes them inapplicable to all types of anomalies.
The proposed method, named RCALAD, addresses this problem by introducing a novel discriminator into the architecture, resulting in a more efficient training process.
arXiv Detail & Related papers (2023-04-16T13:05:39Z)
- Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose a model-agnostic framework to boost causal discovery performance by dynamically learning the adaptive weights for the Reweighted Score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z)
- Amortized Inference for Causal Structure Learning [72.84105256353801]
Learning causal structure poses a search problem that typically involves evaluating structures using a score or independence test.
We train a variational inference model to predict the causal structure from observational/interventional data.
Our models exhibit robust generalization capabilities under substantial distribution shift.
arXiv Detail & Related papers (2022-05-25T17:37:08Z)
- Prequential MDL for Causal Structure Learning with Neural Networks [9.669269791955012]
We show that the prequential minimum description length principle can be used to derive a practical scoring function for Bayesian networks.
We obtain plausible and parsimonious graph structures without relying on sparsity-inducing priors or other regularizers that must be tuned.
We discuss how the prequential score relates to recent work that infers causal structure from the speed of adaptation when observations come from a source undergoing distributional shift.
arXiv Detail & Related papers (2021-07-02T22:35:21Z)
- When Does Preconditioning Help or Hurt Generalization? [74.25170084614098]
We show how the implicit bias of first- and second-order methods affects the comparison of generalization properties.
We discuss several approaches to manage the bias-variance tradeoff, and the potential benefit of interpolating between GD and NGD.
arXiv Detail & Related papers (2020-06-18T17:57:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.