Causal Abstractions of Neural Networks
- URL: http://arxiv.org/abs/2106.02997v1
- Date: Sun, 6 Jun 2021 01:07:43 GMT
- Title: Causal Abstractions of Neural Networks
- Authors: Atticus Geiger, Hanson Lu, Thomas Icard, Christopher Potts
- Abstract summary: We propose a new structural analysis method grounded in a formal theory of \textit{causal abstraction}.
We apply this method to analyze neural models trained on the Multiply Quantified Natural Language Inference (MQNLI) corpus.
- Score: 9.291492712301569
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Structural analysis methods (e.g., probing and feature attribution) are
increasingly important tools for neural network analysis. We propose a new
structural analysis method grounded in a formal theory of \textit{causal
abstraction} that provides rich characterizations of model-internal
representations and their roles in input/output behavior. In this method,
neural representations are aligned with variables in interpretable causal
models, and then \textit{interchange interventions} are used to experimentally
verify that the neural representations have the causal properties of their
aligned variables. We apply this method in a case study to analyze neural
models trained on the Multiply Quantified Natural Language Inference (MQNLI)
corpus, a highly complex NLI dataset that was constructed with a
tree-structured natural logic causal model. We discover that a BERT-based model
with state-of-the-art performance successfully realizes the approximate causal
structure of the natural logic causal model, whereas a simpler baseline model
fails to show any such structure, demonstrating that neural representations
encode the compositional structure of MQNLI examples.
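The core experimental operation is the interchange intervention. Below is a minimal sketch of that operation on a toy two-layer network, assuming a hypothesized alignment between a few hidden units and a causal-model variable V; the network, the alignment, and all names are illustrative, not the paper's actual models or code.

```python
# Interchange intervention sketch: run the network on a source input, cache
# the activations of the hidden units aligned with causal variable V, then
# re-run on a base input with those activations swapped in.
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 8)), rng.normal(size=(8, 2))

def forward(x, swap=None):
    """Run the toy MLP; `swap=(units, values)` overwrites those hidden
    units with activations cached from another (source) input."""
    h = np.tanh(x @ W1)
    if swap is not None:
        units, values = swap
        h = h.copy()
        h[units] = values
    return h @ W2

base = rng.normal(size=4)    # input whose behavior we intervene on
source = rng.normal(size=4)  # input donating the aligned representation

aligned_units = [0, 1]                        # hypothesized location of V
source_vals = np.tanh(source @ W1)[aligned_units]

y_base = forward(base)
y_intervened = forward(base, swap=(aligned_units, source_vals))
print(y_base, y_intervened)
```

If the alignment is causally faithful, the intervened network output should match what the interpretable causal model predicts for the base input with V set to its value on the source input; aggregating this agreement over many base/source pairs scores the alignment.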
Related papers
- Structure of Artificial Neural Networks -- Empirical Investigations [0.0] (2024-10-12T16:13:28Z)
  Within a decade, deep learning overtook the dominant solution methods for countless problems in artificial intelligence.
  With a formal definition of neural network structure, neural architecture search problems and their solution methods can be formulated under a common framework.
  Does structure make a difference, or can it be chosen arbitrarily?
- Hidden Holes: topological aspects of language models [1.1172147007388977] (2024-06-09T14:25:09Z)
  We study the evolution of topological structure in GPT-based large language models across depth and time during training.
  We show that the latter exhibit more topological complexity, with a distinct pattern of changes common to all natural languages but absent from synthetically generated data.
- Consistency of Neural Causal Partial Identification [17.503562318576414] (2024-05-24T16:12:39Z)
  We show consistency of partial identification via Neural Causal Models (NCMs) in a general setting with both continuous and categorical variables.
  The results highlight the impact of the underlying neural network architecture's depth and connectivity.
  We provide a counterexample showing that, without Lipschitz regularization, the NCM may not be consistent (a schematic sketch of such regularization follows).
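A minimal sketch, assuming PyTorch, of the kind of Lipschitz control this consistency result motivates: an NCM mechanism whose linear layers are spectral-normalized, which bounds the network's Lipschitz constant when the activations are themselves 1-Lipschitz. The architecture is illustrative, not the paper's construction.

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

def lipschitz_mechanism(n_parents: int, hidden: int = 32) -> nn.Module:
    """Mechanism f(parents, noise) -> value with spectrally normalized
    weights; tanh is 1-Lipschitz, so the composed map is Lipschitz."""
    return nn.Sequential(
        spectral_norm(nn.Linear(n_parents + 1, hidden)),  # +1 exogenous noise
        nn.Tanh(),
        spectral_norm(nn.Linear(hidden, 1)),
    )

f = lipschitz_mechanism(n_parents=2)
parents = torch.randn(16, 2)
noise = torch.randn(16, 1)
value = f(torch.cat([parents, noise], dim=-1))  # one SCM assignment V := f(Pa, U)
```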
- LOGICSEG: Parsing Visual Semantics with Neural Logic Learning and Reasoning [73.98142349171552] (2023-09-24T05:43:19Z)
  LOGICSEG is a holistic visual semantic parser that integrates neural inductive learning and logic reasoning with both rich data and symbolic knowledge.
  During fuzzy logic-based continuous relaxation, logical formulae are grounded onto data and neural computational graphs, enabling logic-induced network training (illustrated below).
  These designs together make LOGICSEG a general and compact neural-logic machine that is readily integrated into existing segmentation models.
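A hedged sketch of what fuzzy-logic continuous relaxation looks like in this spirit: a symbolic rule such as "cat(x) -> animal(x)" is relaxed into a differentiable truth value over predicted class probabilities, so rule violations become a trainable penalty. The rule, the Reichenbach implication, and all names are illustrative choices, not LOGICSEG's exact formulation.

```python
import torch

def implies(p: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
    """Fuzzy implication via the Reichenbach relaxation 1 - p + p*q."""
    return 1.0 - p + p * q

# Per-pixel class probabilities from a (hypothetical) segmentation head.
p_cat = torch.rand(8, requires_grad=True)
p_animal = torch.rand(8, requires_grad=True)

# Logic-induced training signal: minimize the negation of the grounded
# formula, alongside the usual segmentation loss.
logic_loss = (1.0 - implies(p_cat, p_animal)).mean()
logic_loss.backward()  # gradients flow into the network that produced p_*
```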
- On the Trade-off Between Efficiency and Precision of Neural Abstraction [62.046646433536104] (2023-07-28T13:22:32Z)
  Neural abstractions have recently been introduced as formal approximations of complex, nonlinear dynamical models.
  We employ formal inductive synthesis procedures to generate neural abstractions that result in dynamical models with these semantics; the sketch below shows the plain approximation step, without the formal certificate.
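A rough sketch of the approximation half of neural abstraction: fit a small ReLU network to a nonlinear vector field and measure the worst observed error on a sample grid. The formal step that the paper's synthesis procedures add, certifying a sound error bound (e.g., with an SMT solver), is deliberately not reproduced here; the dynamics and hyperparameters are illustrative.

```python
import torch
import torch.nn as nn

dynamics = lambda x: torch.sin(3 * x) - x ** 3   # illustrative nonlinear model
net = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)

xs = torch.linspace(-1, 1, 1024).unsqueeze(1)
for _ in range(500):
    loss = ((net(xs) - dynamics(xs)) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Empirical error estimate only; a neural abstraction pairs the ReLU net
# with a *certified* eps such that |f(x) - net(x)| <= eps for all x.
eps_hat = (net(xs) - dynamics(xs)).abs().max().item()
print(f"empirical max error: {eps_hat:.4f}")
```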
- On the Generalization and Adaption Performance of Causal Models [99.64022680811281] (2022-06-09T17:12:32Z)
  Differentiable causal discovery proposes to factorize the data-generating process into a set of modules.
  We study the generalization and adaptation performance of such modular neural causal models.
  Our analysis shows that modular neural causal models outperform other models on both zero-shot and few-shot adaptation in low-data regimes.
- Amortized Inference for Causal Structure Learning [72.84105256353801] (2022-05-25T17:37:08Z)
  Learning causal structure poses a search problem that typically involves evaluating candidate structures with a score or independence test.
  Instead, we train a variational inference model to predict the causal structure directly from observational or interventional data (see the sketch below).
  Our models exhibit robust generalization capabilities under substantial distribution shift.
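A minimal sketch, assuming PyTorch, of amortized structure prediction: a network embeds a whole dataset (n samples by d variables) and emits a d-by-d matrix of edge probabilities. This is a schematic stand-in for the paper's variational model, not its architecture.

```python
import torch
import torch.nn as nn

class AmortizedStructureModel(nn.Module):
    def __init__(self, d: int, hidden: int = 64):
        super().__init__()
        self.embed = nn.Sequential(nn.Linear(d, hidden), nn.ReLU())
        self.edges = nn.Linear(hidden, d * d)
        self.d = d

    def forward(self, data: torch.Tensor) -> torch.Tensor:
        # Mean-pool per-sample embeddings so the output is invariant to
        # the ordering of the n observations.
        h = self.embed(data).mean(dim=0)
        return torch.sigmoid(self.edges(h)).view(self.d, self.d)

model = AmortizedStructureModel(d=5)
dataset = torch.randn(200, 5)   # observational data
edge_probs = model(dataset)     # P(i -> j) for each ordered variable pair
```

Training such a model would maximize the likelihood of ground-truth graphs over many simulated (dataset, graph) pairs, which is what lets inference on a new dataset amortize the search.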
- A Physics-Guided Neural Operator Learning Approach to Model Biological Tissues from Digital Image Correlation Measurements [3.65211252467094] (2022-04-01T04:56:41Z)
  We present a data-driven approach to biological tissue modeling, which aims to predict the displacement field based on digital image correlation (DIC) measurements under unseen loading scenarios.
  A material database is constructed from DIC displacement-tracking measurements of multiple biaxial stretching protocols on a porcine tricuspid valve leaflet.
  The material response is modeled as a solution operator from the loading to the resultant displacement field, with the material properties learned implicitly from the data and naturally embedded in the network parameters (a schematic operator-learning layout follows).
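A schematic sketch, assuming PyTorch, of operator learning in this spirit: a branch net encodes the loading (sampled at fixed sensor points) and a trunk net encodes a query location; their dot product gives the predicted displacement there. This DeepONet-style layout is an illustrative stand-in for the paper's physics-guided operator, and all sizes are made up.

```python
import torch
import torch.nn as nn

class SimpleDeepONet(nn.Module):
    def __init__(self, n_sensors: int = 50, width: int = 64):
        super().__init__()
        self.branch = nn.Sequential(nn.Linear(n_sensors, width), nn.ReLU(), nn.Linear(width, width))
        self.trunk = nn.Sequential(nn.Linear(2, width), nn.ReLU(), nn.Linear(width, width))

    def forward(self, loading: torch.Tensor, xy: torch.Tensor) -> torch.Tensor:
        # displacement(xy) ~ <branch(loading), trunk(xy)>
        return (self.branch(loading)[:, None, :] * self.trunk(xy)[None, :, :]).sum(-1)

model = SimpleDeepONet()
loading = torch.randn(8, 50)   # 8 loading protocols sampled at 50 sensors
coords = torch.rand(100, 2)    # 100 material points to query
u = model(loading, coords)     # predicted displacement per protocol and point
# Training would regress u against DIC-tracked displacements across protocols.
```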
- The Causal Neural Connection: Expressiveness, Learnability, and Inference [125.57815987218756] (2021-07-02T01:55:18Z)
  An object called a structural causal model (SCM) represents a collection of mechanisms and sources of random variation of the system under investigation.
  In this paper, we show that the causal hierarchy theorem (Thm. 1, Bareinboim et al., 2020) still holds for neural models.
  We introduce a special type of SCM called a neural causal model (NCM), and formalize a new type of inductive bias to encode the structural constraints necessary for performing causal inferences (a minimal sketch follows).
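A minimal sketch of an NCM as the summary describes it: each endogenous variable gets an independent exogenous noise source and a feedforward mechanism of its parents, so the structural constraint (who may depend on whom) is encoded in the wiring. The two-variable graph X -> Y is an illustrative choice, not the paper's example.

```python
from typing import Optional

import torch
import torch.nn as nn

class TwoVariableNCM(nn.Module):
    def __init__(self, hidden: int = 32):
        super().__init__()
        self.f_x = nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.f_y = nn.Sequential(nn.Linear(2, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, n: int, do_x: Optional[torch.Tensor] = None):
        u_x, u_y = torch.randn(n, 1), torch.randn(n, 1)  # exogenous noise
        # do(X = x) replaces X's mechanism with a constant, as in an SCM.
        x = self.f_x(u_x) if do_x is None else do_x.expand(n, 1)
        y = self.f_y(torch.cat([x, u_y], dim=-1))        # Y := f_y(X, U_y)
        return x, y

ncm = TwoVariableNCM()
x_obs, y_obs = ncm(n=1000)                          # observational samples
x_int, y_int = ncm(n=1000, do_x=torch.tensor(1.0))  # interventional samples
```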
- Exploring End-to-End Differentiable Natural Logic Modeling [21.994060519995855] (2020-11-08T18:18:15Z)
  We explore end-to-end trained differentiable models that integrate natural logic with neural networks.
  The proposed model adapts module networks to model natural logic operations and is enhanced with a memory component to model contextual information (see the sketch below).
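A hedged sketch of one way such a differentiable natural logic module could look: lexical relations between aligned premise/hypothesis words are soft distributions over MacCartney's seven natural logic relations, and a learnable composition tensor folds them into a sentence-level relation. The learned tensor is an illustrative device; natural logic's hand-built join table could instead initialize or constrain it.

```python
import torch
import torch.nn as nn

N_REL = 7  # equivalence, fwd/rev entailment, negation, alternation, cover, independence

class RelationComposer(nn.Module):
    def __init__(self):
        super().__init__()
        # Soft stand-in for the natural logic join table.
        self.join = nn.Parameter(torch.randn(N_REL, N_REL, N_REL))

    def forward(self, rels: torch.Tensor) -> torch.Tensor:
        """Fold a sequence of soft relations (seq_len, 7) into one (7,)."""
        state = rels[0]
        for r in rels[1:]:
            state = torch.softmax(
                torch.einsum("i,j,ijk->k", state, r, self.join), dim=-1
            )
        return state

composer = RelationComposer()
word_relations = torch.softmax(torch.randn(5, N_REL), dim=-1)  # from a lexical module
sentence_relation = composer(word_relations)  # distribution over the final relation
```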
- Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach [144.21892195917758] (2020-07-02T17:55:47Z)
  We study estimation in a class of generalized structural equation models (SEMs).
  We formulate the linear operator equation as a min-max game in which both players are parameterized by neural networks (NNs), and we learn the parameters of these networks by gradient descent (a toy version of this game appears below).
  For the first time, we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
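A minimal sketch, assuming PyTorch, of such a min-max game on a toy instrumental-variable problem: a structural function f fits the equation while an adversarial critic g searches over test functions of the instrument Z. The objective E[(Y - f(X)) g(Z) - g(Z)^2 / 2] is one standard choice for adversarial moment matching, used here for illustration rather than as the paper's exact formulation.

```python
import torch
import torch.nn as nn

f = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))  # structural function
g = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))  # adversarial critic
opt_f = torch.optim.Adam(f.parameters(), lr=1e-3)
opt_g = torch.optim.Adam(g.parameters(), lr=1e-3)

for step in range(200):
    # Toy IV data: Z instruments X; U confounds X and Y; true effect is 2.
    z = torch.randn(512, 1)
    u = torch.randn(512, 1)
    x = z + u
    y = 2.0 * x + u
    # Critic ascends the game value over test functions of the instrument Z.
    game = ((y - f(x).detach()) * g(z) - 0.5 * g(z) ** 2).mean()
    opt_g.zero_grad()
    (-game).backward()
    opt_g.step()
    # The structural function descends against the (frozen) critic.
    loss = ((y - f(x)) * g(z).detach()).mean()
    opt_f.zero_grad()
    loss.backward()
    opt_f.step()
```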
This list is automatically generated from the titles and abstracts of the papers on this site.