Related papers: Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations

URL: http://arxiv.org/abs/2303.02536v4
Date: Wed, 21 Feb 2024 23:23:18 GMT
Title: Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations
Authors: Atticus Geiger and Zhengxuan Wu and Christopher Potts and Thomas Icard and Noah D. Goodman
Abstract summary: Causal abstraction is a promising theoretical framework for explainable artificial intelligence. Existing causal abstraction methods require a brute-force search over alignments between the high-level model and the low-level one. We present distributed alignment search (DAS), which overcomes these limitations.
Score: 62.65877150123775
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Causal abstraction is a promising theoretical framework for explainable artificial intelligence that defines when an interpretable high-level causal model is a faithful simplification of a low-level deep learning system. However, existing causal abstraction methods have two major limitations: they require a brute-force search over alignments between the high-level model and the low-level one, and they presuppose that variables in the high-level model will align with disjoint sets of neurons in the low-level one. In this paper, we present distributed alignment search (DAS), which overcomes these limitations. In DAS, we find the alignment between high-level and low-level models using gradient descent rather than conducting a brute-force search, and we allow individual neurons to play multiple distinct roles by analyzing representations in non-standard bases, that is, distributed representations. Our experiments show that DAS can discover internal structure that prior approaches miss. Overall, DAS removes previous obstacles to conducting causal abstraction analyses and allows us to find conceptual structure in trained neural nets.
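
The idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a hypothetical two-dimensional hidden layer in which two high-level variables, v and w, are stored along axes rotated by an angle PHI, so that neither variable lines up with a single neuron. The search learns a rotation angle theta by gradient descent so that swapping one rotated coordinate between a base input and a source input reproduces the high-level counterfactual (v from the source, w from the base). All names here (encode, readout, interchange, fit_theta, PHI, PAIRS) are illustrative assumptions, and a finite-difference gradient stands in for the automatic differentiation a real system would use.

```python
import math

def to_basis(h, theta):
    """Coordinates of the 2-D vector h in a basis rotated by theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * h[0] + s * h[1], -s * h[0] + c * h[1]]

def from_basis(z, theta):
    """Inverse of to_basis: map rotated coordinates back to neuron space."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * z[0] - s * z[1], s * z[0] + c * z[1]]

PHI = 0.7  # assumption: hidden angle used by the toy network, unknown to the search

def encode(v, w):
    """Toy lower layers: store v and w along rotated (distributed) axes."""
    return from_basis([v, w], PHI)

def readout(h):
    """Toy upper layers: recover (v, w) from the hidden vector."""
    return to_basis(h, PHI)

def interchange(h_base, h_source, theta):
    """Swap the first rotated coordinate of the base with that of the source."""
    z_base = to_basis(h_base, theta)
    z_source = to_basis(h_source, theta)
    return from_basis([z_source[0], z_base[1]], theta)

def loss(theta, pairs):
    """Mean squared error against the high-level counterfactual (v_source, w_base)."""
    total = 0.0
    for (vb, wb), (vs, ws) in pairs:
        out = readout(interchange(encode(vb, wb), encode(vs, ws), theta))
        total += (out[0] - vs) ** 2 + (out[1] - wb) ** 2
    return total / len(pairs)

def fit_theta(pairs, theta=0.0, lr=0.01, steps=300, eps=1e-5):
    """Gradient descent on theta only; the toy network itself stays frozen."""
    for _ in range(steps):
        grad = (loss(theta + eps, pairs) - loss(theta - eps, pairs)) / (2 * eps)
        theta -= lr * grad
    return theta

# (base, source) pairs of high-level values (v, w); made-up illustration data.
PAIRS = [((1.0, 2.0), (5.0, -3.0)), ((0.0, 1.0), (2.0, 2.0))]
theta_hat = fit_theta(PAIRS)
```

In this sketch, a neuron-aligned intervention (theta = 0) leaves a clearly nonzero loss, while the learned angle drives the loss close to zero. That is the sense in which searching over rotated bases can expose structure that a search restricted to individual neurons would miss.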

Related papers

Constrained Auto-Regressive Decoding Constrains Generative Retrieval [71.71161220261655]
Generative retrieval seeks to replace traditional search index data structures with a single large-scale neural network. In this paper, we examine the inherent limitations of constrained auto-regressive generation from two essential perspectives: constraints and beam search.
arXiv Detail & Related papers (2025-04-14T06:54:49Z)
An Algebraic Framework for Hierarchical Probabilistic Abstraction [5.455744338342196]
We introduce a hierarchical probabilistic abstraction framework aimed at addressing challenges by extending a measure-theoretic foundation for hierarchical abstraction. This approach bridges high-level conceptualization with low-level perceptual data, enhancing interpretability and allowing layered analysis. Our framework provides a robust foundation for abstraction analysis across AI subfields, particularly in aligning System 1 and System 2 thinking.
arXiv Detail & Related papers (2025-02-28T16:47:42Z)
Causality can systematically address the monsters under the bench(marks) [64.36592889550431]
Benchmarks are plagued by various biases, artifacts, or leakage. Models may behave unreliably due to poorly explored failure modes. Causality offers an ideal framework to systematically address these challenges.
arXiv Detail & Related papers (2025-02-07T17:01:37Z)
Learning Causal Abstractions of Linear Structural Causal Models [18.132607344833925]
Causal Abstraction provides a framework for formalizing two Structural Causal Models at different levels of detail. We tackle both issues for linear causal models with linear abstraction functions. In particular, we introduce Abs-LiNGAM, a method that leverages the constraints induced by the learned high-level model and the abstraction function to speed up the recovery of the larger low-level model.
arXiv Detail & Related papers (2024-06-01T10:42:52Z)
The twin peaks of learning neural networks [3.382017614888546]
Recent works demonstrated the existence of a double-descent phenomenon for the generalization error of neural networks. We explore a link between this phenomenon and the increase of complexity and sensitivity of the function represented by neural networks.
arXiv Detail & Related papers (2024-01-23T10:09:14Z)
Causal Triplet: An Open Challenge for Intervention-centric Causal Representation Learning [98.78136504619539]
Causal Triplet is a causal representation learning benchmark featuring visually more complex scenes. We show that models built with the knowledge of disentangled or object-centric representations significantly outperform their distributed counterparts.
arXiv Detail & Related papers (2023-01-12T17:43:38Z)
Unifying Causal Inference and Reinforcement Learning using Higher-Order Category Theory [4.119151469153588]
We present a unified formalism for structure discovery of causal models and predictive state representation models in reinforcement learning. Specifically, we model structure discovery in both settings using simplicial objects.
arXiv Detail & Related papers (2022-09-13T19:04:18Z)
Systematic Evaluation of Causal Discovery in Visual Model Based Reinforcement Learning [76.00395335702572]
A central goal for AI and causality is the joint discovery of abstract representations and causal structure. Existing environments for studying causal induction are poorly suited for this objective because they have complicated task-specific causal graphs. In this work, our goal is to facilitate research in learning representations of high-level variables as well as causal structures among them.
arXiv Detail & Related papers (2021-07-02T05:44:56Z)
ACRE: Abstract Causal REasoning Beyond Covariation [90.99059920286484]
We introduce the Abstract Causal REasoning dataset for systematic evaluation of current vision systems in causal induction. Motivated by the stream of research on causal discovery in Blicket experiments, we query a visual reasoning system with the following four types of questions in either an independent scenario or an interventional scenario. We notice that pure neural models tend towards an associative strategy under their chance-level performance, whereas neuro-symbolic combinations struggle in backward-blocking reasoning.
arXiv Detail & Related papers (2021-03-26T02:42:38Z)
Gradient Starvation: A Learning Proclivity in Neural Networks [97.02382916372594]
Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task. This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks.
arXiv Detail & Related papers (2020-11-18T18:52:08Z)
Decontextualized learning for interpretable hierarchical representations of visual patterns [0.0]
We present an algorithm and training paradigm designed specifically to address this: decontextualized hierarchical representation learning (DHRL). DHRL addresses the limitations of small datasets and encourages a disentangled set of hierarchically organized features. In addition to providing a tractable path for analyzing complex hierarchical patterns using variational inference, this approach is generative and can be directly combined with empirical and theoretical approaches.
arXiv Detail & Related papers (2020-08-31T14:47:55Z)
Local Propagation in Constraint-based Neural Network [77.37829055999238]
We study a constraint-based representation of neural network architectures. We investigate a simple optimization procedure that is well suited to fulfil the so-called architectural constraints.
arXiv Detail & Related papers (2020-02-18T16:47:38Z)

This list is automatically generated from the titles and abstracts of the papers on this site.