A Geometric Notion of Causal Probing
- URL: http://arxiv.org/abs/2307.15054v4
- Date: Wed, 26 Mar 2025 16:33:43 GMT
- Title: A Geometric Notion of Causal Probing
- Authors: Clément Guerner, Tianyu Liu, Anej Svete, Alexander Warstadt, Ryan Cotterell
- Abstract summary: The linear subspace hypothesis states that, in a language model's representation space, all information about a concept such as verbal number is encoded in a linear subspace. We give a set of intrinsic criteria which characterize an ideal linear concept subspace. We find that, for at least one concept across two language models, the concept subspace can be used to manipulate the concept value of the generated word with precision.
- Score: 85.49839090913515
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The linear subspace hypothesis (Bolukbasi et al., 2016) states that, in a language model's representation space, all information about a concept such as verbal number is encoded in a linear subspace. Prior work has relied on auxiliary classification tasks to identify and evaluate candidate subspaces that might give support for this hypothesis. We instead give a set of intrinsic criteria which characterize an ideal linear concept subspace and enable us to identify the subspace using only the language model distribution. Our information-theoretic framework accounts for spuriously correlated features in the representation space (Kumar et al., 2022) by reconciling the statistical notion of concept information and the geometric notion of how concepts are encoded in the representation space. As a byproduct of this analysis, we hypothesize a causal process for how a language model might leverage concepts during generation. Empirically, we find that linear concept erasure is successful in erasing most concept information under our framework for verbal number as well as some complex aspect-level sentiment concepts from a restaurant review dataset. Our causal intervention for controlled generation shows that, for at least one concept across two language models, the concept subspace can be used to manipulate the concept value of the generated word with precision.
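To make the geometric picture concrete, here is a toy numpy sketch of subspace-based erasure and intervention, assuming a one-dimensional concept subspace estimated from class mean differences; the estimation heuristic, dimensions, and variable names are illustrative assumptions, not the paper's information-theoretic procedure for identifying the subspace.

```python
import numpy as np

def concept_direction(X_pos, X_neg):
    """Estimate a concept direction as the normalized difference of class means."""
    v = X_pos.mean(axis=0) - X_neg.mean(axis=0)
    return v / np.linalg.norm(v)

def erase(x, v):
    """Linear concept erasure: remove the component of x along the unit direction v."""
    return x - (x @ v) * v

def intervene(x, v, target):
    """Causal-style intervention: set the coordinate of x along v to a target value."""
    return erase(x, v) + target * v

# Usage with random stand-in representations (e.g., singular vs. plural contexts):
rng = np.random.default_rng(0)
X_sing = rng.normal(size=(200, 64)) + 1.0
X_plur = rng.normal(size=(200, 64)) - 1.0
v = concept_direction(X_sing, X_plur)
h = rng.normal(size=64)                            # a hidden state to manipulate
print(round(float(erase(h, v) @ v), 6))            # ~0: concept coordinate erased
print(round(float(intervene(h, v, 3.0) @ v), 6))   # ~3.0: concept coordinate set
```

In the paper's setting, the erased or intervened representation would then be passed back through the model head to test whether the concept value of the generated word changes as intended.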
Related papers
- LASERS: LAtent Space Encoding for Representations with Sparsity for Generative Modeling [3.9426000822656224]
We show that our latent space is more expressive and leads to better representations than the Vector Quantization approach.
Our results thus suggest that the true benefit of the VQ approach might not come from the discretization of the latent space, but rather from its lossy compression.
arXiv Detail & Related papers (2024-09-16T08:20:58Z) - The Geometry of Categorical and Hierarchical Concepts in Large Language Models [15.126806053878855]
We show how to extend the formalization of the linear representation hypothesis to represent features (e.g., is_animal) as vectors.
We use the formalization to prove a relationship between the hierarchical structure of concepts and the geometry of their representations.
We validate these theoretical results on the Gemma and LLaMA-3 large language models, estimating representations for 900+ hierarchically related concepts using data from WordNet.
arXiv Detail & Related papers (2024-06-03T16:34:01Z) - Implicit Concept Removal of Diffusion Models [92.55152501707995]
Text-to-image (T2I) diffusion models often inadvertently generate unwanted concepts such as watermarks and unsafe images.
We present Geom-Erasing, a novel concept removal method based on geometry-driven control.
arXiv Detail & Related papers (2023-10-09T17:13:10Z) - LEACE: Perfect linear concept erasure in closed form [103.61624393221447]
Concept erasure aims to remove specified features from a representation.
We introduce LEAst-squares Concept Erasure (LEACE), a closed-form method which provably prevents all linear classifiers from detecting a concept while changing the representation as little as possible (a simplified sketch of such a closed-form eraser appears after this list).
We apply LEACE to large language models with a novel procedure called "concept scrubbing," which erases target concept information from every layer in the network.
arXiv Detail & Related papers (2023-06-06T16:07:24Z) - ConceptX: A Framework for Latent Concept Analysis [21.760620298330235]
We present ConceptX, a human-in-the-loop framework for interpreting and annotating latent representational spaces in pre-trained Language Models (pLMs).
We use an unsupervised method to discover concepts learned in these models and enable a graphical interface for humans to generate explanations for the concepts.
arXiv Detail & Related papers (2022-11-12T11:31:09Z) - Concept Activation Regions: A Generalized Framework For Concept-Based
Explanations [95.94432031144716]
Existing methods assume that the examples illustrating a concept are mapped in a fixed direction of the deep neural network's latent space.
In this work, we propose allowing concept examples to be scattered across different clusters in the DNN's latent space.
This concept activation region (CAR) formalism yields global concept-based explanations and local concept-based feature importance.
arXiv Detail & Related papers (2022-09-22T17:59:03Z) - Interpreting Embedding Spaces by Conceptualization [2.620130580437745]
We present a novel method of understanding embeddings by transforming a latent embedding space into a comprehensible conceptual space.
We devise a new evaluation method, using either human raters or LLM-based raters, to show that the vectors indeed represent the semantics of the original latent ones.
arXiv Detail & Related papers (2022-08-22T15:32:17Z) - Overlooked factors in concept-based explanations: Dataset choice,
concept learnability, and human capability [25.545486537295144]
Concept-based interpretability methods aim to explain deep neural network model predictions using a predefined set of semantic concepts.
Despite their popularity, they suffer from limitations that are not well understood or articulated in the literature.
We analyze three commonly overlooked factors in concept-based explanations.
arXiv Detail & Related papers (2022-07-20T01:59:39Z) - Subspace-based Representation and Learning for Phonotactic Spoken
Language Recognition [27.268047798971473]
We propose a new learning mechanism based on subspace-based representation.
It can extract concealed phonotactic structures from utterances for language verification and dialect/accent identification.
The proposed method achieved up to 52%, 46%, 56%, and 27% relative reductions in equal error rates over the sequence-based PPR-LM, PPR-VSM, and PPR-IVEC methods.
arXiv Detail & Related papers (2022-03-28T07:01:45Z) - Sparse Subspace Clustering for Concept Discovery (SSCCD) [1.7319807100654885]
Concepts are key building blocks of higher level human understanding.
Local attribution methods do not allow one to identify coherent model behavior across samples.
We put forward a new definition of concepts as low-dimensional subspaces of hidden feature layers.
arXiv Detail & Related papers (2022-03-11T16:15:48Z) - Kernelized Concept Erasure [108.65038124096907]
We propose a kernelization of a linear minimax game for concept erasure.
It is possible to prevent specific non-linear adversaries from predicting the concept.
However, the protection does not transfer to different nonlinear adversaries.
arXiv Detail & Related papers (2022-01-28T15:45:13Z) - Implicit Bias of Projected Subgradient Method Gives Provable Robust
Recovery of Subspaces of Unknown Codimension [12.354076490479514]
We show that Dual Principal Component Pursuit (DPCP) can provably solve problems in the unknown subspace dimension regime.
We propose a very simple algorithm based on running multiple instances of a projected sub-gradient descent method (PSGM).
In particular, we show that 1) all of the problem instances will converge to a vector in the nullspace of the subspace and 2) the ensemble of problem instance solutions will be sufficiently diverse to fully span the nullspace of the subspace.
arXiv Detail & Related papers (2022-01-22T15:36:03Z) - Formalising Concepts as Grounded Abstractions [68.24080871981869]
This report shows how representation learning can be used to induce concepts from raw data.
The main technical goal of this report is to show how techniques from representation learning can be married with a lattice-theoretic formulation of conceptual spaces.
arXiv Detail & Related papers (2021-01-13T15:22:01Z) - Introducing Orthogonal Constraint in Structural Probes [0.2538209532048867]
We decompose a linear projection of language vector space into isomorphic space rotation and linear scaling directions.
We experimentally show that our approach can be performed in a multitask setting.
arXiv Detail & Related papers (2020-12-30T17:14:25Z) - Stochastic Linear Bandits with Protected Subspace [51.43660657268171]
We study a variant of the linear bandit problem wherein we optimize a linear objective function but rewards are accrued only to an unknown subspace.
In particular, at each round, the learner must choose whether to query the objective or the protected subspace alongside choosing an action.
Our algorithm, derived from the OFUL principle, uses some of the queries to get an estimate of the protected space.
arXiv Detail & Related papers (2020-11-02T14:59:39Z) - Joint and Progressive Subspace Analysis (JPSA) with Spatial-Spectral
Manifold Alignment for Semi-Supervised Hyperspectral Dimensionality Reduction [48.73525876467408]
We propose a novel technique for hyperspectral subspace analysis.
The technique is called joint and progressive subspace analysis (JPSA).
Experiments are conducted to demonstrate the superiority and effectiveness of the proposed JPSA on two widely-used hyperspectral datasets.
arXiv Detail & Related papers (2020-09-21T16:29:59Z) - Space of Reasons and Mathematical Model [8.475081627511166]
Inferential relations govern our concept use.
In order to understand a concept it has to be located in a space of implications.
The crucial question is: how can the conditionality of language use be represented?
arXiv Detail & Related papers (2020-07-06T01:13:43Z) - APo-VAE: Text Generation in Hyperbolic Space [116.11974607497986]
In this paper, we investigate text generation in a hyperbolic latent space to learn continuous hierarchical representations.
An Adversarial Poincaré Variational Autoencoder (APo-VAE) is presented, where both the prior and the variational posterior of latent variables are defined over a Poincaré ball via wrapped normal distributions.
Experiments in language modeling and dialog-response generation tasks demonstrate the effectiveness of the proposed APo-VAE model.
arXiv Detail & Related papers (2020-04-30T19:05:41Z)
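Following up on the LEACE entry above: below is a minimal numpy sketch of a whitening-based, closed-form linear eraser in the spirit of LEACE. The function name, tolerance, and the eigendecomposition route to the whitening matrix are assumptions for illustration; the authors' released implementation should be preferred for actual use.

```python
import numpy as np

def fit_whitened_eraser(X, Z, tol=1e-8):
    """Fit a LEACE-style closed-form linear eraser.

    X: (n, d) representations; Z: (n, k) concept labels (e.g., one-hot).
    Returns erase(x), which maps (d,) or (m, d) arrays to erased versions.
    """
    mu_x = X.mean(axis=0)
    Xc, Zc = X - mu_x, Z - Z.mean(axis=0)
    n = X.shape[0]

    sigma_xx = Xc.T @ Xc / n      # (d, d) covariance of X
    sigma_xz = Xc.T @ Zc / n      # (d, k) cross-covariance of X and Z

    # Whitening matrix W = Sigma_xx^{-1/2} (pseudo-inverse square root).
    vals, vecs = np.linalg.eigh(sigma_xx)
    V, s = vecs[:, vals > tol], vals[vals > tol]
    W = V @ np.diag(s ** -0.5) @ V.T
    W_pinv = V @ np.diag(s ** 0.5) @ V.T

    # Orthogonal projection onto the column space of the whitened cross-covariance.
    U, sv, _ = np.linalg.svd(W @ sigma_xz, full_matrices=False)
    U = U[:, sv > tol]
    P = U @ U.T

    M = W_pinv @ P @ W  # oblique projection removed from centered inputs

    def erase(x):
        return x - (x - mu_x) @ M.T
    return erase
```

After fitting, no linear classifier should be able to recover Z from the erased representations above chance; that guarantee, achieved with minimal distortion of X, is what the exact LEACE closed form proves.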