A Geometric Notion of Causal Probing
- URL: http://arxiv.org/abs/2307.15054v3
- Date: Sat, 24 Feb 2024 19:53:58 GMT
- Title: A Geometric Notion of Causal Probing
- Authors: Clément Guerner, Anej Svete, Tianyu Liu, Alexander Warstadt, Ryan Cotterell
- Abstract summary: In a language model's representation space, all information about a concept such as verbal number is encoded in a linear subspace.
We give a set of intrinsic criteria which characterize an ideal linear concept subspace.
We find that LEACE returns a one-dimensional subspace containing roughly half of total concept information.
- Score: 91.14470073637236
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The linear subspace hypothesis (Bolukbasi et al., 2016) states that, in a
language model's representation space, all information about a concept such as
verbal number is encoded in a linear subspace. Prior work has relied on
auxiliary classification tasks to identify and evaluate candidate subspaces
that might give support for this hypothesis. We instead give a set of intrinsic
criteria which characterize an ideal linear concept subspace and enable us to
identify the subspace using only the language model distribution. Our
information-theoretic framework accounts for spuriously correlated features in
the representation space (Kumar et al., 2022). As a byproduct of this analysis,
we hypothesize a causal process for how a language model might leverage
concepts during generation. Empirically, we find that LEACE (Belrose et al.,
2023) returns a one-dimensional subspace containing roughly half of total
concept information under our framework for verbal number. Our causal
intervention for controlled generation shows that, for at least one concept,
the subspace returned by LEACE can be used to manipulate the concept value of
the generated word with precision.
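To make the intervention concrete, here is a minimal sketch (not the authors' implementation) of how a one-dimensional concept subspace can be used to act on a hidden representation: erasing the concept projects out the spanning direction, and setting the concept value shifts the representation along that direction. The function names, the 768-dimensional toy vectors, and the target coordinate below are illustrative assumptions.

```python
import numpy as np

def erase_concept(h, u):
    """Project out the component of hidden state h along the unit-norm
    concept direction u (a rank-1 orthogonal projection)."""
    u = u / np.linalg.norm(u)
    return h - np.dot(h, u) * u

def set_concept_value(h, u, target):
    """Erase the current concept component, then shift the representation
    along u to the coordinate `target` (e.g., a value typical of the
    desired verbal-number class)."""
    u = u / np.linalg.norm(u)
    return erase_concept(h, u) + target * u

# Toy usage with stand-in vectors (not real model activations).
rng = np.random.default_rng(0)
h = rng.normal(size=768)   # hypothetical language-model hidden state
u = rng.normal(size=768)   # hypothetical one-dimensional concept direction
h_steered = set_concept_value(h, u, target=2.0)   # push toward one concept value
h_erased = erase_concept(h, u)                    # remove the concept component
print(np.dot(h_erased, u / np.linalg.norm(u)))    # ~0: concept signal removed
```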
Related papers
- LASERS: LAtent Space Encoding for Representations with Sparsity for Generative Modeling [3.9426000822656224]
We show that our latent space is more expressive and leads to better representations than the Vector Quantization approach.
Our results thus suggest that the true benefit of the VQ approach might not be from discretization of the latent space, but rather from the lossy compression of the latent space.
arXiv Detail & Related papers (2024-09-16T08:20:58Z) - LEACE: Perfect linear concept erasure in closed form [103.61624393221447]
Concept erasure aims to remove specified features from a representation.
We introduce LEAst-squares Concept Erasure (LEACE), a closed-form method which provably prevents all linear classifiers from detecting a concept while changing the representation as little as possible.
We apply LEACE to large language models with a novel procedure called "concept scrubbing," which erases target concept information from every layer in the network (a minimal sketch of such an eraser follows this related-papers list).
arXiv Detail & Related papers (2023-06-06T16:07:24Z) - Concept Activation Regions: A Generalized Framework For Concept-Based
Explanations [95.94432031144716]
Existing methods assume that the examples illustrating a concept are mapped in a fixed direction of the deep neural network's latent space.
In this work, we propose allowing concept examples to be scattered across different clusters in the DNN's latent space.
This concept activation region (CAR) formalism yields global concept-based explanations and local concept-based feature importance.
arXiv Detail & Related papers (2022-09-22T17:59:03Z) - Interpreting Embedding Spaces by Conceptualization [2.620130580437745]
We present a novel method of understanding embeddings by transforming a latent embedding space into a comprehensible conceptual space.
We devise a new evaluation method, using either human raters or LLM-based raters, to show that the vectors indeed represent the semantics of the original latent ones.
arXiv Detail & Related papers (2022-08-22T15:32:17Z) - Subspace-based Representation and Learning for Phonotactic Spoken
Language Recognition [27.268047798971473]
We propose a new learning mechanism based on subspace-based representation.
It can extract concealed phonotactic structures from utterances for language verification and dialect/accent identification.
The proposed method achieved up to 52%, 46%, 56%, and 27% relative reductions in equal error rates over the sequence-based PPR-LM, PPR-VSM, and PPR-IVEC methods.
arXiv Detail & Related papers (2022-03-28T07:01:45Z) - Implicit Bias of Projected Subgradient Method Gives Provable Robust
Recovery of Subspaces of Unknown Codimension [12.354076490479514]
We show that Dual Principal Component Pursuit (DPCP) can provably solve problems in the unknown subspace dimension regime.
We propose a very simple algorithm based on running multiple instances of a projected sub-gradient descent method (PSGM).
In particular, we show that 1) all of the problem instances will converge to a vector in the nullspace of the subspace and 2) the ensemble of problem instance solutions will be sufficiently diverse to fully span the nullspace of the subspace.
arXiv Detail & Related papers (2022-01-22T15:36:03Z) - Introducing Orthogonal Constraint in Structural Probes [0.2538209532048867]
We decompose a linear projection of language vector space into isomorphic space rotation and linear scaling directions.
We experimentally show that our approach can be performed in a multitask setting.
arXiv Detail & Related papers (2020-12-30T17:14:25Z) - Stochastic Linear Bandits with Protected Subspace [51.43660657268171]
We study a variant of the linear bandit problem wherein we optimize a linear objective function but rewards are accrued only to an unknown subspace.
In particular, at each round, the learner must choose whether to query the objective or the protected subspace alongside choosing an action.
Our algorithm, derived from the OFUL principle, uses some of the queries to get an estimate of the protected space.
arXiv Detail & Related papers (2020-11-02T14:59:39Z) - Joint and Progressive Subspace Analysis (JPSA) with Spatial-Spectral
Manifold Alignment for Semi-Supervised Hyperspectral Dimensionality Reduction [48.73525876467408]
We propose a novel technique for hyperspectral subspace analysis.
The technique is called joint and progressive subspace analysis (JPSA).
Experiments are conducted to demonstrate the superiority and effectiveness of the proposed JPSA on two widely-used hyperspectral datasets.
arXiv Detail & Related papers (2020-09-21T16:29:59Z) - APo-VAE: Text Generation in Hyperbolic Space [116.11974607497986]
In this paper, we investigate text generation in a hyperbolic latent space to learn continuous hierarchical representations.
An Adversarial Poincaré Variational Autoencoder (APo-VAE) is presented, where both the prior and variational posterior of latent variables are defined over a Poincaré ball via wrapped normal distributions.
Experiments in language modeling and dialog-response generation tasks demonstrate the effectiveness of the proposed APo-VAE model.
arXiv Detail & Related papers (2020-04-30T19:05:41Z)
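For the LEACE entry above, the following is a minimal NumPy sketch of a LEACE-style least-squares eraser written from the published description: whiten the representations, project out the directions spanned by the whitened cross-covariance with the concept labels, then unwhiten. It is not the authors' code; the function name, eigenvalue cutoff, and shape conventions are assumptions for illustration.

```python
import numpy as np

def fit_leace_style_eraser(X, Z, eps=1e-10):
    """Fit a LEACE-style least-squares concept eraser (sketch).
    X: (n, d) representations; Z: (n, k) concept labels (e.g., one-hot).
    Returns a function that maps a vector (d,) or a batch (m, d) to its
    concept-erased version."""
    mu_x, mu_z = X.mean(axis=0), Z.mean(axis=0)
    Xc, Zc = X - mu_x, Z - mu_z
    n = X.shape[0]
    sigma_xx = Xc.T @ Xc / n          # covariance of the representations
    sigma_xz = Xc.T @ Zc / n          # cross-covariance with the concept labels

    # Whitening transform W = sigma_xx^{-1/2} and its (pseudo-)inverse,
    # built from the eigendecomposition, ignoring near-zero eigenvalues.
    evals, evecs = np.linalg.eigh(sigma_xx)
    V = evecs[:, evals > eps]
    d_keep = evals[evals > eps]
    W = V @ np.diag(d_keep ** -0.5) @ V.T
    W_inv = V @ np.diag(d_keep ** 0.5) @ V.T

    # Orthogonal projector onto the column space of the whitened
    # cross-covariance (assumed here to have full column rank).
    Q, _ = np.linalg.qr(W @ sigma_xz)
    P = Q @ Q.T

    M = W_inv @ P @ W                  # erase in whitened space, then unwhiten

    def erase(x):
        return x - (x - mu_x) @ M.T    # works for (d,) vectors or (m, d) batches
    return erase

# Toy usage: after fitting on (X, Z), a linear probe trained on erase(X)
# should be near chance at predicting Z.
```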