Related papers: Unveiling Language Skills via Path-Level Circuit Discovery

Unveiling Language Skills via Path-Level Circuit Discovery

URL: http://arxiv.org/abs/2410.01334v2
Date: Mon, 16 Dec 2024 03:33:36 GMT
Title: Unveiling Language Skills via Path-Level Circuit Discovery
Authors: Hang Chen, Jiaying Zhu, Xinyu Yang, Wenya Wang,
Abstract summary: We propose a novel path-level circuit discovery framework capturing how behaviors emerge through interconnected linear chain.<n>Our framework is constructed upon a fully-disentangled linear combinations of memory circuits'' decomposed from the original model.<n>In contrast to circuit graph from existing works, we focus on the complete paths of a generic skill rather than on the fine-grained responses to individual components of the input.
Score: 31.608080868988825
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Circuit discovery with edge-level ablation has become a foundational framework for mechanism interpretability of language models. However, its focus on individual edges often overlooks the sequential, path-level causal relationships that underpin complex behaviors, thus potentially leading to misleading or incomplete circuit discoveries. To address this issue, we propose a novel path-level circuit discovery framework capturing how behaviors emerge through interconnected linear chain and build towards complex behaviors. Our framework is constructed upon a fully-disentangled linear combinations of ``memory circuits'' decomposed from the original model. To discover functional circuit paths, we leverage a 2-step pruning strategy by first reducing the computational graph to a faithful and minimal subgraph and then applying causal mediation to identify common paths of a specific skill, termed as skill paths. In contrast to circuit graph from existing works, we focus on the complete paths of a generic skill rather than on the fine-grained responses to individual components of the input. To demonstrate this, we explore three generic language skills, namely Previous Token Skill, Induction Skill and In-Context Learning Skill using our framework and provide more compelling evidence to substantiate stratification and inclusiveness of these skills.

Related papers

Rethinking Circuit Completeness in Language Models: AND, OR, and ADDER Gates [31.608080868988825]
We introduce three types of logic gates: AND, OR, and ADDER gates, and decompose the circuit into combinations of these logical gates.<n>We propose a framework that combines noising-based and denoising-based interventions, which can be easily integrated into existing circuit discovery methods.
arXiv Detail & Related papers (2025-05-15T07:35:14Z)
Position-aware Automatic Circuit Discovery [59.64762573617173]
We identify a gap in existing circuit discovery methods, treating model components as equally relevant across input positions. We propose two improvements to incorporate positionality into circuits, even on tasks containing variable-length examples. Our approach enables fully automated discovery of position-sensitive circuits, yielding better trade-offs between circuit size and faithfulness compared to prior work.
arXiv Detail & Related papers (2025-02-07T00:18:20Z)
Navigating Shortcuts, Spurious Correlations, and Confounders: From Origins via Detection to Mitigation [21.21130450731374]
Clever Hans behavior, spurious correlations, or confounders, present a significant challenge in machine learning and AI. Research in this area remains fragmented across various terminologies, hindering the progress of the field as a whole. We introduce a unifying taxonomy by providing a formal definition of shortcuts and bridging the diverse terms used in the literature.
arXiv Detail & Related papers (2024-12-06T16:10:13Z)
Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models [22.89563355840371]
We study the modularity of neural networks by analyzing circuits for highly compositional subtasks within a language model. Our results indicate that functionally similar circuits exhibit both notable node overlap and cross-task faithfulness.
arXiv Detail & Related papers (2024-10-02T11:36:45Z)
Transformer Circuit Faithfulness Metrics are not Robust [0.04260910081285213]
We measure circuit 'faithfulness' by ablating portions of the model's computation. We conclude that existing circuit faithfulness scores reflect both the methodological choices of researchers as well as the actual components of the circuit. The ultimate goal of mechanistic interpretability work is to understand neural networks, so we emphasize the need for more clarity in the precise claims being made about circuits.
arXiv Detail & Related papers (2024-07-11T17:59:00Z)
Functional Faithfulness in the Wild: Circuit Discovery with Differentiable Computation Graph Pruning [14.639036250438517]
We introduce a comprehensive reformulation of the task known as Circuit Discovery, along with DiscoGP. DiscoGP is a novel and effective algorithm based on differentiable masking for discovering circuits.
arXiv Detail & Related papers (2024-07-04T09:42:25Z)
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models [55.19497659895122]
We introduce methods for discovering and applying sparse feature circuits. These are causally implicatedworks of human-interpretable features for explaining language model behaviors.
arXiv Detail & Related papers (2024-03-28T17:56:07Z)
Rethinking Mutual Information for Language Conditioned Skill Discovery on Imitation Learning [36.624923972563415]
We propose an end-to-end imitation learning approach known as Language Conditioned Skill Discovery (LCSD) We utilize vector quantization to learn discrete latent skills and leverage skill sequences of trajectories to reconstruct high-level semantic instructions. Our approach exhibits enhanced generalization capabilities towards unseen tasks, improved skill interpretability, and notably higher rates of task completion success.
arXiv Detail & Related papers (2024-02-27T13:53:52Z)
Are Structural Concepts Universal in Transformer Language Models? Towards Interpretable Cross-Lingual Generalization [27.368684663279463]
We investigate the potential for explicitly aligning conceptual correspondence between languages to enhance cross-lingual generalization. Using the syntactic aspect of language as a testbed, our analyses of 43 languages reveal a high degree of alignability. We propose a meta-learning-based method to learn to align conceptual spaces of different languages.
arXiv Detail & Related papers (2023-10-19T14:50:51Z)
SkillNet-X: A Multilingual Multitask Model with Sparsely Activated Skills [51.74947795895178]
This paper proposes a general multilingual multitask model, named SkillNet-X. We define several language-specific skills and task-specific skills, each of which corresponds to a skill module. We evaluate SkillNet-X on eleven natural language understanding datasets in four languages.
arXiv Detail & Related papers (2023-06-28T12:53:30Z)
Language Is Not All You Need: Aligning Perception with Language Models [110.51362453720458]
We introduce Kosmos-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context, and follow instructions. We train Kosmos-1 from scratch on web-scale multimodal corpora, including arbitrarily interleaved text and images, image-caption pairs, and text data. Experimental results show that Kosmos-1 achieves impressive performance on (i) language understanding, generation, and even OCR-free NLP. We also show that MLLMs can benefit from cross-modal transfer, i.e., transfer knowledge from language to multimodal, and from multimodal to language
arXiv Detail & Related papers (2023-02-27T18:55:27Z)
Grounding Language with Visual Affordances over Unstructured Data [26.92329260907805]
We propose a novel approach to efficiently learn language-conditioned robot skills from unstructured, offline and reset-free data. We exploit a self-supervised visuo-lingual affordance model, which requires as little as 1% of the total data with language. We find that our method is capable of completing long-horizon, multi-tier tasks in the real world, while requiring an order of magnitude less data than previous approaches.
arXiv Detail & Related papers (2022-10-04T21:16:48Z)
Joint Language Semantic and Structure Embedding for Knowledge Graph Completion [66.15933600765835]
We propose to jointly embed the semantics in the natural language description of the knowledge triplets with their structure information. Our method embeds knowledge graphs for the completion task via fine-tuning pre-trained language models. Our experiments on a variety of knowledge graph benchmarks have demonstrated the state-of-the-art performance of our method.
arXiv Detail & Related papers (2022-09-19T02:41:02Z)
On Neural Architecture Inductive Biases for Relational Tasks [76.18938462270503]
We introduce a simple architecture based on similarity-distribution scores which we name Compositional Network generalization (CoRelNet) We find that simple architectural choices can outperform existing models in out-of-distribution generalizations.
arXiv Detail & Related papers (2022-06-09T16:24:01Z)
Multi-level Contrastive Learning for Cross-lingual Spoken Language Understanding [90.87454350016121]
We develop novel code-switching schemes to generate hard negative examples for contrastive learning at all levels. We develop a label-aware joint model to leverage label semantics for cross-lingual knowledge transfer.
arXiv Detail & Related papers (2022-05-07T13:44:28Z)
Exposing Cross-Lingual Lexical Knowledge from Multilingual Sentence Encoders [85.80950708769923]
We probe multilingual language models for the amount of cross-lingual lexical knowledge stored in their parameters, and compare them against the original multilingual LMs. We also devise a novel method to expose this knowledge by additionally fine-tuning multilingual models. We report substantial gains on standard benchmarks.
arXiv Detail & Related papers (2022-04-30T13:23:16Z)
LISA: Learning Interpretable Skill Abstractions from Language [85.20587800593293]
We propose a hierarchical imitation learning framework that can learn diverse, interpretable skills from language-conditioned demonstrations. Our method demonstrates a more natural way to condition on language in sequential decision-making problems.
arXiv Detail & Related papers (2022-02-28T19:43:24Z)
Hierarchical Skills for Efficient Exploration [70.62309286348057]
In reinforcement learning, pre-trained low-level skills have the potential to greatly facilitate exploration. Prior knowledge of the downstream task is required to strike the right balance between generality (fine-grained control) and specificity (faster learning) in skill design. We propose a hierarchical skill learning framework that acquires skills of varying complexity in an unsupervised manner.
arXiv Detail & Related papers (2021-10-20T22:29:32Z)
Probing Pretrained Language Models for Lexical Semantics [76.73599166020307]
We present a systematic empirical analysis across six typologically diverse languages and five different lexical tasks. Our results indicate patterns and best practices that hold universally, but also point to prominent variations across languages and tasks.
arXiv Detail & Related papers (2020-10-12T14:24:01Z)
Zero-Shot Cross-Lingual Transfer with Meta Learning [45.29398184889296]
We consider the setting of training models on multiple languages at the same time, when little or no data is available for languages other than English. We show that this challenging setup can be approached using meta-learning. We experiment using standard supervised, zero-shot cross-lingual, as well as few-shot cross-lingual settings for different natural language understanding tasks.
arXiv Detail & Related papers (2020-03-05T16:07:32Z)
Automated Relational Meta-learning [95.02216511235191]
We propose an automated relational meta-learning framework that automatically extracts the cross-task relations and constructs the meta-knowledge graph. We conduct extensive experiments on 2D toy regression and few-shot image classification and the results demonstrate the superiority of ARML over state-of-the-art baselines.
arXiv Detail & Related papers (2020-01-03T07:02:25Z)

This list is automatically generated from the titles and abstracts of the papers in this site.