Tracking linguistic information in transformer-based sentence embeddings through targeted sparsification
- URL: http://arxiv.org/abs/2407.18119v1
- Date: Thu, 25 Jul 2024 15:27:08 GMT
- Title: Tracking linguistic information in transformer-based sentence embeddings through targeted sparsification
- Authors: Vivi Nastase, Paola Merlo
- Abstract summary: Analyses of transformer-based models have shown that they encode a variety of linguistic information from their textual input.
We test to what degree information about chunks (in particular noun, verb or prepositional phrases) can be localized in sentence embeddings.
Our results show that such information is not distributed over the entire sentence embedding, but rather it is encoded in specific regions.
- Score: 1.6021932740447968
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Analyses of transformer-based models have shown that they encode a variety of linguistic information from their textual input. While these analyses have shed light on the relation between linguistic information on the one hand and internal architecture and parameters on the other, a question remains unanswered: how is this linguistic information reflected in sentence embeddings? Using datasets consisting of sentences with known structure, we test to what degree information about chunks (in particular noun, verb or prepositional phrases), such as grammatical number or semantic role, can be localized in sentence embeddings. Our results show that such information is not distributed over the entire sentence embedding, but rather encoded in specific regions. Understanding how the information from an input text is compressed into sentence embeddings helps us understand current transformer models and helps build future explainable neural models.
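To make the localization idea concrete, the following is a minimal, illustrative sketch rather than the authors' exact targeted-sparsification pipeline: zero out all embedding dimensions outside a candidate region and check whether a simple probe can still recover a chunk-related label such as grammatical number. The synthetic embeddings, the 32-dimension window, and the logistic-regression probe are assumptions made only for illustration.

```python
# Hedged sketch: slide a window over the embedding dimensions, zero everything
# outside it ("sparsify"), and see where a simple probe can still recover a
# chunk-related label. Data, window size and probe are illustrative stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-in data: 2000 sentence embeddings of dimension 768 with binary labels
# (e.g., singular vs. plural subject chunk). Replace with real encoder outputs.
n, d = 2000, 768
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, d))
# For illustration, plant a weak signal in dimensions 256-287 only.
X[:, 256:288] += 0.8 * y[:, None]

def probe_region(X, y, start, width):
    """Keep only dimensions [start, start+width), zero the rest, return probe accuracy."""
    X_sparse = np.zeros_like(X)
    X_sparse[:, start:start + width] = X[:, start:start + width]
    X_tr, X_te, y_tr, y_te = train_test_split(X_sparse, y, test_size=0.3, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return clf.score(X_te, y_te)

# Regions where accuracy stays well above chance are candidate locations
# of the targeted linguistic information.
width = 32
for start in range(0, d, width):
    acc = probe_region(X, y, start, width)
    print(f"dims {start:3d}-{start + width - 1:3d}: probe accuracy = {acc:.2f}")
```

In the paper's actual setting, the embeddings come from a transformer sentence encoder and the labels come from the known chunk structure of the controlled datasets.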
Related papers
- Are there identifiable structural parts in the sentence embedding whole? [1.6021932740447968]
Sentence embeddings from transformer models encode much linguistic information in a fixed-length vector.
We explore the hypothesis that these embeddings consist of overlapping layers of information that can be separated.
We show that this is the case using a dataset consisting of sentences with known chunk structure, and two linguistic intelligence datasets.
arXiv Detail & Related papers (2024-06-24T11:58:33Z)
- Explaining Text Similarity in Transformer Models [52.571158418102584]
Recent advances in explainable AI have made it possible to mitigate limitations by leveraging improved explanations for Transformers.
We use BiLRP, an extension developed for computing second-order explanations in bilinear similarity models, to investigate which feature interactions drive similarity in NLP models.
Our findings contribute to a deeper understanding of different semantic similarity tasks and models, highlighting how novel explainable AI methods enable in-depth analyses and corpus-level insights.
arXiv Detail & Related papers (2024-05-10T17:11:31Z)
- Learning Phonotactics from Linguistic Informants [54.086544221761486]
Our model iteratively selects or synthesizes a data point according to one of a range of information-theoretic policies.
We find that the information-theoretic policies that our model uses to select items to query the informant achieve sample efficiency comparable to, or greater than, fully supervised approaches.
arXiv Detail & Related papers (2024-05-08T00:18:56Z)
- Putting Context in Context: the Impact of Discussion Structure on Text Classification [13.15873889847739]
We propose a series of experiments on a large dataset for stance detection in English.
We evaluate the contribution of different types of contextual information.
We show that structural information can be highly beneficial to text classification, but only under certain circumstances.
arXiv Detail & Related papers (2024-02-05T12:56:22Z)
- Disentangling continuous and discrete linguistic signals in transformer-based sentence embeddings [1.8927791081850118]
We explore whether we can compress transformer-based sentence embeddings into a representation that separates different linguistic signals.
We show that by compressing an input sequence that shares a targeted phenomenon into the latent layer of a variational autoencoder-like system, the targeted linguistic information becomes more explicit (see the sketch after this list for a minimal illustration of such a bottleneck).
arXiv Detail & Related papers (2023-12-18T15:16:54Z)
- X-PARADE: Cross-Lingual Textual Entailment and Information Divergence across Paragraphs [55.80189506270598]
X-PARADE is the first cross-lingual dataset of paragraph-level information divergences.
Annotators label a paragraph in a target language at the span level and evaluate it with respect to a corresponding paragraph in a source language.
Aligned paragraphs are sourced from Wikipedia pages in different languages.
arXiv Detail & Related papers (2023-09-16T04:34:55Z)
- How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding [56.222097640468306]
We provide a mechanistic understanding of how transformers learn "semantic structure".
We show, through a combination of mathematical analysis and experiments on Wikipedia data, that the embedding layer and the self-attention layer encode the topical structure.
arXiv Detail & Related papers (2023-03-07T21:42:17Z)
- Syntax-guided Localized Self-attention by Constituency Syntactic Distance [26.141356981833862]
We propose a syntax-guided localized self-attention for Transformer.
It allows directly incorporating grammar structures from an external constituency parser.
Experimental results show that our model consistently improves translation performance.
arXiv Detail & Related papers (2022-10-21T06:37:25Z)
- What Context Features Can Transformer Language Models Use? [32.49689188570872]
We measure usable information by selectively ablating lexical and structural information in transformer language models trained on English Wikipedia.
In both mid- and long-range contexts, we find that several extremely destructive context manipulations remove less than 15% of the usable information.
arXiv Detail & Related papers (2021-06-15T18:38:57Z)
- Unsupervised Distillation of Syntactic Information from Contextualized Word Representations [62.230491683411536]
We tackle the task of unsupervised disentanglement between semantics and structure in neural language representations.
To this end, we automatically generate groups of sentences which are structurally similar but semantically different.
We demonstrate that our transformation clusters vectors in space by structural properties, rather than by lexical semantics.
arXiv Detail & Related papers (2020-10-11T15:13:18Z)
- A Comparative Study on Structural and Semantic Properties of Sentence Embeddings [77.34726150561087]
We propose a set of experiments using a widely-used large-scale data set for relation extraction.
We show that different embedding spaces have different degrees of strength for the structural and semantic properties.
These results provide useful information for developing embedding-based relation extraction methods.
arXiv Detail & Related papers (2020-09-23T15:45:32Z)
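The variational-autoencoder-like compression mentioned in the "Disentangling continuous and discrete linguistic signals" entry can be illustrated with a small bottleneck over sentence embeddings. This is a hedged sketch under assumed settings, not that paper's exact system: the encoder/decoder sizes, latent dimension, and KL weighting are illustrative choices.

```python
# Hedged sketch: a small VAE-like bottleneck that compresses fixed-size sentence
# embeddings into a low-dimensional latent layer. Training it on batches of
# sentences that share a targeted phenomenon is the kind of setup described in
# the "Disentangling ..." entry above; all dimensions here are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SentenceVAE(nn.Module):
    def __init__(self, emb_dim=768, latent_dim=16):
        super().__init__()
        self.enc = nn.Linear(emb_dim, 128)
        self.mu = nn.Linear(128, latent_dim)
        self.logvar = nn.Linear(128, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, emb_dim))

    def forward(self, x):
        h = torch.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization trick
        return self.dec(z), mu, logvar

def vae_loss(x, x_hat, mu, logvar, beta=0.1):
    # Reconstruction term plus a (down-weighted) KL term on the latent layer.
    recon = F.mse_loss(x_hat, x, reduction="mean")
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl

# Usage with stand-in embeddings; replace with real encoder outputs.
model = SentenceVAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(64, 768)            # a batch of sentence embeddings
x_hat, mu, logvar = model(x)
loss = vae_loss(x, x_hat, mu, logvar)
opt.zero_grad()
loss.backward()
opt.step()
```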
This list is automatically generated from the titles and abstracts of the papers in this site.