Exploring the Role of BERT Token Representations to Explain Sentence Probing Results
- URL: http://arxiv.org/abs/2104.01477v1
- Date: Sat, 3 Apr 2021 20:40:42 GMT
- Title: Exploring the Role of BERT Token Representations to Explain Sentence Probing Results
- Authors: Hosein Mohebbi, Ali Modarressi, Mohammad Taher Pilehvar
- Abstract summary: We show that BERT tends to encode meaningful knowledge in specific token representations.
This allows the model to detect syntactic and semantic abnormalities and to distinctively separate grammatical number and tense subspaces.
- Score: 15.652077779677091
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Several studies have been carried out on revealing linguistic features
captured by BERT. This is usually achieved by training a diagnostic classifier
on the representations obtained from different layers of BERT. The subsequent
classification accuracy is then interpreted as the model's ability to encode
the corresponding linguistic property. Despite providing insights, these
studies have left out the potential role of token representations. In this
paper, we analyze the representation space of BERT in search of distinct and
meaningful subspaces that can explain probing results. Based on a set of
probing tasks and with the help of attribution methods, we show that BERT
tends to encode meaningful knowledge in specific token
representations (which are often ignored in standard classification setups),
allowing the model to detect syntactic and semantic abnormalities, and to
distinctively separate grammatical number and tense subspaces.
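The abstract describes the standard probing recipe: extract representations from each layer of BERT and train a lightweight diagnostic classifier, reading its accuracy as evidence that the layer encodes the targeted property. The sketch below illustrates only that general recipe, not the paper's actual experiments; it assumes the Hugging Face transformers and scikit-learn libraries, and the toy sentences, labels, and mean-over-tokens pooling are hypothetical placeholders.

```python
# Minimal sketch of layer-wise diagnostic probing on BERT representations.
# The sentences, labels, and mean-over-tokens pooling are illustrative
# placeholders, not the paper's actual probing tasks or setup.
import torch
from transformers import BertTokenizer, BertModel
from sklearn.linear_model import LogisticRegression

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

# Toy binary probing task (e.g. subject-verb agreement); labels are illustrative.
sentences = [
    "The keys to the cabinet are on the table.",
    "The keys to the cabinet is on the table.",
    "The author of the books writes well.",
    "The author of the books write well.",
]
labels = [1, 0, 1, 0]

def layer_vectors(sentence):
    """Return one pooled vector per layer (embedding layer + 12 encoder layers)."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # outputs.hidden_states: tuple of tensors with shape [1, seq_len, hidden_size]
    return [h[0].mean(dim=0).numpy() for h in outputs.hidden_states]

per_sentence = [layer_vectors(s) for s in sentences]
num_layers = len(per_sentence[0])

# Train one probe per layer; with real data one would report held-out accuracy.
for layer in range(num_layers):
    X = [vecs[layer] for vecs in per_sentence]
    probe = LogisticRegression(max_iter=1000).fit(X, labels)
    print(f"layer {layer:2d}: training accuracy = {probe.score(X, labels):.2f}")
```

The paper's argument is that this kind of setup, which pools over tokens or keeps only a single sentence-level vector, can obscure where the knowledge actually resides; probing and attributing individual token representations is what reveals the subspaces described in the abstract.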
Related papers
- A Study on How Attention Scores in the BERT Model are Aware of Lexical Categories in Syntactic and Semantic Tasks on the GLUE Benchmark [0.0]
This study examines whether the attention scores between tokens in the BERT model significantly vary based on lexical categories during the fine-tuning process for downstream tasks.
Our hypothesis posits that in downstream tasks that prioritize semantic information, attention scores centered on content words are enhanced, while in cases emphasizing syntactic information, attention scores centered on function words are intensified.
arXiv Detail & Related papers (2024-03-25T06:18:18Z)
- What does BERT learn about prosody? [1.1548853370822343]
We study whether prosody is part of the structural information of the language that models learn.
Our results show that information about prosodic prominence spans many layers but is mostly concentrated in the middle layers, suggesting that BERT relies mostly on syntactic and semantic information.
arXiv Detail & Related papers (2023-04-25T10:34:56Z)
- Word-order typology in Multilingual BERT: A case study in subordinate-clause detection [1.2129015549576372]
In this paper, we use the task of subordinate-clause detection within and across languages to probe these properties.
We show that this task is deceptively simple, with easy gains offset by a long tail of harder cases, and that BERT's zero-shot performance is dominated by word-order effects.
arXiv Detail & Related papers (2022-05-24T11:35:39Z)
- Multilingual Extraction and Categorization of Lexical Collocations with Graph-aware Transformers [86.64972552583941]
We put forward a sequence tagging BERT-based model enhanced with a graph-aware transformer architecture, which we evaluate on the task of collocation recognition in context.
Our results suggest that explicitly encoding syntactic dependencies in the model architecture is helpful, and provide insights on differences in collocation typification in English, Spanish and French.
arXiv Detail & Related papers (2022-05-23T16:47:37Z)
- A Latent-Variable Model for Intrinsic Probing [93.62808331764072]
We propose a novel latent-variable formulation for constructing intrinsic probes.
We find empirical evidence that pre-trained representations develop a cross-lingually entangled notion of morphosyntax.
arXiv Detail & Related papers (2022-01-20T15:01:12Z)
- Intrinsic Probing through Dimension Selection [69.52439198455438]
Most modern NLP systems make use of pre-trained contextual representations that attain astonishingly high performance on a variety of tasks.
Such high performance should not be possible unless some form of linguistic structure inheres in these representations, and a wealth of research has sprung up on probing for it.
In this paper, we draw a distinction between intrinsic probing, which examines how linguistic information is structured within a representation, and the extrinsic probing popular in prior work, which only argues for the presence of such information by showing that it can be successfully extracted.
arXiv Detail & Related papers (2020-10-06T15:21:08Z)
- A Comparative Study on Structural and Semantic Properties of Sentence Embeddings [77.34726150561087]
We propose a set of experiments using a widely-used large-scale data set for relation extraction.
We show that different embedding spaces have different degrees of strength for the structural and semantic properties.
These results provide useful information for developing embedding-based relation extraction methods.
arXiv Detail & Related papers (2020-09-23T15:45:32Z)
- Syntactic Structure Distillation Pretraining For Bidirectional Encoders [49.483357228441434]
We introduce a knowledge distillation strategy for injecting syntactic biases into BERT pretraining.
We distill the approximate marginal distribution over words in context from the syntactic LM.
Our findings demonstrate the benefits of syntactic biases, even in representation learners that exploit large amounts of data.
arXiv Detail & Related papers (2020-05-27T16:44:01Z)
- Learning Interpretable and Discrete Representations with Adversarial Training for Unsupervised Text Classification [87.28408260725138]
TIGAN learns to encode texts into two disentangled representations: a discrete code and continuous noise.
The topical words extracted to represent the latent topics show that TIGAN learns coherent and highly interpretable topics.
arXiv Detail & Related papers (2020-04-28T02:53:59Z)