Probing for the Usage of Grammatical Number
- URL: http://arxiv.org/abs/2204.08831v4
- Date: Wed, 22 May 2024 15:04:47 GMT
- Title: Probing for the Usage of Grammatical Number
- Authors: Karim Lasri, Tiago Pimentel, Alessandro Lenci, Thierry Poibeau, Ryan Cotterell
- Abstract summary: We try to find encodings that the model actually uses, introducing a usage-based probing setup.
We focus on how BERT encodes grammatical number, and on how it uses this encoding to solve the number agreement task.
- Score: 103.8175326220026
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A central quest of probing is to uncover how pre-trained models encode a linguistic property within their representations. An encoding, however, might be spurious, i.e., the model might not rely on it when making predictions. In this paper, we try to find encodings that the model actually uses, introducing a usage-based probing setup. We first choose a behavioral task which cannot be solved without using the linguistic property. Then, we attempt to remove the property by intervening on the model's representations. We contend that, if an encoding is used by the model, its removal should harm the performance on the chosen behavioral task. As a case study, we focus on how BERT encodes grammatical number, and on how it uses this encoding to solve the number agreement task. Experimentally, we find that BERT relies on a linear encoding of grammatical number to produce the correct behavioral output. We also find that BERT uses a separate encoding of grammatical number for nouns and verbs. Finally, we identify in which layers information about grammatical number is transferred from a noun to its head verb.
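The usage-based setup in the abstract (remove a property from the representations, then check whether behavior degrades) can be illustrated with a toy sketch. This is not the paper's method: the listing does not say which intervention the authors use, so the sketch assumes a simple mean-difference direction that is projected out of synthetic vectors, and a fixed linear readout that stands in for the behavioral task. All function names, the synthetic data, and the signal strength are hypothetical.

```python
import numpy as np


def mean_difference_direction(reps, labels):
    """Unit vector pointing from the class-0 mean to the class-1 mean."""
    direction = reps[labels == 1].mean(axis=0) - reps[labels == 0].mean(axis=0)
    return direction / np.linalg.norm(direction)


def remove_direction(reps, direction):
    """Project each representation onto the hyperplane orthogonal to `direction`."""
    return reps - np.outer(reps @ direction, direction)


def readout_accuracy(reps, labels, readout):
    """Accuracy of a fixed linear readout; a stand-in for the behavioral task."""
    preds = (reps @ readout > 0).astype(int)
    return float((preds == labels).mean())


def run_demo(seed=0, n=400, dim=16):
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 2, size=n)       # 0 = singular, 1 = plural (synthetic)
    reps = rng.normal(size=(n, dim))          # synthetic "hidden states"
    reps[:, 0] += 4.0 * (2 * labels - 1)      # assumed linear number signal on axis 0

    readout = np.zeros(dim)
    readout[0] = 1.0                          # hypothetical downstream use of that axis

    direction = mean_difference_direction(reps, labels)
    before = readout_accuracy(reps, labels, readout)
    erased = remove_direction(reps, direction)
    after = readout_accuracy(erased, labels, readout)
    return before, after, erased, direction


if __name__ == "__main__":
    before, after, _, _ = run_demo()
    print(f"readout accuracy before removal: {before:.3f}")
    print(f"readout accuracy after removal:  {after:.3f}")
```

In this toy setting the readout depends on the removed direction, so its accuracy falls after the intervention. That is the pattern the abstract treats as evidence that an encoding is actually used; an encoding the readout ignored could be removed without harming accuracy.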
Related papers
- Understanding and Mitigating Tokenization Bias in Language Models [6.418593476658017]
State-of-the-art language models are autoregressive and operate on subword units known as tokens.
We show that popular encoding schemes induce a sampling bias that cannot be mitigated with more training or data.
We propose a novel algorithm to obtain unbiased estimates from any language model trained on tokenized data.
arXiv Detail & Related papers (2024-06-24T17:38:02Z) - T2S-GPT: Dynamic Vector Quantization for Autoregressive Sign Language Production from Text [59.57676466961787]
We propose a novel dynamic vector quantization (DVA-VAE) model that can adjust the encoding length based on the information density in sign language.
Experiments conducted on the PHOENIX14T dataset demonstrate the effectiveness of our proposed method.
We propose a new large German sign language dataset, PHOENIX-News, which contains 486 hours of sign language videos, audio, and transcription texts.
arXiv Detail & Related papers (2024-06-11T10:06:53Z) - Suffix Retrieval-Augmented Language Modeling [1.8710230264817358]
Causal language modeling (LM) uses word history to predict the next word.
BERT, on the other hand, makes use of bi-directional word information in a sentence to predict words at masked positions.
We propose a novel model that simulates a bi-directional contextual effect in an autoregressive manner.
arXiv Detail & Related papers (2022-11-06T07:53:19Z) - Probing for targeted syntactic knowledge through grammatical error detection [13.653209309144593]
We propose grammatical error detection as a diagnostic probe to evaluate pre-trained English language models.
We leverage public annotated training data from both English second language learners and Wikipedia edits.
We find that masked language models linearly encode information relevant to the detection of SVA errors, while the autoregressive models perform on par with our baseline.
arXiv Detail & Related papers (2022-10-28T16:01:25Z) - Thutmose Tagger: Single-pass neural model for Inverse Text Normalization [76.87664008338317]
Inverse text normalization (ITN) is an essential post-processing step in automatic speech recognition.
We present a dataset preparation method based on the granular alignment of ITN examples.
One-to-one correspondence between tags and input words improves the interpretability of the model's predictions.
arXiv Detail & Related papers (2022-07-29T20:39:02Z) - Counterfactual Interventions Reveal the Causal Effect of Relative Clause Representations on Agreement Prediction [61.4913233397155]
We show that BERT uses information about RC spans during agreement prediction using the linguistically strategy.
We also found that counterfactual representations generated for a specific RC subtype influenced the number prediction in sentences with other RC subtypes, suggesting that information about RC boundaries was encoded abstractly in BERT's representation.
arXiv Detail & Related papers (2021-05-14T17:11:55Z) - AMBERT: A Pre-trained Language Model with Multi-Grained Tokenization [13.082435183692393]
We propose a novel pre-trained language model, referred to as AMBERT (A Multi-grained BERT).
For English, AMBERT takes both the sequence of words (fine-grained tokens) and the sequence of phrases (coarse-grained tokens) as input after tokenization.
Experiments have been conducted on benchmark datasets for Chinese and English, including CLUE, GLUE, SQuAD and RACE.
arXiv Detail & Related papers (2020-08-27T00:23:48Z) - On the Robustness of Language Encoders against Grammatical Errors [66.05648604987479]
We collect real grammatical errors from non-native speakers and conduct adversarial attacks to simulate these errors on clean text data.
Results confirm that the performance of all tested models is affected but the degree of impact varies.
arXiv Detail & Related papers (2020-05-12T11:01:44Z) - Adversarial Transfer Learning for Punctuation Restoration [58.2201356693101]
Adversarial multi-task learning is introduced to learn task invariant knowledge for punctuation prediction.
Experiments are conducted on IWSLT2011 datasets.
arXiv Detail & Related papers (2020-04-01T06:19:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.