Probing the Category of Verbal Aspect in Transformer Language Models
- URL: http://arxiv.org/abs/2406.02335v1
- Date: Tue, 4 Jun 2024 14:06:03 GMT
- Title: Probing the Category of Verbal Aspect in Transformer Language Models
- Authors: Anisia Katinskaia, Roman Yangarber
- Abstract summary: We investigate how pretrained language models encode the grammatical category of verbal aspect in Russian.
We perform probing using BERT and RoBERTa on alternative and non-alternative contexts.
Experiments show that BERT and RoBERTa do encode aspect, mostly in their final layers.
- Score: 0.4757470449749875
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We investigate how pretrained language models (PLMs) encode the grammatical category of verbal aspect in Russian. Encoding of aspect in transformer LMs has not been studied previously in any language. A particular challenge is posed by "alternative contexts", where either the perfective or the imperfective aspect is grammatically and semantically suitable. We perform probing using BERT and RoBERTa on alternative and non-alternative contexts. First, we assess the models' performance on aspect prediction, via behavioral probing. Next, we examine the models' performance when their contextual representations are substituted with counterfactual representations, via causal probing. These counterfactuals alter the value of "boundedness", a semantic feature that characterizes the action in the context. Experiments show that BERT and RoBERTa do encode aspect, mostly in their final layers. The counterfactual interventions affect perfective and imperfective in opposite ways, which is consistent with grammar: perfective is positively affected by adding the meaning of boundedness, and vice versa. The practical implications of our probing results are that fine-tuning only the last layers of BERT on predicting aspect is faster and more effective than fine-tuning the whole model. The model has high predictive uncertainty about aspect in alternative contexts, which tend to lack explicit hints about the boundedness of the described action.
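The probing setup lends itself to a compact illustration. Below is a minimal sketch of layer-wise behavioral probing, not the authors' code: a linear probe is trained on each layer's representation of the verb and scored on aspect prediction. The checkpoint name, toy sentences, and verb positions are illustrative assumptions; a real experiment needs a large annotated corpus.

```python
# Minimal layer-wise probing sketch (not the authors' code).
# Assumptions: checkpoint name, toy sentences, and verb positions are
# illustrative; a real experiment needs a large annotated corpus.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

MODEL = "DeepPavlov/rubert-base-cased"  # assumed Russian BERT checkpoint
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL, output_hidden_states=True).eval()

# (sentence, index of the verb among its words, 0 = imperfective, 1 = perfective)
data = [
    ("Она читала книгу весь вечер.", 1, 0),    # chitala: imperfective
    ("Она прочитала книгу за вечер.", 1, 1),   # prochitala: perfective
    ("Он писал письмо долго.", 1, 0),
    ("Он написал письмо быстро.", 1, 1),
    # ... many more labelled examples in practice
]

def verb_vectors(sentence, word_idx):
    """One vector per layer for the first subword of the target verb."""
    enc = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).hidden_states  # embeddings + every encoder layer
    sub_pos = enc.word_ids().index(word_idx)  # first subword of the verb
    return [h[0, sub_pos].numpy() for h in hidden]

X = [verb_vectors(s, i) for s, i, _ in data]
y = [label for _, _, label in data]

# Train and score a separate linear probe on each layer's representations.
for layer in range(model.config.num_hidden_layers + 1):
    feats = [x[layer] for x in X]
    acc = cross_val_score(LogisticRegression(max_iter=1000), feats, y, cv=2).mean()
    print(f"layer {layer:2d}: probe accuracy {acc:.2f}")
```

The practical takeaway above, that fine-tuning only BERT's last layers on aspect prediction beats full fine-tuning on speed and effectiveness, corresponds in this setup to freezing all parameters below the best-probing layers (e.g., setting requires_grad=False on the lower encoder blocks).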
Related papers
- Panoptic Scene Graph Generation with Semantics-Prototype Learning [23.759498629378772]
Panoptic Scene Graph Generation (PSG) parses objects and predicts their relationships (predicates) to connect human language and visual scenes.
Different language preferences of annotators and semantic overlaps between predicates lead to biased predicate annotations.
We propose a novel framework named ADTrans to adaptively transfer biased predicate annotations to informative and unified ones.
arXiv Detail & Related papers (2023-07-28T14:04:06Z)
- Towards preserving word order importance through Forced Invalidation [80.33036864442182]
We show that pre-trained language models are insensitive to word order.
We propose Forced Invalidation to help preserve the importance of word order.
Our experiments demonstrate that Forced Invalidation significantly improves the sensitivity of the models to word order.
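As a quick behavioral illustration of the insensitivity that motivates this work (not of the Forced Invalidation method itself), one can shuffle each sentence's words and check whether a classifier's prediction survives. The sentiment model and sentences below are arbitrary stand-ins.

```python
# Behavioral check of word-order sensitivity (illustrates the motivation,
# not the Forced Invalidation method). Model and sentences are stand-ins.
import random
from transformers import pipeline

clf = pipeline("sentiment-analysis",
               model="distilbert-base-uncased-finetuned-sst-2-english")

sentences = [
    "The movie was surprisingly good despite the slow start.",
    "I would never recommend this restaurant to anyone.",
]

random.seed(0)
unchanged = 0
for s in sentences:
    words = s.rstrip(".").split()
    shuffled = " ".join(random.sample(words, len(words))) + "."
    orig = clf(s)[0]["label"]
    perm = clf(shuffled)[0]["label"]
    print(f"{orig} -> {perm}: {shuffled}")
    unchanged += orig == perm
print(f"prediction unchanged on {unchanged}/{len(sentences)} shuffled inputs")
```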
arXiv Detail & Related papers (2023-04-11T13:42:10Z)
- Towards Fine-Grained Information: Identifying the Type and Location of Translation Errors [80.22825549235556]
Existing approaches cannot consider error position and type simultaneously.
We build an FG-TED model to predict addition and omission errors.
Experiments show that our model can identify both error type and position concurrently, achieving state-of-the-art results.
arXiv Detail & Related papers (2023-02-17T16:20:33Z)
- Naturalistic Causal Probing for Morpho-Syntax [76.83735391276547]
We suggest a naturalistic strategy for input-level intervention on real-world data in Spanish.
Using our approach, we isolate morpho-syntactic features from confounders in sentences.
We apply this methodology to analyze causal effects of gender and number on contextualized representations extracted from pre-trained models.
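A toy version of such an intervention might look as follows; the multilingual checkpoint and the hand-built Spanish minimal pair are assumptions standing in for the paper's naturalistic data and full methodology.

```python
# Toy input-level intervention on grammatical gender (illustrative only).
# Assumptions: multilingual checkpoint and a hand-built Spanish minimal pair
# stand in for the paper's naturalistic corpus and methodology.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL = "bert-base-multilingual-cased"  # assumed checkpoint
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL).eval()

def word_vec(sentence, word_idx):
    """Contextual vector (last layer, first subword) of the target word."""
    enc = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc).last_hidden_state[0]
    return out[enc.word_ids().index(word_idx)]

# Minimal pair differing only in subject gender; probe the verb 'trabaja'.
masc = word_vec("El ingeniero trabaja en Madrid.", 2)
fem = word_vec("La ingeniera trabaja en Madrid.", 2)
cos = torch.nn.functional.cosine_similarity(masc, fem, dim=0).item()
print(f"verb similarity across the gender intervention: {cos:.3f}")
```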
arXiv Detail & Related papers (2022-05-14T11:47:58Z)
- Unnatural Language Inference [48.45003475966808]
We find that state-of-the-art NLI models, such as RoBERTa and BART, are invariant to, and sometimes even perform better on, examples with randomly reordered words.
Our findings call into question the idea that our natural language understanding models, and the tasks used for measuring their progress, genuinely require a human-like understanding of syntax.
arXiv Detail & Related papers (2020-12-30T20:40:48Z)
- On the Sentence Embeddings from Pre-trained Language Models [78.45172445684126]
In this paper, we argue that the semantic information in the BERT embeddings is not fully exploited.
We find that BERT always induces a non-smooth anisotropic semantic space of sentences, which harms its performance on semantic similarity tasks.
We propose to transform the anisotropic sentence embedding distribution to a smooth and isotropic Gaussian distribution through normalizing flows that are learned with an unsupervised objective.
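The anisotropy observation itself (not the flow-based remedy) can be reproduced in miniature: mean-pooled BERT embeddings of unrelated sentences still show high pairwise cosine similarity. The checkpoint and sentences below are illustrative.

```python
# Reproducing the anisotropy observation in miniature (not the flow-based
# fix): unrelated sentences still get highly similar mean-pooled embeddings.
import itertools
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased").eval()

sentences = [  # deliberately unrelated
    "The cat sat on the mat.",
    "Quarterly revenue exceeded expectations.",
    "Photosynthesis converts light into chemical energy.",
    "He missed the last train home.",
]

def embed(sentence):
    enc = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]
    return hidden.mean(dim=0)  # mean pooling over tokens

vecs = [embed(s) for s in sentences]
sims = [torch.nn.functional.cosine_similarity(a, b, dim=0).item()
        for a, b in itertools.combinations(vecs, 2)]
print(f"mean pairwise cosine of unrelated sentences: {sum(sims) / len(sims):.3f}")
```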
arXiv Detail & Related papers (2020-11-02T13:14:57Z)
- On the Interplay Between Fine-tuning and Sentence-level Probing for Linguistic Knowledge in Pre-trained Transformers [24.858283637038422]
We study three different pre-trained models: BERT, RoBERTa, and ALBERT.
We find that for some probing tasks fine-tuning leads to substantial changes in accuracy.
While fine-tuning indeed changes the representations of a pre-trained model, it has a positive effect on probing accuracy in only very few cases.
arXiv Detail & Related papers (2020-10-06T10:54:00Z)
- On the Robustness of Language Encoders against Grammatical Errors [66.05648604987479]
We collect real grammatical errors from non-native speakers and conduct adversarial attacks to simulate these errors on clean text data.
Results confirm that the performance of all tested models is affected, but the degree of impact varies.
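A much-simplified harness in this spirit might look as follows; synthetic character-level noise stands in for the genuine learner errors the paper collects, and the model and error rate are arbitrary choices.

```python
# Simplified robustness harness: synthetic character-level noise stands in
# for the genuine learner errors collected in the paper. Model, text, and
# error rate are arbitrary illustrative choices.
import random
from transformers import pipeline

clf = pipeline("sentiment-analysis",
               model="distilbert-base-uncased-finetuned-sst-2-english")

def add_typos(text, rate=0.5, seed=0):
    """Swap two adjacent characters in roughly `rate` of the longer words."""
    rng = random.Random(seed)
    words = text.split()
    for i, w in enumerate(words):
        if len(w) > 3 and rng.random() < rate:
            j = rng.randrange(len(w) - 1)
            words[i] = w[:j] + w[j + 1] + w[j] + w[j + 2:]
    return " ".join(words)

clean = "The acting was wonderful and the story kept me engaged."
for text in (clean, add_typos(clean)):
    res = clf(text)[0]
    print(f"{res['label']} ({res['score']:.3f}): {text}")
```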
arXiv Detail & Related papers (2020-05-12T11:01:44Z)
- What Happens To BERT Embeddings During Fine-tuning? [19.016185902256826]
We investigate how fine-tuning affects the representations of the BERT model.
We find that fine-tuning primarily affects the top layers of BERT, but with noteworthy variation across tasks.
In particular, dependency parsing reconfigures most of the model, whereas SQuAD and MNLI appear to involve much shallower processing.
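One rough way to see this kind of layer-wise effect, though not the paper's exact methodology, is to run the same input through a base and a fine-tuned checkpoint and compare activations per layer. The SQuAD-tuned checkpoint named below is an assumption for illustration.

```python
# Rough per-layer comparison of a base vs. a fine-tuned BERT on the same
# input (not the paper's exact methodology). The SQuAD-tuned checkpoint
# below is an assumption for illustration.
import torch
from transformers import AutoModel, AutoTokenizer

BASE = "bert-base-cased"
TUNED = "deepset/bert-base-cased-squad2"  # assumed fine-tuned checkpoint

tok = AutoTokenizer.from_pretrained(BASE)
base = AutoModel.from_pretrained(BASE, output_hidden_states=True).eval()
tuned = AutoModel.from_pretrained(TUNED, output_hidden_states=True).eval()

enc = tok("Probing which layers fine-tuning reshapes.", return_tensors="pt")
with torch.no_grad():
    h_base = base(**enc).hidden_states
    h_tuned = tuned(**enc).hidden_states

# If fine-tuning mostly touches the top of the network, similarity should
# stay high in early layers and drop toward the final ones.
for layer, (a, b) in enumerate(zip(h_base, h_tuned)):
    sim = torch.nn.functional.cosine_similarity(a[0], b[0], dim=-1).mean().item()
    print(f"layer {layer:2d}: mean token-wise cosine {sim:.3f}")
```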
arXiv Detail & Related papers (2020-04-29T19:46:26Z)
- Adv-BERT: BERT is not robust on misspellings! Generating nature adversarial samples on BERT [95.88293021131035]
It is unclear, however, how the models will perform in realistic scenarios where natural rather than malicious adversarial instances often exist.
This work systematically explores the robustness of BERT, the state-of-the-art Transformer-style model in NLP, in dealing with noisy data.
arXiv Detail & Related papers (2020-02-27T22:07:11Z)