Shaking Syntactic Trees on the Sesame Street: Multilingual Probing with
Controllable Perturbations
- URL: http://arxiv.org/abs/2109.14017v1
- Date: Tue, 28 Sep 2021 20:15:29 GMT
- Title: Shaking Syntactic Trees on the Sesame Street: Multilingual Probing with
Controllable Perturbations
- Authors: Ekaterina Taktasheva and Vladislav Mikhailov and Ekaterina Artemova
- Abstract summary: Recent research has adopted a new experimental paradigm centered on text perturbations, revealing that shuffled word order has little to no impact on the downstream performance of Transformer-based language models.
- Score: 2.041108289731398
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent research has adopted a new experimental field centered around the
concept of text perturbations which has revealed that shuffled word order has
little to no impact on the downstream performance of Transformer-based language
models across many NLP tasks. These findings contradict the common
understanding of how the models encode hierarchical and structural information
and even question if the word order is modeled with position embeddings. To
this end, this paper proposes nine probing datasets organized by the type of
\emph{controllable} text perturbation for three Indo-European languages with a
varying degree of word order flexibility: English, Swedish and Russian. Based
on the probing analysis of the M-BERT and M-BART models, we report that the
syntactic sensitivity depends on the language and model pre-training
objectives. We also find that the sensitivity grows across layers together with
the increase of the perturbation granularity. Last but not least, we show that
the models barely use the positional information to induce syntactic trees from
their intermediate self-attention and contextualized representations.
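The abstract's notion of a *controllable* perturbation can be illustrated with a minimal sketch: shuffling a sentence at n-gram granularity, so that larger n preserves more local word order while still destroying global order. The helper name `shuffle_ngrams` and the granularity scheme are illustrative assumptions, not the paper's actual dataset-construction code.

```python
import random


def shuffle_ngrams(tokens, n, seed=0):
    """Illustrative controllable perturbation (not the paper's code):
    split the token sequence into contiguous n-grams and permute the
    chunks, keeping word order intact inside each chunk.
    Smaller n = finer granularity = more destructive perturbation."""
    rng = random.Random(seed)
    chunks = [tokens[i:i + n] for i in range(0, len(tokens), n)]
    rng.shuffle(chunks)
    # Flatten the permuted chunks back into a single token list.
    return [tok for chunk in chunks for tok in chunk]


sent = "the quick brown fox jumps over the lazy dog".split()
print(shuffle_ngrams(sent, 1))  # full word-level shuffle (global order destroyed)
print(shuffle_ngrams(sent, 3))  # trigram-level shuffle (local order preserved)
```

Varying `n` gives a family of perturbed datasets of increasing granularity, which is the kind of controlled manipulation the probing analysis relies on.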
Related papers
- Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds [59.71218039095155]
We evaluate language understanding capacities on simple inference tasks that most humans find trivial.
We target (i) grammatically-specified entailments, (ii) premises with evidential adverbs of uncertainty, and (iii) monotonicity entailments.
The models exhibit moderate to low performance on these evaluation sets.
arXiv Detail & Related papers (2023-05-24T06:41:09Z) - SUN: Exploring Intrinsic Uncertainties in Text-to-SQL Parsers [61.48159785138462]
This paper aims to improve the performance of text-to-SQL parsing by exploring the intrinsic uncertainties in neural network based approaches (called SUN).
Extensive experiments on five benchmark datasets demonstrate that our method significantly outperforms competitors and achieves new state-of-the-art results.
arXiv Detail & Related papers (2022-09-14T06:27:51Z) - A Knowledge-Enhanced Adversarial Model for Cross-lingual Structured
Sentiment Analysis [31.05169054736711]
Cross-lingual structured sentiment analysis task aims to transfer the knowledge from source language to target one.
We propose a Knowledge-Enhanced Adversarial Model (KEAM) with both implicit distributed and explicit structural knowledge.
We conduct experiments on five datasets and compare KEAM with both supervised and unsupervised methods.
arXiv Detail & Related papers (2022-05-31T03:07:51Z) - Demystifying Neural Language Models' Insensitivity to Word-Order [7.72780997900827]
We investigate the insensitivity of natural language models to word-order by quantifying perturbations.
We find that neural language models rely more on the local ordering of tokens than on their global ordering.
arXiv Detail & Related papers (2021-07-29T13:34:20Z) - Comparative Error Analysis in Neural and Finite-state Models for
Unsupervised Character-level Transduction [34.1177259741046]
We compare the two model classes side by side and find that they tend to make different types of errors even when achieving comparable performance.
We investigate how combining finite-state and sequence-to-sequence models at decoding time affects the output quantitatively and qualitatively.
arXiv Detail & Related papers (2021-06-24T00:09:24Z) - Lattice-BERT: Leveraging Multi-Granularity Representations in Chinese
Pre-trained Language Models [62.41139712595334]
We propose a novel pre-training paradigm for Chinese -- Lattice-BERT.
We construct a lattice graph from the characters and words in a sentence and feed all these text units into transformers.
We show that our model can bring an average increase of 1.5% under the 12-layer setting.
arXiv Detail & Related papers (2021-04-15T02:36:49Z) - A Closer Look at Linguistic Knowledge in Masked Language Models: The
Case of Relative Clauses in American English [17.993417004424078]
Transformer-based language models achieve high performance on various tasks, but we still lack understanding of the kind of linguistic knowledge they learn and rely on.
We evaluate three models (BERT, RoBERTa, and ALBERT) testing their grammatical and semantic knowledge by sentence-level probing, diagnostic cases, and masked prediction tasks.
arXiv Detail & Related papers (2020-11-02T13:25:39Z) - Exemplar-Controllable Paraphrasing and Translation using Bitext [57.92051459102902]
We adapt models from prior work to learn solely from bilingual text (bitext).
Our single proposed model can perform four tasks: controlled paraphrase generation in both languages and controlled machine translation in both language directions.
arXiv Detail & Related papers (2020-10-12T17:02:50Z) - InfoBERT: Improving Robustness of Language Models from An Information
Theoretic Perspective [84.78604733927887]
Large-scale language models such as BERT have achieved state-of-the-art performance across a wide range of NLP tasks.
Recent studies show that such BERT-based models are vulnerable to textual adversarial attacks.
We propose InfoBERT, a novel learning framework for robust fine-tuning of pre-trained language models.
arXiv Detail & Related papers (2020-10-05T20:49:26Z) - Exploiting Syntactic Structure for Better Language Modeling: A Syntactic
Distance Approach [78.77265671634454]
We make use of a multi-task objective, i.e., the models simultaneously predict words as well as ground truth parse trees in a form called "syntactic distances".
Experimental results on the Penn Treebank and Chinese Treebank datasets show that when ground truth parse trees are provided as additional training signals, the model is able to achieve lower perplexity and induce trees with better quality.
arXiv Detail & Related papers (2020-05-12T15:35:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.