Can current NLI systems handle German word order? Investigating language model performance on a new German challenge set of minimal pairs
- URL: http://arxiv.org/abs/2306.04523v1
- Date: Wed, 7 Jun 2023 15:33:07 GMT
- Title: Can current NLI systems handle German word order? Investigating language model performance on a new German challenge set of minimal pairs
- Authors: Ines Reinig and Katja Markert
- Abstract summary: Compared to English, German word order is freer and therefore poses additional challenges for natural language inference (NLI).
We create WOGLI (Word Order in German Language Inference), the first adversarial NLI dataset for German word order.
We show that current German autoencoding models fine-tuned on translated NLI data can struggle on this challenge set.
- Score: 3.0534660670547873
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Compared to English, German word order is freer and therefore poses
additional challenges for natural language inference (NLI). We create WOGLI
(Word Order in German Language Inference), the first adversarial NLI dataset
for German word order that has the following properties: (i) each premise has
an entailed and a non-entailed hypothesis; (ii) premise and hypotheses differ
only in word order and necessary morphological changes to mark case and number.
In particular, each premise and its two hypotheses contain exactly the same
lemmata. Our adversarial examples require the model to use morphological
markers in order to recognise or reject entailment. We show that current German
autoencoding models fine-tuned on translated NLI data can struggle on this
challenge set, reflecting the fact that translated NLI datasets will not mirror
all necessary language phenomena in the target language. We also examine
performance after data augmentation as well as on related word order phenomena
derived from WOGLI. Our datasets are publicly available at
https://github.com/ireinig/wogli.
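To make this construction concrete, here is a minimal sketch (in Python) of the premise/hypothesis shape described above; the German example sentences and the WogliPair class are our own illustrations of the pattern, not items or code from the released dataset.

```python
# A minimal sketch of the WOGLI premise/hypothesis structure described
# above. The German sentences are illustrative, not dataset items.
from dataclasses import dataclass

@dataclass
class WogliPair:
    premise: str           # canonical SVO premise
    h1_entailed: str       # marked word order, same argument roles
    h2_not_entailed: str   # same lemmata, argument roles swapped

pair = WogliPair(
    # "The doctor visits the teacher."
    premise="Der Arzt besucht den Lehrer.",
    # OVS reordering: the case markers (der/den) still signal that the
    # doctor is the subject, so the meaning is preserved (entailed).
    h1_entailed="Den Lehrer besucht der Arzt.",
    # Same lemmata in surface SVO order, but case marking now makes the
    # teacher the subject, so the roles are swapped (not entailed).
    h2_not_entailed="Der Lehrer besucht den Arzt.",
)

# An NLI model must read the morphological case markers, not just the
# bag of lemmata or the surface position of the nouns, to label these.
```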
Related papers
- Native Language Identification with Large Language Models [60.80452362519818]
We show that GPT models are proficient at Native Language Identification (NLI) classification, with GPT-4 setting a new performance record of 91.7% on the TOEFL11 benchmark test set in a zero-shot setting.
We also show that unlike previous fully-supervised settings, LLMs can perform NLI without being limited to a set of known classes.
arXiv Detail & Related papers (2023-12-13T00:52:15Z)
- Improving Domain-Specific Retrieval by NLI Fine-Tuning [64.79760042717822]
This article investigates the fine-tuning potential of natural language inference (NLI) data to improve information retrieval and ranking.
We employ both monolingual and multilingual sentence encoders fine-tuned by a supervised method utilizing contrastive loss and NLI data.
Our results point to the fact that NLI fine-tuning increases the performance of the models in both tasks and both languages, with the potential to improve mono- and multilingual models.
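The summary does not spell out the training recipe; a common contrastive-loss setup on NLI entailment pairs, sketched here with the sentence-transformers library and an illustrative model choice, looks roughly like this:

```python
# A generic sketch of contrastive fine-tuning on NLI data with the
# sentence-transformers library; the paper's exact recipe may differ.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

# Premise/entailed-hypothesis pairs; other premises in the batch act
# as in-batch negatives under the contrastive loss.
train_examples = [
    InputExample(texts=["A man plays guitar.", "A person makes music."]),
    InputExample(texts=["A dog runs outside.", "An animal is outdoors."]),
]
loader = DataLoader(train_examples, shuffle=True, batch_size=2)
loss = losses.MultipleNegativesRankingLoss(model)

model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=10)
```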
arXiv Detail & Related papers (2023-08-06T12:40:58Z)
- Jamp: Controlled Japanese Temporal Inference Dataset for Evaluating Generalization Capacity of Language Models [18.874880342410876]
We present Jamp, a Japanese benchmark focused on temporal inference.
Our dataset includes a range of temporal inference patterns, which enables us to conduct fine-grained analysis.
We evaluate the generalization capacities of monolingual/multilingual LMs by splitting our dataset based on tense fragments.
arXiv Detail & Related papers (2023-06-19T07:00:14Z)
- Towards preserving word order importance through Forced Invalidation [80.33036864442182]
We show that pre-trained language models are insensitive to word order.
We propose Forced Invalidation to help preserve the importance of word order.
Our experiments demonstrate that Forced Invalidation significantly improves the sensitivity of the models to word order.
arXiv Detail & Related papers (2023-04-11T13:42:10Z)
- Suffix Retrieval-Augmented Language Modeling [1.8710230264817358]
Causal language modeling (LM) uses word history to predict the next word.
BERT, on the other hand, makes use of bi-directional word information in a sentence to predict words at masked positions.
We propose a novel model that simulates a bi-directional contextual effect in an autoregressive manner.
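The contrast between the two objectives can be made concrete with stock models (the choice of gpt2 and bert-base-uncased here is ours, purely illustrative):

```python
# Illustrating the two objectives contrasted above with stock models.
from transformers import pipeline

# Causal LM: predicts continuations from the left context only.
generate = pipeline("text-generation", model="gpt2")
print(generate("German word order is", max_new_tokens=5))

# Masked LM: uses context on both sides of the masked position.
fill = pipeline("fill-mask", model="bert-base-uncased")
print(fill("German word order is [MASK] than English word order."))
```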
arXiv Detail & Related papers (2022-11-06T07:53:19Z)
- Shaking Syntactic Trees on the Sesame Street: Multilingual Probing with Controllable Perturbations [2.041108289731398]
Recent research has adopted a new experimental field centered around the concept of text perturbations, and has revealed that shuffled word order has little to no impact on the downstream performance of Transformer-based language models.
arXiv Detail & Related papers (2021-09-28T20:15:29Z)
- More Than Words: Collocation Tokenization for Latent Dirichlet Allocation Models [71.42030830910227]
We propose a new metric for measuring the clustering quality in settings where the models differ.
We show that topics trained with merged tokens result in topic keys that are clearer, more coherent, and more effective at distinguishing topics than those unmerged models.
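The paper's exact pipeline is not given in the summary; one standard way to merge collocations into single tokens before LDA, sketched with gensim, is:

```python
# Not the paper's method: a common collocation-merging recipe using
# gensim's Phrases before LDA, illustrating topics over merged tokens.
from gensim.corpora import Dictionary
from gensim.models import LdaModel, Phrases

docs = [
    ["new", "york", "city", "subway", "map"],
    ["new", "york", "traffic", "report"],
    ["tokyo", "subway", "map"],
]

# Merge frequent collocations such as "new_york" into single tokens.
bigrams = Phrases(docs, min_count=1, threshold=1)
merged_docs = [bigrams[doc] for doc in docs]

dictionary = Dictionary(merged_docs)
corpus = [dictionary.doc2bow(doc) for doc in merged_docs]
lda = LdaModel(corpus, num_topics=2, id2word=dictionary, random_state=0)
print(lda.print_topics())
```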
arXiv Detail & Related papers (2021-08-24T14:08:19Z)
- BURT: BERT-inspired Universal Representation from Twin Structure [89.82415322763475]
BURT (BERT inspired Universal Representation from Twin Structure) is capable of generating universal, fixed-size representations for input sequences of any granularity.
Our proposed BURT adopts the Siamese network, learning sentence-level representations from natural language inference dataset and word/phrase-level representations from paraphrasing dataset.
We evaluate BURT across different granularities of text similarity tasks, including STS tasks, SemEval2013 Task 5(a) and some commonly used word similarity tasks.
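BURT itself is not reproduced here; the following sketch only illustrates the underlying Siamese-encoder idea of mapping inputs of any granularity into one fixed-size vector space, using a generic off-the-shelf sentence encoder:

```python
# Not BURT itself: a generic Siamese-style encoder illustrating
# fixed-size representations shared across words, phrases, sentences.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
inputs = ["bank", "river bank", "She sat on the bank of the river."]
embeddings = model.encode(inputs)   # one 384-dim vector per input
print(embeddings.shape)             # (3, 384)

# Cosine similarity compares representations across granularities.
print(util.cos_sim(embeddings[1], embeddings[2]))
```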
arXiv Detail & Related papers (2020-04-29T04:01:52Z)
- On the Importance of Word Order Information in Cross-lingual Sequence Labeling [80.65425412067464]
Cross-lingual models that overfit to the word order of the source language might fail to handle target languages.
We investigate whether making models insensitive to the word order of the source language can improve the adaptation performance in target languages.
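The summary leaves the method unspecified; one simple way to make a sequence labeler insensitive to source word order is to train on copies whose token/label pairs are jointly shuffled, as in this hypothetical sketch:

```python
# A hypothetical order-shuffling augmentation for sequence labeling;
# the paper's actual approach may differ (e.g., changing position
# encodings rather than the data).
import random

def shuffle_tokens_and_labels(tokens, labels, seed=0):
    """Jointly shuffle tokens with their labels so supervision stays
    aligned while source-language word order is destroyed."""
    rng = random.Random(seed)
    paired = list(zip(tokens, labels))
    rng.shuffle(paired)
    shuffled_tokens, shuffled_labels = zip(*paired)
    return list(shuffled_tokens), list(shuffled_labels)

tokens = ["Angela", "Merkel", "visited", "Paris"]
labels = ["B-PER", "I-PER", "O", "B-LOC"]
print(shuffle_tokens_and_labels(tokens, labels))
```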
arXiv Detail & Related papers (2020-01-30T03:35:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.