Generalized Quantifiers as a Source of Error in Multilingual NLU Benchmarks
- URL: http://arxiv.org/abs/2204.10615v1
- Date: Fri, 22 Apr 2022 10:21:46 GMT
- Title: Generalized Quantifiers as a Source of Error in Multilingual NLU Benchmarks
- Authors: Ruixiang Cui, Daniel Hershcovich, Anders Søgaard
- Abstract summary: We rely on Generalized Quantifier Theory for language-independent representations of the semantics of quantifier words.
We find that quantifiers are pervasive in NLU benchmarks, and their occurrence at test time is associated with performance drops.
Multilingual models also exhibit unsatisfactory quantifier reasoning abilities, though not necessarily worse for non-English languages than for English.
- Score: 5.818232893255398
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Logical approaches to representing language have developed and evaluated
computational models of quantifier words since the 19th century, but today's
NLU models still struggle to capture their semantics. We rely on Generalized
Quantifier Theory for language-independent representations of the semantics of
quantifier words, to quantify their contribution to the errors of NLU models.
We find that quantifiers are pervasive in NLU benchmarks, and their occurrence
at test time is associated with performance drops. Multilingual models also
exhibit unsatisfying quantifier reasoning abilities, but not necessarily worse
for non-English languages. To facilitate directly-targeted probing, we present
an adversarial generalized quantifier NLI task (GQNLI) and show that
pre-trained language models have a clear lack of robustness in generalized
quantifier reasoning.
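As a rough illustration of the underlying idea, Generalized Quantifier Theory treats a quantifier word as a relation between two sets: a restrictor (e.g. "students") and a scope (e.g. "those who passed"). The minimal Python sketch below shows such language-independent denotations; the quantifier inventory and the toy sets are illustrative choices, not material from the paper.

```python
# Minimal sketch: generalized quantifiers as relations between two sets,
# the restrictor A (e.g. "students") and the scope B (e.g. "passed").
# These truth conditions do not depend on any particular language.

def every(A, B):    return A <= B                   # every A is a B (subset)
def some(A, B):     return len(A & B) > 0           # A and B overlap
def no(A, B):       return len(A & B) == 0          # A and B are disjoint
def most(A, B):     return len(A & B) > len(A - B)  # more A's inside B than outside

def at_least(n):                                    # family of numeric quantifiers
    return lambda A, B: len(A & B) >= n

students = {"ann", "bo", "cem", "dee"}
passed   = {"ann", "bo", "cem"}

print(every(students, passed))         # False: "every student passed"
print(most(students, passed))          # True:  "most students passed"
print(at_least(2)(students, passed))   # True:  "at least two students passed"
```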
Related papers
- Revisiting subword tokenization: A case study on affixal negation in large language models [57.75279238091522]
We measure the impact of affixal negation on modern English large language models (LLMs).
We conduct experiments using LLMs with different subword tokenization methods.
We show that models can, on the whole, reliably recognize the meaning of affixal negation.
arXiv Detail & Related papers (2024-04-03T03:14:27Z)
- This is not a Dataset: A Large Negation Benchmark to Challenge Large Language Models [4.017326849033009]
We try to clarify the reasons for the sub-optimal performance of large language models in understanding negation.
We introduce a large semi-automatically generated dataset of circa 400,000 descriptive sentences about commonsense knowledge.
We have used our dataset with the largest available open LLMs in a zero-shot approach to assess their generalization and inference capability.
arXiv Detail & Related papers (2023-10-24T15:38:21Z)
- Language models are not naysayers: An analysis of language models on negation benchmarks [58.32362243122714]
We evaluate the ability of current-generation auto-regressive language models to handle negation.
We show that LLMs have several limitations, including insensitivity to the presence of negation, an inability to capture the lexical semantics of negation, and a failure to reason under negation.
arXiv Detail & Related papers (2023-06-14T01:16:37Z)
- Evaluating statistical language models as pragmatic reasoners [39.72348730045737]
We evaluate the capacity of large language models to infer the meanings of pragmatic utterances.
We find that LLMs can derive context-grounded, human-like distributions over the interpretations of several complex pragmatic utterances.
These results inform the inferential capacity of statistical language models and their use in pragmatic and semantic parsing applications.
arXiv Detail & Related papers (2023-05-01T18:22:10Z)
- Towards preserving word order importance through Forced Invalidation [80.33036864442182]
We show that pre-trained language models are insensitive to word order.
We propose Forced Invalidation to help preserve the importance of word order.
Our experiments demonstrate that Forced Invalidation significantly improves the sensitivity of the models to word order.
arXiv Detail & Related papers (2023-04-11T13:42:10Z)
- Not another Negation Benchmark: The NaN-NLI Test Suite for Sub-clausal Negation [59.307534363825816]
Negation is poorly captured by current language models, although the extent of this problem is not widely understood.
We introduce a natural language inference (NLI) test suite to enable probing the capabilities of NLP methods.
arXiv Detail & Related papers (2022-10-06T23:39:01Z)
- Language Models are Few-shot Multilingual Learners [66.11011385895195]
We evaluate the multilingual skills of the GPT and T5 models in conducting multi-class classification on non-English languages.
We show that, given a few English examples as context, pre-trained language models can predict not only English test samples but also non-English ones.
arXiv Detail & Related papers (2021-09-16T03:08:22Z)
- Understanding by Understanding Not: Modeling Negation in Language Models [81.21351681735973]
Negation is a core construction in natural language.
We propose to augment the language modeling objective with an unlikelihood objective that is based on negated generic sentences (see the sketch after this list).
We reduce the mean top-1 error rate to 4% on the negated LAMA dataset.
arXiv Detail & Related papers (2021-05-07T21:58:35Z)
- Recurrent Neural Network Language Models Always Learn English-Like Relative Clause Attachment [17.995905582226463]
We compare model performance in English and Spanish to show that non-linguistic biases in RNN LMs advantageously overlap with syntactic structure in English but not Spanish.
English models may appear to acquire human-like syntactic preferences, while models trained on Spanish fail to acquire comparable human-like preferences.
arXiv Detail & Related papers (2020-05-01T01:21:47Z)
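The "Understanding by Understanding Not" entry above augments the language modeling objective with an unlikelihood term over negated generic sentences. The PyTorch sketch below shows one generic form such a term can take; the function name, the masking scheme, and the way the two terms are combined are illustrative assumptions, not that paper's implementation.

```python
import torch
import torch.nn.functional as F

def negation_aware_lm_loss(logits, targets, negated_mask):
    """Likelihood loss on ordinary tokens, unlikelihood loss on tokens whose
    targets come from negated generic sentences (illustrative sketch only).

    logits:       (batch, seq_len, vocab) next-token scores from the LM
    targets:      (batch, seq_len) gold next-token ids
    negated_mask: (batch, seq_len) float, 1.0 where the unlikelihood term applies
    """
    log_p = F.log_softmax(logits, dim=-1)
    target_log_p = log_p.gather(-1, targets.unsqueeze(-1)).squeeze(-1)

    nll = -target_log_p                           # standard LM likelihood term
    p = target_log_p.exp().clamp(max=1.0 - 1e-6)  # probability of the target token
    unlikelihood = -torch.log1p(-p)               # -log(1 - p): push mass away from the target

    loss = (1.0 - negated_mask) * nll + negated_mask * unlikelihood
    return loss.mean()
```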
This list was automatically generated from the titles and abstracts of the papers on this site.