This is not a Dataset: A Large Negation Benchmark to Challenge Large
Language Models
- URL: http://arxiv.org/abs/2310.15941v1
- Date: Tue, 24 Oct 2023 15:38:21 GMT
- Title: This is not a Dataset: A Large Negation Benchmark to Challenge Large
Language Models
- Authors: Iker García-Ferrero, Begoña Altuna, Javier Álvez, Itziar Gonzalez-Dios, German Rigau
- Abstract summary: We try to clarify the reasons for the suboptimal performance of large language models in understanding negation.
We introduce a large semi-automatically generated dataset of circa 400,000 descriptive sentences about commonsense knowledge.
We have used our dataset with the largest available open LLMs in a zero-shot approach to assess their generalization and inference capabilities.
- Score: 4.017326849033009
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Although large language models (LLMs) have apparently acquired a certain
level of grammatical knowledge and the ability to make generalizations, they
fail to interpret negation, a crucial step in Natural Language Processing. We
try to clarify the reasons for the suboptimal performance of LLMs in
understanding negation. We introduce a large semi-automatically generated
dataset of circa 400,000 descriptive sentences about commonsense knowledge that
can be true or false, in which negation is present, in different forms, in
about two thirds of the corpus. We have used our dataset with the largest
available open LLMs in a zero-shot approach to assess their generalization and
inference capabilities, and we have also fine-tuned some of the models to
assess whether the
understanding of negation can be trained. Our findings show that, while LLMs
are proficient at classifying affirmative sentences, they struggle with
negative sentences and lack a deep understanding of negation, often relying on
superficial cues. Although fine-tuning the models on negative sentences
improves their performance, the lack of generalization in handling negation
persists, highlighting the ongoing challenge LLMs face in understanding and
generalizing over negation. The dataset and code are publicly available.
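As a concrete illustration of the zero-shot setup described above, the following Python sketch scores a commonsense statement as true or false by comparing the next-token logits an open causal LLM assigns to the two labels. The model name, prompt template, and label tokens are illustrative assumptions, not the authors' exact configuration.

    # A minimal zero-shot true/false probe, assuming a Hugging Face causal LM.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    MODEL = "tiiuae/falcon-7b"  # assumed stand-in for "the largest available open LLMs"
    tok = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL, torch_dtype=torch.float16, device_map="auto"
    )

    def zero_shot_truth(sentence: str) -> str:
        prompt = f"Is the following statement true or false?\n{sentence}\nAnswer:"
        inputs = tok(prompt, return_tensors="pt").to(model.device)
        with torch.no_grad():
            logits = model(**inputs).logits[0, -1]  # next-token distribution
        true_id = tok(" true", add_special_tokens=False).input_ids[0]
        false_id = tok(" false", add_special_tokens=False).input_ids[0]
        return "True" if logits[true_id] > logits[false_id] else "False"

    print(zero_shot_truth("A dog is not an animal."))  # expected: False

Scoring label logits instead of sampling free-form text keeps the evaluation deterministic and makes reliance on superficial cues (e.g., flipping the answer whenever "not" appears) straightforward to measure.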
Related papers
- LLMs' Reading Comprehension Is Affected by Parametric Knowledge and Struggles with Hypothetical Statements [59.71218039095155]
The task of reading comprehension (RC) provides a primary means to assess language models' natural language understanding (NLU) capabilities.
If the context aligns with the models' internal knowledge, it is hard to discern whether the models' answers stem from context comprehension or from internal information.
To address this issue, we suggest using RC on imaginary data, based on fictitious facts and entities.
arXiv Detail & Related papers (2024-04-09T13:08:56Z)
- Revisiting subword tokenization: A case study on affixal negation in large language models [57.75279238091522]
We measure the impact of affixal negation on modern English large language models (LLMs).
We conduct experiments using LLMs with different subword tokenization methods.
We show that models can, on the whole, reliably recognize the meaning of affixal negation.
arXiv Detail & Related papers (2024-04-03T03:14:27Z)
- Language models are not naysayers: An analysis of language models on negation benchmarks [58.32362243122714]
We evaluate the ability of current-generation auto-regressive language models to handle negation.
We show that LLMs have several limitations including insensitivity to the presence of negation, an inability to capture the lexical semantics of negation, and a failure to reason under negation.
arXiv Detail & Related papers (2023-06-14T01:16:37Z)
- Can large language models generate salient negative statements? [18.577880767789097]
We examine the ability of large language models to generate salient (interesting) negative statements about real-world entities.
We probe the LLMs using zero- and k-shot unconstrained probes, and compare with traditional methods for negation generation.
We measure the correctness and salience of the generated lists about subjects from different domains.
arXiv Detail & Related papers (2023-05-26T09:13:59Z)
- Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds [59.71218039095155]
We evaluate language understanding capacities on simple inference tasks that most humans find trivial.
We target (i) grammatically-specified entailments, (ii) premises with evidential adverbs of uncertainty, and (iii) monotonicity entailments.
The models exhibit moderate to low performance on these evaluation sets.
arXiv Detail & Related papers (2023-05-24T06:41:09Z)
- CONDAQA: A Contrastive Reading Comprehension Dataset for Reasoning about Negation [21.56001677478673]
We present the first English reading comprehension dataset which requires reasoning about the implications of negated statements in paragraphs.
CONDAQA features 14,182 question-answer pairs with over 200 unique negation cues.
The best-performing model on CONDAQA (UnifiedQA-v2-3b) achieves only 42% on our consistency metric, well below human performance of 81%; one way to compute such a group-wise metric is sketched after this list.
arXiv Detail & Related papers (2022-11-01T06:10:26Z)
- Improving negation detection with negation-focused pre-training [58.32362243122714]
Negation is a common linguistic feature that is crucial in many language understanding tasks.
Recent work has shown that state-of-the-art NLP models underperform on samples containing negation.
We propose a new negation-focused pre-training strategy, involving targeted data augmentation and negation masking.
arXiv Detail & Related papers (2022-05-09T02:41:11Z)
- Understanding by Understanding Not: Modeling Negation in Language Models [81.21351681735973]
Negation is a core construction in natural language.
We propose to augment the language modeling objective with an unlikelihood objective based on negated generic sentences; a minimal sketch of such an objective also follows this list.
We reduce the mean top-1 error rate to 4% on the negated LAMA dataset.
arXiv Detail & Related papers (2021-05-07T21:58:35Z)
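As referenced in the CONDAQA entry above, a group-wise consistency metric typically awards credit only when a model answers every question in a group of related (e.g., contrastively edited) items correctly. The sketch below shows one plausible way to compute such a score; the grouping scheme and field names are assumptions, not CONDAQA's exact protocol.

    # A hypothetical group-wise consistency metric: a group counts only if
    # every prediction in it matches the gold answer.
    from collections import defaultdict

    def consistency(records):
        """records: iterable of (group_id, predicted, gold) triples."""
        groups = defaultdict(list)
        for group_id, predicted, gold in records:
            groups[group_id].append(predicted == gold)
        # a group is consistent only if all of its answers are correct
        return sum(all(v) for v in groups.values()) / len(groups)

    records = [("q1", "yes", "yes"), ("q1", "no", "yes"),   # one wrong: inconsistent
               ("q2", "no", "no"), ("q2", "yes", "yes")]    # all right: consistent
    print(consistency(records))  # 0.5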
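The unlikelihood objective in the last entry can be written, for the tokens of a completion that a negated premise rules out, as L = -Σ_t log(1 - p(x_t | x_<t)). The PyTorch sketch below is a generic rendering of that idea under assumed tensor shapes, not the paper's actual implementation.

    # Unlikelihood loss: penalize probability mass on tokens a negated
    # premise rules out, rather than maximizing it as in standard LM training.
    import torch
    import torch.nn.functional as F

    def unlikelihood_loss(logits, target_ids):
        """logits: (batch, seq_len, vocab); target_ids: (batch, seq_len)."""
        log_probs = F.log_softmax(logits, dim=-1)
        # probability the model assigns to each ruled-out target token
        p_target = log_probs.gather(-1, target_ids.unsqueeze(-1)).squeeze(-1).exp()
        # -log(1 - p): near zero when the model avoids the token, large otherwise
        return -torch.log1p(-p_target.clamp(max=1.0 - 1e-6)).mean()

    print(unlikelihood_loss(torch.randn(2, 5, 100), torch.randint(0, 100, (2, 5))))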
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.