Language models are not naysayers: An analysis of language models on
negation benchmarks
- URL: http://arxiv.org/abs/2306.08189v1
- Date: Wed, 14 Jun 2023 01:16:37 GMT
- Title: Language models are not naysayers: An analysis of language models on
negation benchmarks
- Authors: Thinh Hung Truong, Timothy Baldwin, Karin Verspoor, Trevor Cohn
- Abstract summary: We evaluate the ability of current-generation auto-regressive language models to handle negation.
We show that LLMs have several limitations including insensitivity to the presence of negation, an inability to capture the lexical semantics of negation, and a failure to reason under negation.
- Score: 58.32362243122714
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Negation has been shown to be a major bottleneck for masked language models,
such as BERT. However, whether this finding still holds for larger-sized
auto-regressive language models ("LLMs") has not been studied
comprehensively. With the ever-increasing volume of research and applications
of LLMs, we take a step back to evaluate the ability of current-generation LLMs
to handle negation, a fundamental linguistic phenomenon that is central to
language understanding. We evaluate different LLMs -- including the open-source
GPT-neo, GPT-3, and InstructGPT -- against a wide range of negation benchmarks.
Through systematic experimentation with varying model sizes and prompts, we
show that LLMs have several limitations including insensitivity to the presence
of negation, an inability to capture the lexical semantics of negation, and a
failure to reason under negation.
Related papers
- Understanding and Mitigating Language Confusion in LLMs [76.96033035093204]
We evaluate 15 typologically diverse languages with existing and newly-created English and multilingual prompts.
We find that Llama Instruct and Mistral models exhibit high degrees of language confusion.
We find that language confusion can be partially mitigated via few-shot prompting, multilingual SFT and preference tuning.
arXiv Detail & Related papers (2024-06-28T17:03:51Z)
- Revisiting subword tokenization: A case study on affixal negation in large language models [57.75279238091522]
We measure the impact of affixal negation on modern English large language models (LLMs).
We conduct experiments using LLMs with different subword tokenization methods.
We show that models can, on the whole, reliably recognize the meaning of affixal negation.
arXiv Detail & Related papers (2024-04-03T03:14:27Z)
- NoMIRACL: Knowing When You Don't Know for Robust Multilingual Retrieval-Augmented Generation [92.5132418788568]
Retrieval-augmented generation (RAG) grounds large language model (LLM) output by leveraging external knowledge sources to reduce factual hallucinations.
NoMIRACL is a human-annotated dataset for evaluating LLM robustness in RAG across 18 typologically diverse languages.
We measure robustness using two metrics: (i) hallucination rate, the model's tendency to hallucinate an answer when no answer is present in the passages of the non-relevant subset, and (ii) error rate, the model's failure to recognize relevant passages in the relevant subset (a minimal sketch of computing these rates follows the related-papers list below).
arXiv Detail & Related papers (2023-12-18T17:18:04Z)
- This is not a Dataset: A Large Negation Benchmark to Challenge Large Language Models [4.017326849033009]
We try to clarify the reasons for the sub-optimal performance of large language models in understanding negation.
We introduce a large, semi-automatically generated dataset of approximately 400,000 descriptive sentences about commonsense knowledge.
We use our dataset with the largest available open LLMs in a zero-shot setting to assess their generalization and inference capabilities.
arXiv Detail & Related papers (2023-10-24T15:38:21Z)
- Not another Negation Benchmark: The NaN-NLI Test Suite for Sub-clausal Negation [59.307534363825816]
Negation is poorly captured by current language models, although the extent of this problem is not widely understood.
We introduce a natural language inference (NLI) test suite that enables targeted probing of NLP methods on sub-clausal negation.
arXiv Detail & Related papers (2022-10-06T23:39:01Z)
- Improving negation detection with negation-focused pre-training [58.32362243122714]
Negation is a common linguistic feature that is crucial in many language understanding tasks.
Recent work has shown that state-of-the-art NLP models underperform on samples containing negation.
We propose a new negation-focused pre-training strategy, involving targeted data augmentation and negation masking.
arXiv Detail & Related papers (2022-05-09T02:41:11Z)
- Generalized Quantifiers as a Source of Error in Multilingual NLU Benchmarks [5.818232893255398]
We rely on Generalized Quantifier Theory for language-independent representations of the semantics of quantifier words.
We find that quantifiers are pervasive in NLU benchmarks, and their occurrence at test time is associated with performance drops.
Multilingual models also exhibit unsatisfactory quantifier reasoning abilities, though not necessarily worse for non-English languages than for English.
arXiv Detail & Related papers (2022-04-22T10:21:46Z)
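For concreteness, here is a minimal sketch of how the two NoMIRACL-style robustness metrics mentioned above could be computed, assuming each query is reduced to a single boolean judgment per subset; the function names and toy values are hypothetical and not taken from the paper.

# Hypothetical sketch: hallucination rate and error rate as simple fractions
# over per-query boolean judgments; this is an assumed operationalization,
# not the NoMIRACL reference implementation.
from typing import List

def hallucination_rate(hallucinated: List[bool]) -> float:
    """Share of non-relevant-subset queries where the model invented an answer
    despite no supporting passage being present."""
    return sum(hallucinated) / len(hallucinated) if hallucinated else 0.0

def error_rate(missed_relevant: List[bool]) -> float:
    """Share of relevant-subset queries where the model failed to recognize
    the relevant passage."""
    return sum(missed_relevant) / len(missed_relevant) if missed_relevant else 0.0

if __name__ == "__main__":
    # Toy judgments, for illustration only.
    print(hallucination_rate([True, False, False, True]))  # 0.5
    print(error_rate([False, True, False]))                # ~0.33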