Strong hallucinations from negation and how to fix them
- URL: http://arxiv.org/abs/2402.10543v1
- Date: Fri, 16 Feb 2024 10:11:20 GMT
- Title: Strong hallucinations from negation and how to fix them
- Authors: Nicholas Asher and Swarnadeep Bhar
- Abstract summary: Language models sometimes give responses that cannot possibly be true because they stem from logical incoherence.
We call such responses strong hallucinations and prove that they follow from an LM's computation of its internal representations for logical operators and outputs from those representations.
Our approach, which treats negation as an operation over an LM's latent representations, improves model performance in cloze prompting and natural language inference tasks with negation without requiring training on sparse negative data.
- Score: 2.50194939587674
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite great performance on many tasks, language models (LMs) still struggle
with reasoning, sometimes providing responses that cannot possibly be true
because they stem from logical incoherence. We call such responses
strong hallucinations and prove that they follow from an LM's
computation of its internal representations for logical operators and outputs
from those representations. Focusing on negation, we provide a novel solution
in which negation is treated not as another element of a latent representation,
but as an operation over an LM's latent representations that constrains
how they may evolve. We show that our approach improves model performance in
cloze prompting and natural language inference tasks with negation without
requiring training on sparse negative data.
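
To make the idea of negation as an operation over a model's outputs concrete, here is a minimal toy sketch in Python. It is not the authors' method: the complement rule (weight each candidate by 1 - p, then renormalise), the function names `negate_distribution` and `is_strongly_inconsistent`, and the example probabilities are all assumptions chosen for illustration. It only shows how a distribution for a negated cloze prompt could be derived from the positive one rather than predicted independently, and how one might flag the case where a prompt and its negation share the same top answer.

```python
def negate_distribution(pos_probs):
    """Derive a toy distribution for a negated cloze prompt.

    Assumption (not from the paper): each candidate is weighted by
    1 - p under the positive prompt, then the weights are renormalised.
    """
    complement = {tok: 1.0 - p for tok, p in pos_probs.items()}
    total = sum(complement.values())
    if total == 0:
        # Happens when a single candidate holds all the probability mass.
        raise ValueError("cannot negate a distribution with one certain candidate")
    return {tok: weight / total for tok, weight in complement.items()}


def is_strongly_inconsistent(pos_probs, neg_probs):
    """Return True when the prompt and its negation share the same top answer."""
    top_pos = max(pos_probs, key=pos_probs.get)
    top_neg = max(neg_probs, key=neg_probs.get)
    return top_pos == top_neg


# Hypothetical probabilities for "A robin is a [MASK]."
pos = {"bird": 0.7, "fish": 0.2, "tree": 0.1}
# Derived distribution for "A robin is not a [MASK]."
neg = negate_distribution(pos)
```

Because the negated distribution is computed from the positive one, the most likely completion of the positive prompt can never also be the most likely completion of its negation under this toy rule (when the candidates do not all tie), which is the kind of coherence constraint the abstract describes.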
Related papers
- Generating Diverse Negations from Affirmative Sentences [0.999726509256195]
Negations are important in real-world applications as they encode negative polarity in verb phrases, clauses, or other expressions.
We propose NegVerse, a method that tackles the lack of negation datasets by producing a diverse range of negation types.
We provide new rules for masking parts of sentences where negations are most likely to occur, based on syntactic structure.
We also propose a filtering mechanism to identify negation cues and remove degenerate examples, producing a diverse range of meaningful perturbations.
arXiv Detail & Related papers (2024-10-30T21:25:02Z)
- Revisiting subword tokenization: A case study on affixal negation in large language models [57.75279238091522]
We measure the impact of affixal negation on modern English large language models (LLMs).
We conduct experiments using LLMs with different subword tokenization methods.
We show that models can, on the whole, reliably recognize the meaning of affixal negation.
arXiv Detail & Related papers (2024-04-03T03:14:27Z)
- Language Models can be Logical Solvers [99.40649402395725]
We introduce LoGiPT, a novel language model that directly emulates the reasoning processes of logical solvers.
LoGiPT is fine-tuned on a newly constructed instruction-tuning dataset derived from revealing and refining the invisible reasoning process of deductive solvers.
arXiv Detail & Related papers (2023-11-10T16:23:50Z)
- This is not a Dataset: A Large Negation Benchmark to Challenge Large Language Models [4.017326849033009]
We try to clarify the reasons for the sub-optimal performance of large language models in understanding negation.
We introduce a large semi-automatically generated dataset of circa 400,000 descriptive sentences about commonsense knowledge.
We have used our dataset with the largest available open LLMs in a zero-shot approach to grasp their generalization and inference capability.
arXiv Detail & Related papers (2023-10-24T15:38:21Z)
- Language models are not naysayers: An analysis of language models on negation benchmarks [58.32362243122714]
We evaluate the ability of current-generation auto-regressive language models to handle negation.
We show that LLMs have several limitations including insensitivity to the presence of negation, an inability to capture the lexical semantics of negation, and a failure to reason under negation.
arXiv Detail & Related papers (2023-06-14T01:16:37Z)
- Discovering Latent Knowledge in Language Models Without Supervision [72.95136739040676]
Existing techniques for training language models can be misaligned with the truth.
We propose directly finding latent knowledge inside the internal activations of a language model in a purely unsupervised way.
We show that despite using no supervision and no model outputs, our method can recover diverse knowledge represented in large language models.
arXiv Detail & Related papers (2022-12-07T18:17:56Z)
- CONDAQA: A Contrastive Reading Comprehension Dataset for Reasoning about Negation [21.56001677478673]
We present the first English reading comprehension dataset which requires reasoning about the implications of negated statements in paragraphs.
CONDAQA features 14,182 question-answer pairs with over 200 unique negation cues.
The best-performing model on CONDAQA (UnifiedQA-v2-3b) achieves only 42% on our consistency metric, well below human performance of 81%.
arXiv Detail & Related papers (2022-11-01T06:10:26Z)
- Leveraging Affirmative Interpretations from Negation Improves Natural Language Understanding [10.440501875161003]
Negation poses a challenge in many natural language understanding tasks.
We show that inferring affirmative interpretations of negated statements benefits models for three natural language understanding tasks.
We build a plug-and-play neural generator that, given a negated statement, generates an affirmative interpretation.
arXiv Detail & Related papers (2022-10-26T05:22:27Z)
- Not another Negation Benchmark: The NaN-NLI Test Suite for Sub-clausal Negation [59.307534363825816]
Negation is poorly captured by current language models, although the extent of this problem is not widely understood.
We introduce a natural language inference (NLI) test suite to enable probing the capabilities of NLP methods.
arXiv Detail & Related papers (2022-10-06T23:39:01Z)
- Improving negation detection with negation-focused pre-training [58.32362243122714]
Negation is a common linguistic feature that is crucial in many language understanding tasks.
Recent work has shown that state-of-the-art NLP models underperform on samples containing negation.
We propose a new negation-focused pre-training strategy, involving targeted data augmentation and negation masking.
arXiv Detail & Related papers (2022-05-09T02:41:11Z)