MALTO at SemEval-2024 Task 6: Leveraging Synthetic Data for LLM
Hallucination Detection
- URL: http://arxiv.org/abs/2403.00964v1
- Date: Fri, 1 Mar 2024 20:31:10 GMT
- Title: MALTO at SemEval-2024 Task 6: Leveraging Synthetic Data for LLM
Hallucination Detection
- Authors: Federico Borra, Claudio Savelli, Giacomo Rosso, Alkis Koudounas,
Flavio Giobergia
- Abstract summary: In Natural Language Generation (NLG), contemporary Large Language Models (LLMs) face several challenges.
This often leads to neural networks exhibiting "hallucinations".
The SHROOM challenge focuses on automatically identifying these hallucinations in the generated text.
- Score: 3.049887057143419
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In Natural Language Generation (NLG), contemporary Large Language Models
(LLMs) face several challenges, such as generating fluent yet inaccurate
outputs and reliance on fluency-centric metrics. This often leads to neural
networks exhibiting "hallucinations". The SHROOM challenge focuses on
automatically identifying these hallucinations in the generated text. To tackle
these issues, we introduce two key components: a data augmentation pipeline
incorporating LLM-assisted pseudo-labelling and sentence rephrasing, and a
voting ensemble of three models pre-trained on Natural Language Inference
(NLI) tasks and fine-tuned on diverse datasets.
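As a concrete illustration of the second component, the sketch below implements majority voting over NLI models to flag a generated answer that the source text does not support. The checkpoints, the entailment-based decision rule, and all helper names are assumptions for illustration; they stand in for the fine-tuned models described in the abstract rather than reproducing the authors' pipeline.

```python
# Minimal sketch (assumptions, not the authors' released code): three
# off-the-shelf NLI checkpoints stand in for the fine-tuned ensemble, and a
# generated answer is flagged as a hallucination when the majority of models
# does not judge it entailed by the source text.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

NLI_CHECKPOINTS = [
    "roberta-large-mnli",
    "microsoft/deberta-large-mnli",
    "facebook/bart-large-mnli",
]


def entails(model_name: str, premise: str, hypothesis: str) -> bool:
    """Single-model vote: does this NLI model label the pair as entailment?"""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    label = model.config.id2label[int(logits.argmax(dim=-1))].lower()
    return "entail" in label


def is_hallucination(source: str, generated: str) -> bool:
    """Majority vote over the ensemble: flag when most models see no entailment."""
    votes = [entails(name, source, generated) for name in NLI_CHECKPOINTS]
    return sum(votes) <= len(votes) // 2


# Toy usage:
# is_hallucination("Paris is the capital of France.",
#                  "France's capital city is Berlin.")  # expected: True
```

The data augmentation component (LLM-assisted pseudo-labelling plus sentence rephrasing) would be applied before fine-tuning the three models; it is omitted here because it depends on the specific LLM and prompts chosen.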
Related papers
- Mitigating Hallucinations in Large Vision-Language Models via Summary-Guided Decoding [14.701135083174918]
Large Vision-Language Models (LVLMs) generate detailed and coherent responses from visual inputs.
They are prone to generating hallucinations due to an over-reliance on language priors.
We propose a novel method, Summary-Guided Decoding (SGD).
arXiv Detail & Related papers (2024-10-17T08:24:27Z)
- Controlled Automatic Task-Specific Synthetic Data Generation for Hallucination Detection [7.167234584287035]
We present a novel approach to automatically generate non-trivial task-specific synthetic datasets for hallucination detection.
Our approach features a two-step generation-selection pipeline, using hallucination pattern guidance and language style alignment during generation.
Our hallucination detectors trained on synthetic datasets outperform in-context-learning (ICL)-based detectors by a large margin of 32%.
arXiv Detail & Related papers (2024-10-16T06:31:59Z)
- LongHalQA: Long-Context Hallucination Evaluation for MultiModal Large Language Models [96.64960606650115]
LongHalQA is an LLM-free hallucination benchmark that comprises 6K long and complex hallucination texts.
LongHalQA features GPT4V-generated hallucinatory data that are well aligned with real-world scenarios.
arXiv Detail & Related papers (2024-10-13T18:59:58Z)
- Negation Blindness in Large Language Models: Unveiling the NO Syndrome in Image Generation [63.064204206220936]
Foundational Large Language Models (LLMs) have changed the way we perceive technology.
They have been shown to excel in tasks ranging from poem writing to coding to essay generation and puzzle solving.
With the incorporation of image generation capability, they have become more comprehensive and versatile AI tools.
Currently identified flaws include hallucination, biases, and bypassing restricted commands to generate harmful content.
arXiv Detail & Related papers (2024-08-27T14:40:16Z)
- Hallucination Detection: Robustly Discerning Reliable Answers in Large Language Models [70.19081534515371]
Large Language Models (LLMs) have gained widespread adoption in various natural language processing tasks.
However, they can generate unfaithful or inconsistent content that deviates from the input source, leading to severe consequences.
We propose a robust discriminator named RelD to effectively detect hallucination in LLMs' generated answers.
arXiv Detail & Related papers (2024-07-04T18:47:42Z)
- Drowzee: Metamorphic Testing for Fact-Conflicting Hallucination Detection in Large Language Models [11.138489774712163]
We propose an innovative approach leveraging logic programming to enhance metamorphic testing for detecting Fact-Conflicting Hallucinations (FCH).
Our method generates test cases and detects hallucinations across six different large language models spanning nine domains, revealing hallucination rates ranging from 24.7% to 59.8%.
arXiv Detail & Related papers (2024-05-01T17:24:42Z)
- Enhancing Uncertainty-Based Hallucination Detection with Stronger Focus [99.33091772494751]
Large Language Models (LLMs) have gained significant popularity for their impressive performance across diverse fields.
However, LLMs are prone to hallucinating untruthful or nonsensical outputs that fail to meet user expectations.
We propose a novel reference-free, uncertainty-based method for detecting hallucinations in LLMs.
arXiv Detail & Related papers (2023-11-22T08:39:17Z)
- AutoHall: Automated Hallucination Dataset Generation for Large Language Models [56.92068213969036]
This paper introduces AutoHall, a method for automatically constructing model-specific hallucination datasets from existing fact-checking datasets.
We also propose a zero-resource, black-box hallucination detection method based on self-contradiction (a generic self-contradiction check of this kind is sketched after this list).
arXiv Detail & Related papers (2023-09-30T05:20:02Z)
- Survey of Hallucination in Natural Language Generation [69.9926849848132]
Natural Language Generation (NLG) has improved exponentially in recent years thanks to the development of sequence-to-sequence deep learning technologies.
Deep learning-based generation is prone to hallucinating unintended text, which degrades system performance.
This survey serves to facilitate collaborative efforts among researchers in tackling the challenge of hallucinated texts in NLG.
arXiv Detail & Related papers (2022-02-08T03:55:01Z)
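For the self-contradiction idea referenced in the AutoHall entry above, the following sketch samples several answers to the same prompt from a black-box model and flags the prompt when any pair of samples contradicts. The NLI checkpoint and the `sample_answer` callable are hypothetical stand-ins; this illustrates the general zero-resource approach, not AutoHall's published procedure.

```python
# Illustrative self-contradiction check (assumptions: the NLI checkpoint acts
# as a stand-in consistency judge, and `sample_answer` is any black-box LLM
# call that returns one sampled answer per invocation).
from itertools import combinations

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

NLI_NAME = "microsoft/deberta-large-mnli"
_tokenizer = AutoTokenizer.from_pretrained(NLI_NAME)
_nli_model = AutoModelForSequenceClassification.from_pretrained(NLI_NAME)


def contradicts(answer_a: str, answer_b: str) -> bool:
    """True when the NLI model labels the answer pair as a contradiction."""
    inputs = _tokenizer(answer_a, answer_b, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = _nli_model(**inputs).logits
    label = _nli_model.config.id2label[int(logits.argmax(dim=-1))].lower()
    return "contradiction" in label


def likely_hallucination(prompt: str, sample_answer, n_samples: int = 5) -> bool:
    """Sample several answers and flag the prompt if any pair disagrees."""
    answers = [sample_answer(prompt) for _ in range(n_samples)]
    return any(contradicts(a, b) for a, b in combinations(answers, 2))
```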