ANHALTEN: Cross-Lingual Transfer for German Token-Level Reference-Free Hallucination Detection
- URL: http://arxiv.org/abs/2407.13702v1
- Date: Thu, 18 Jul 2024 17:01:38 GMT
- Title: ANHALTEN: Cross-Lingual Transfer for German Token-Level Reference-Free Hallucination Detection
- Authors: Janek Herrlein, Chia-Chien Hung, Goran Glavaš
- Abstract summary: We introduce ANHALTEN, a dataset that extends the English hallucination detection dataset to German.
This is the first work that explores cross-lingual transfer for token-level reference-free hallucination detection.
We show that the sample-efficient few-shot transfer is the most effective approach in most setups.
- Score: 3.1269598124014264
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Research on token-level reference-free hallucination detection has predominantly focused on English, primarily due to the scarcity of robust datasets in other languages. This has hindered systematic investigations into the effectiveness of cross-lingual transfer for this important NLP application. To address this gap, we introduce ANHALTEN, a new evaluation dataset that extends the English hallucination detection dataset to German. To the best of our knowledge, this is the first work that explores cross-lingual transfer for token-level reference-free hallucination detection. ANHALTEN contains gold annotations in German that are parallel (i.e., directly comparable to the original English instances). We benchmark several prominent cross-lingual transfer approaches, demonstrating that larger context length leads to better hallucination detection in German, even without succeeding context. Importantly, we show that the sample-efficient few-shot transfer is the most effective approach in most setups. This highlights the practical benefits of minimal annotation effort in the target language for reference-free hallucination detection. Aiming to catalyze future research on cross-lingual token-level reference-free hallucination detection, we make ANHALTEN publicly available: https://github.com/janekh24/anhalten
Related papers
- MetaToken: Detecting Hallucination in Image Descriptions by Meta Classification [1.3654846342364308]
We introduce MetaToken, a lightweight binary classifier to detect hallucinations on the token-level at negligible cost.
Based on a statistical analysis, we reveal key factors of hallucinations in LVLMs that have been overlooked in previous works.
We evaluate our method on four state-of-the-art LVLMs demonstrating the effectiveness of our approach.
arXiv Detail & Related papers (2024-05-29T15:28:42Z) - Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback [48.065569871444275]
We propose detecting and mitigating hallucinations in Large Vision Language Models (LVLMs) via fine-grained AI feedback.
We generate a small hallucination annotation dataset using proprietary models.
Then, we propose a detect-then-rewrite pipeline to automatically construct a preference dataset for training a hallucination-mitigating model.
arXiv Detail & Related papers (2024-04-22T14:46:10Z) - German also Hallucinates! Inconsistency Detection in News Summaries with the Absinth Dataset [3.5206745486062636]
This work presents absinth, a manually annotated dataset for hallucination detection in German news summarization.
We open-source and release the absinth dataset to foster further research on hallucination detection in German.
arXiv Detail & Related papers (2024-03-06T14:37:30Z) - Comparing Hallucination Detection Metrics for Multilingual Generation [62.97224994631494]
This paper assesses how well various factual hallucination detection metrics identify hallucinations in generated biographical summaries across languages.
We compare how well automatic metrics correlate to each other and whether they agree with human judgments of factuality.
Our analysis reveals that while the lexical metrics are ineffective, NLI-based metrics perform well, correlating with human annotations in many settings and often outperforming supervised models.
arXiv Detail & Related papers (2024-02-16T08:10:34Z) - OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation [124.9008419182485]
We present OPERA, a novel MLLM decoding method grounded in an Over-trust Penalty and a Retrospection-Allocation strategy.
Our approach begins with an interesting observation: most hallucinations are closely tied to the knowledge aggregation patterns in the self-attention matrix.
Based on this observation, OPERA introduces a penalty term on the model logits during beam-search decoding to mitigate the over-trust issue.
arXiv Detail & Related papers (2023-11-29T18:57:07Z) - Enhancing Uncertainty-Based Hallucination Detection with Stronger Focus [99.33091772494751]
Large Language Models (LLMs) have gained significant popularity for their impressive performance across diverse fields.
LLMs are prone to hallucinate untruthful or nonsensical outputs that fail to meet user expectations.
We propose a novel reference-free, uncertainty-based method for detecting hallucinations in LLMs.
arXiv Detail & Related papers (2023-11-22T08:39:17Z) - Detecting and Mitigating Hallucinations in Multilingual Summarisation [40.5267502712576]
Hallucinations pose a significant challenge to the reliability of neural models for abstractive summarisation.
We develop a novel metric, mFACT, evaluating the faithfulness of non-English summaries.
We then propose a simple but effective method to reduce hallucinations with a cross-lingual transfer.
arXiv Detail & Related papers (2023-05-23T02:59:25Z) - Looking for a Needle in a Haystack: A Comprehensive Study of Hallucinations in Neural Machine Translation [17.102338932907294]
We set foundations for the study of NMT hallucinations.
We propose DeHallucinator, a simple method for alleviating hallucinations at test time.
arXiv Detail & Related papers (2022-08-10T12:44:13Z) - Few-Shot Cross-lingual Transfer for Coarse-grained De-identification of Code-Mixed Clinical Texts [56.72488923420374]
Pre-trained language models (LMs) have shown great potential for cross-lingual transfer in low-resource settings.
We show the few-shot cross-lingual transfer property of LMs for named entity recognition (NER) and apply it to a low-resource, real-world challenge: de-identification of code-mixed (Spanish-Catalan) clinical notes in the stroke domain.
arXiv Detail & Related papers (2022-04-10T21:46:52Z) - Detecting Hallucinated Content in Conditional Neural Sequence Generation [165.68948078624499]
We propose a task to predict whether each token in the output sequence is hallucinated (not contained in the input).
We also introduce a method for learning to detect hallucinations using pretrained language models fine-tuned on synthetic data.
arXiv Detail & Related papers (2020-11-05T00:18:53Z)