Detecting and Mitigating Hallucinations in Multilingual Summarisation
- URL: http://arxiv.org/abs/2305.13632v2
- Date: Thu, 26 Oct 2023 17:02:56 GMT
- Title: Detecting and Mitigating Hallucinations in Multilingual Summarisation
- Authors: Yifu Qiu, Yftah Ziser, Anna Korhonen, Edoardo M. Ponti, Shay B. Cohen
- Abstract summary: Hallucinations pose a significant challenge to the reliability of neural models for abstractive summarisation.
We develop a novel metric, mFACT, evaluating the faithfulness of non-English summaries.
We then propose a simple but effective method to reduce hallucinations with a cross-lingual transfer.
- Score: 40.5267502712576
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hallucinations pose a significant challenge to the reliability of neural
models for abstractive summarisation. While automatically generated summaries
may be fluent, they often lack faithfulness to the original document. This
issue becomes even more pronounced in low-resource settings, such as
cross-lingual transfer. With the existing faithful metrics focusing on English,
even measuring the extent of this phenomenon in cross-lingual settings is hard.
To address this, we first develop a novel metric, mFACT, evaluating the
faithfulness of non-English summaries, leveraging translation-based transfer
from multiple English faithfulness metrics. We then propose a simple but
effective method to reduce hallucinations with a cross-lingual transfer, which
weighs the loss of each training example by its faithfulness score. Through
extensive experiments in multiple languages, we demonstrate that mFACT is the
metric that is most suited to detect hallucinations. Moreover, we find that our
proposed loss weighting method drastically increases both performance and
faithfulness according to both automatic and human evaluation when compared to
strong baselines for cross-lingual transfer such as MAD-X. Our code and dataset
are available at https://github.com/yfqiu-nlp/mfact-summ.
Related papers
- Mitigating Multilingual Hallucination in Large Vision-Language Models [35.75851356840673]
We propose a two-stage Multilingual Hallucination Removal (MHR) framework for Large Vision-Language Models (LVLMs)
Instead of relying on the intricate manual annotations of multilingual resources, we propose a novel cross-lingual alignment method.
Our framework delivers an average increase of 19.0% in accuracy across 13 different languages.
arXiv Detail & Related papers (2024-08-01T13:34:35Z) - ANHALTEN: Cross-Lingual Transfer for German Token-Level Reference-Free Hallucination Detection [3.1269598124014264]
We introduce ANHALTEN, a dataset that extends the English hallucination detection dataset to German.
This is the first work that explores cross-lingual transfer for token-level reference-free hallucination detection.
We show that the sample-efficient few-shot transfer is the most effective approach in most setups.
arXiv Detail & Related papers (2024-07-18T17:01:38Z) - Comparing Hallucination Detection Metrics for Multilingual Generation [62.97224994631494]
This paper assesses how well various factual hallucination detection metrics identify hallucinations in generated biographical summaries across languages.
We compare how well automatic metrics correlate to each other and whether they agree with human judgments of factuality.
Our analysis reveals that while the lexical metrics are ineffective, NLI-based metrics perform well, correlating with human annotations in many settings and often outperforming supervised models.
arXiv Detail & Related papers (2024-02-16T08:10:34Z) - Mitigating Hallucinations and Off-target Machine Translation with
Source-Contrastive and Language-Contrastive Decoding [53.84948040596055]
We introduce two related methods to mitigate failure cases with a modified decoding objective.
Experiments on the massively multilingual models M2M-100 (418M) and SMaLL-100 show that these methods suppress hallucinations and off-target translations.
arXiv Detail & Related papers (2023-09-13T17:15:27Z) - mFACE: Multilingual Summarization with Factual Consistency Evaluation [79.60172087719356]
Abstractive summarization has enjoyed renewed interest in recent years, thanks to pre-trained language models and the availability of large-scale datasets.
Despite promising results, current models still suffer from generating factually inconsistent summaries.
We leverage factual consistency evaluation models to improve multilingual summarization.
arXiv Detail & Related papers (2022-12-20T19:52:41Z) - Exposing Cross-Lingual Lexical Knowledge from Multilingual Sentence
Encoders [85.80950708769923]
We probe multilingual language models for the amount of cross-lingual lexical knowledge stored in their parameters, and compare them against the original multilingual LMs.
We also devise a novel method to expose this knowledge by additionally fine-tuning multilingual models.
We report substantial gains on standard benchmarks.
arXiv Detail & Related papers (2022-04-30T13:23:16Z) - Few-Shot Cross-lingual Transfer for Coarse-grained De-identification of
Code-Mixed Clinical Texts [56.72488923420374]
Pre-trained language models (LMs) have shown great potential for cross-lingual transfer in low-resource settings.
We show the few-shot cross-lingual transfer property of LMs for named recognition (NER) and apply it to solve a low-resource and real-world challenge of code-mixed (Spanish-Catalan) clinical notes de-identification in the stroke.
arXiv Detail & Related papers (2022-04-10T21:46:52Z) - Oolong: Investigating What Makes Transfer Learning Hard with Controlled
Studies [21.350999136803843]
We systematically transform the language of the GLUE benchmark, altering one axis of crosslingual variation at a time.
We find that models can largely recover from syntactic-style shifts, but cannot recover from vocabulary misalignment.
Our experiments provide insights into the factors of cross-lingual transfer that researchers should most focus on when designing language transfer scenarios.
arXiv Detail & Related papers (2022-02-24T19:00:39Z) - Analyzing Zero-shot Cross-lingual Transfer in Supervised NLP Tasks [6.7155846430379285]
In zero-shot cross-lingual transfer, a supervised NLP task trained on a corpus in one language is directly applicable to another language without any additional training.
Recently introduced cross-lingual language model (XLM) pretraining brings out neural parameter sharing in Transformer-style networks.
In this paper, we aim to validate the hypothetically strong cross-lingual transfer properties induced by XLM pretraining.
arXiv Detail & Related papers (2021-01-26T09:21:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.