CONCRETE: Improving Cross-lingual Fact-checking with Cross-lingual
Retrieval
- URL: http://arxiv.org/abs/2209.02071v1
- Date: Mon, 5 Sep 2022 17:36:14 GMT
- Title: CONCRETE: Improving Cross-lingual Fact-checking with Cross-lingual
Retrieval
- Authors: Kung-Hsiang Huang, ChengXiang Zhai, Heng Ji
- Abstract summary: Most fact-checking approaches focus on English only due to the data scarcity issue in other languages.
We present the first fact-checking framework augmented with cross-lingual retrieval.
We train the retriever with our proposed Cross-lingual Inverse Cloze Task (X-ICT).
- Score: 73.48591773882052
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fact-checking has gained increasing attention due to the widespread
dissemination of falsified information. Most fact-checking approaches focus on claims made in
English only due to the data scarcity issue in other languages. The lack of
fact-checking datasets in low-resource languages calls for an effective
cross-lingual transfer technique for fact-checking. Additionally, trustworthy
information in different languages can be complementary and helpful in
verifying facts. To this end, we present the first fact-checking framework
augmented with cross-lingual retrieval that aggregates evidence retrieved from
multiple languages through a cross-lingual retriever. Given the absence of
cross-lingual information retrieval datasets with claim-like queries, we train
the retriever with our proposed Cross-lingual Inverse Cloze Task (X-ICT), a
self-supervised algorithm that creates training instances by translating the
title of a passage. The goal for X-ICT is to learn cross-lingual retrieval in
which the model learns to identify the passage corresponding to a given
translated title. On the X-Fact dataset, our approach achieves 2.23% absolute
F1 improvement in the zero-shot cross-lingual setup over prior systems. The
source code and data are publicly available at
https://github.com/khuangaf/CONCRETE.
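As a rough illustration of the X-ICT idea described above, here is a minimal sketch of how training instances might be constructed (the passage format and the `translate` helper are assumptions for illustration, not the authors' actual code; see the repository above for the real implementation):

```python
import random

def make_xict_instances(passages, target_langs, translate):
    """Build X-ICT training pairs: a passage's title, translated into a
    randomly chosen target language, serves as a claim-like query whose
    positive example is the passage itself.

    `passages` is a list of dicts with 'title' and 'text' keys;
    `translate(text, lang)` is any machine-translation function.
    Both are assumptions made for this sketch.
    """
    instances = []
    for passage in passages:
        lang = random.choice(target_langs)
        query = translate(passage["title"], lang)      # cross-lingual query
        instances.append({"query": query,              # translated title
                          "positive": passage["text"]})  # matching passage
    return instances
```

The retriever can then be trained to score each positive passage against in-batch negatives, so that a claim-like query in one language learns to match evidence passages in another.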
Related papers
- Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models [62.91524967852552]
Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora.
But can these models relate corresponding concepts across languages, effectively being crosslingual?
This study evaluates six state-of-the-art LLMs on inherently crosslingual tasks.
arXiv Detail & Related papers (2024-06-23T15:15:17Z)
- Cross-lingual Contextualized Phrase Retrieval [63.80154430930898]
We propose a new task formulation of dense retrieval, cross-lingual contextualized phrase retrieval.
We train our Cross-lingual Contextualized Phrase Retriever (CCPR) using contrastive learning.
On the phrase retrieval task, CCPR surpasses baselines by a significant margin, achieving a top-1 accuracy that is at least 13 points higher.
arXiv Detail & Related papers (2024-03-25T14:46:51Z)
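The summary above mentions contrastive learning without further detail; the following is a generic in-batch InfoNCE loss of the kind commonly used to train dense retrievers (all names here are illustrative, not the CCPR authors' code):

```python
import torch
import torch.nn.functional as F

def info_nce_loss(query_emb, passage_emb, temperature=0.05):
    """In-batch contrastive loss: row i of `query_emb` should match
    row i of `passage_emb`; every other row acts as a negative.
    Both tensors have shape (batch, dim)."""
    query_emb = F.normalize(query_emb, dim=-1)
    passage_emb = F.normalize(passage_emb, dim=-1)
    logits = query_emb @ passage_emb.T / temperature   # (batch, batch)
    labels = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, labels)
```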
- Self-Augmentation Improves Zero-Shot Cross-Lingual Transfer [92.80671770992572]
Cross-lingual transfer is a central task in multilingual NLP.
Earlier efforts on this task use parallel corpora, bilingual dictionaries, or other annotated alignment data.
We propose a simple yet effective method, SALT, to improve zero-shot cross-lingual transfer.
arXiv Detail & Related papers (2023-09-19T19:30:56Z)
- Measuring Catastrophic Forgetting in Cross-Lingual Transfer Paradigms: Exploring Tuning Strategies [4.118037156777793]
Cross-lingual transfer is a promising technique to solve tasks in less-resourced languages.
We compare two fine-tuning approaches combined with zero-shot and full-shot learning approaches for large language models.
arXiv Detail & Related papers (2023-09-12T09:37:08Z)
- Breaking Language Barriers with MMTweets: Advancing Cross-Lingual Debunked Narrative Retrieval for Fact-Checking [5.880794128275313]
Cross-lingual debunked narrative retrieval is an understudied problem.
This study introduces cross-lingual debunked narrative retrieval and addresses this research gap by (i) creating the Multilingual Misinformation Tweets (MMTweets) dataset.
MMTweets features cross-lingual pairs, images, human annotations, and fine-grained labels, making it a comprehensive resource compared to its counterparts.
We find that MMTweets presents challenges for cross-lingual debunked narrative retrieval, highlighting areas for improvement in retrieval models.
arXiv Detail & Related papers (2023-08-10T16:33:17Z)
- Model and Data Transfer for Cross-Lingual Sequence Labelling in Zero-Resource Settings [10.871587311621974]
We experimentally demonstrate that high capacity multilingual language models applied in a zero-shot setting consistently outperform data-based cross-lingual transfer approaches.
A detailed analysis of our results suggests that this might be due to important differences in language use.
Our results also indicate that data-based cross-lingual transfer approaches remain a competitive option when high-capacity multilingual language models are not available.
arXiv Detail & Related papers (2022-10-23T05:37:35Z)
- Matching Tweets With Applicable Fact-Checks Across Languages [27.762055254009017]
We focus on automatically finding existing fact-checks for claims made in social media posts (tweets).
We conduct both classification and retrieval experiments, in monolingual (English only), multilingual (Spanish, Portuguese), and cross-lingual (Hindi-English) settings.
We present promising results for "match" classification (93% average accuracy) in four language pairs.
arXiv Detail & Related papers (2022-02-14T23:33:02Z)
- Multilingual Transfer Learning for QA Using Translation as Data Augmentation [13.434957024596898]
We explore strategies that improve cross-lingual transfer by bringing the multilingual embeddings closer in the semantic space.
We propose two novel strategies, language adversarial training and language arbitration framework, which significantly improve the (zero-resource) cross-lingual transfer performance.
Empirically, we show that the proposed models outperform the previous zero-shot baseline on the recently introduced multilingual MLQA and TyDiQA datasets.
arXiv Detail & Related papers (2020-12-10T20:29:34Z)
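The summary names language adversarial training but gives no details; one common realization of that idea is a gradient-reversal layer that pushes the encoder toward language-invariant embeddings. The following is a generic sketch under that assumption, not the paper's actual architecture:

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; flips the gradient sign on backward,
    so the encoder is updated to *fool* the language discriminator."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

def adversarial_language_loss(encoder, discriminator, batch, lang_labels, lam=0.1):
    # The discriminator tries to predict each example's language; the
    # reversed gradient pushes the encoder toward language-invariant
    # embeddings. All module names here are hypothetical.
    emb = encoder(batch)
    logits = discriminator(GradReverse.apply(emb, lam))
    return torch.nn.functional.cross_entropy(logits, lang_labels)
```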
- Cross-lingual Machine Reading Comprehension with Language Branch Knowledge Distillation [105.41167108465085]
Cross-lingual Machine Reading Comprehension (CLMRC) remains a challenging problem due to the lack of large-scale datasets in low-resource languages.
We propose a novel augmentation approach named Language Branch Machine Reading Comprehension (LBMRC).
LBMRC trains multiple machine reading comprehension (MRC) models, each proficient in an individual language.
We devise a multilingual distillation approach to amalgamate knowledge from multiple language branch models into a single model for all target languages.
arXiv Detail & Related papers (2020-10-27T13:12:17Z)
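A minimal sketch of how knowledge from several language-branch teacher models might be amalgamated into one student via soft-label distillation (the averaging scheme and all names are assumptions for illustration, not LBMRC's exact recipe):

```python
import torch
import torch.nn.functional as F

def multi_teacher_distillation_loss(student_logits, teacher_logits_list, T=2.0):
    """Soft-label distillation from several language-branch teachers:
    average the teachers' temperature-softened distributions and train
    the student to match them via KL divergence."""
    teacher_probs = torch.stack(
        [F.softmax(t / T, dim=-1) for t in teacher_logits_list]).mean(dim=0)
    student_log_probs = F.log_softmax(student_logits / T, dim=-1)
    # 'batchmean' matches the mathematical definition of KL divergence;
    # scaling by T^2 keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * (T * T)
```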
- FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding [85.29270319872597]
We propose an enhanced fusion method that takes cross-lingual data as input for XLM finetuning.
During inference, the model makes predictions based on the text input in the target language and its translation in the source language.
Since gold labels are unavailable for the translated text, we propose an additional KL-divergence self-teaching loss for model training, based on auto-generated soft pseudo-labels for translated text in the target language.
arXiv Detail & Related papers (2020-09-10T22:42:15Z)
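A sketch of a KL-divergence self-teaching loss of the kind the FILTER summary describes: the model's own (detached) predictions supply soft pseudo-labels for translated text that lacks gold labels (illustrative only, not the authors' code):

```python
import torch.nn.functional as F

def self_teaching_kl_loss(translated_logits, source_logits):
    """Train on translated text without gold labels: treat the model's
    detached predictions on the source-side input as soft pseudo-labels
    for the translation, and minimize the KL divergence between them."""
    pseudo_labels = F.softmax(source_logits.detach(), dim=-1)  # soft targets
    log_probs = F.log_softmax(translated_logits, dim=-1)
    return F.kl_div(log_probs, pseudo_labels, reduction="batchmean")
```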
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.