Finding Already Debunked Narratives via Multistage Retrieval: Enabling
Cross-Lingual, Cross-Dataset and Zero-Shot Learning
- URL: http://arxiv.org/abs/2308.05680v1
- Date: Thu, 10 Aug 2023 16:33:17 GMT
- Authors: Iknoor Singh, Carolina Scarton, Xingyi Song, Kalina Bontcheva
- Abstract summary: This paper creates a novel dataset to enable research on cross-lingual retrieval of debunked narratives.
It presents an experiment to benchmark fine-tuned and off-the-shelf multilingual pre-trained Transformer models for this task.
It also proposes a novel multistage framework that divides this cross-lingual debunk retrieval task into refinement and re-ranking stages.
- Score: 6.094795148759833
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The task of retrieving already debunked narratives aims to detect stories
that have already been fact-checked. The successful detection of claims that
have already been debunked not only reduces the manual efforts of professional
fact-checkers but can also contribute to slowing the spread of misinformation.
Mainly due to the lack of readily available data, this is an understudied
problem, particularly when considering the cross-lingual task, i.e. the
retrieval of fact-checking articles in a language different from the language
of the online post being checked. This paper fills this gap by (i) creating a
novel dataset to enable research on cross-lingual retrieval of already debunked
narratives, using tweets as queries to a database of fact-checking articles;
(ii) presenting an extensive experiment to benchmark fine-tuned and
off-the-shelf multilingual pre-trained Transformer models for this task; and
(iii) proposing a novel multistage framework that divides this cross-lingual
debunk retrieval task into refinement and re-ranking stages. Results show that
the task of cross-lingual retrieval of already debunked narratives is
challenging and off-the-shelf Transformer models fail to outperform a strong
lexical-based baseline (BM25). Nevertheless, our multistage retrieval framework
is robust, outperforming BM25 in most scenarios and enabling cross-domain and
zero-shot learning, without significantly harming the model's performance.
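The two-stage idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the models used in each stage, so the sketch assumes a BM25 first stage that narrows the candidate set and a placeholder second stage that re-orders the survivors. The function names (`bm25_scores`, `char_ngram_overlap`, `multistage_retrieve`) and parameters (`first_stage_k`, `final_k`) are hypothetical, and the character-trigram scorer merely stands in for a multilingual Transformer re-ranker. The sketch is also monolingual; a cross-lingual setup would need translation or multilingual embeddings.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercased whitespace split; a real system would use a proper tokenizer.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    avg_len = sum(len(t) for t in tokenized) / len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    n = len(docs)
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def char_ngrams(text, n=3):
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def char_ngram_overlap(a, b):
    """Placeholder re-ranker (assumption): Jaccard overlap of character trigrams."""
    ga, gb = char_ngrams(a), char_ngrams(b)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 0.0


def multistage_retrieve(query, docs, first_stage_k=3, final_k=1,
                        rerank=char_ngram_overlap):
    """Stage 1 keeps the top BM25 candidates; stage 2 re-orders them."""
    first = bm25_scores(query, docs)
    candidates = sorted(range(len(docs)), key=lambda i: first[i],
                        reverse=True)[:first_stage_k]
    reranked = sorted(candidates, key=lambda i: rerank(query, docs[i]),
                      reverse=True)
    return reranked[:final_k]
```

Calling `multistage_retrieve(tweet_text, debunk_articles)` returns the indices of the highest-ranked articles. Passing a different `rerank` callable is how a learned scorer would be plugged in.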
Related papers
- Argument Mining in Data Scarce Settings: Cross-lingual Transfer and Few-shot Techniques [5.735035463793008]
We show that for Argument Mining, data transfer obtains better results than model-transfer.
For few-shot, the type of task (length and complexity of the sequence spans) and sampling method prove to be crucial.
arXiv Detail & Related papers (2024-07-04T08:59:17Z)
- Cross-lingual Contextualized Phrase Retrieval [63.80154430930898]
We propose a new task formulation of dense retrieval, cross-lingual contextualized phrase retrieval.
We train our Cross-lingual Contextualized Phrase Retriever (CCPR) using contrastive learning.
On the phrase retrieval task, CCPR surpasses baselines by a significant margin, achieving a top-1 accuracy that is at least 13 points higher.
arXiv Detail & Related papers (2024-03-25T14:46:51Z)
- Ensemble Transfer Learning for Multilingual Coreference Resolution [60.409789753164944]
A problem that frequently occurs when working with a non-English language is the scarcity of annotated training data.
We design a simple but effective ensemble-based framework that combines various transfer learning techniques.
We also propose a low-cost TL method that bootstraps coreference resolution models by utilizing Wikipedia anchor texts.
arXiv Detail & Related papers (2023-01-22T18:22:55Z)
- Understanding Translationese in Cross-Lingual Summarization [106.69566000567598]
Cross-lingual summarization (CLS) aims at generating a concise summary in a different target language.
To collect large-scale CLS data, existing datasets typically involve translation in their creation.
In this paper, we first confirm that different approaches of constructing CLS datasets will lead to different degrees of translationese.
arXiv Detail & Related papers (2022-12-14T13:41:49Z)
- CONCRETE: Improving Cross-lingual Fact-checking with Cross-lingual Retrieval [73.48591773882052]
Most fact-checking approaches focus on English only due to the data scarcity issue in other languages.
We present the first fact-checking framework augmented with crosslingual retrieval.
We train the retriever with our proposed Crosslingual Inverse Cloze Task (XICT).
arXiv Detail & Related papers (2022-09-05T17:36:14Z)
- X-SCITLDR: Cross-Lingual Extreme Summarization of Scholarly Documents [12.493662336994106]
We present an abstractive cross-lingual summarization dataset for four different languages in the scholarly domain.
We train and evaluate models that process English papers and generate summaries in German, Italian, Chinese and Japanese.
arXiv Detail & Related papers (2022-05-30T12:31:28Z)
- Bridging Cross-Lingual Gaps During Leveraging the Multilingual Sequence-to-Sequence Pretraining for Text Generation [80.16548523140025]
We extend the vanilla pretrain-finetune pipeline with an extra code-switching restore task to bridge the gap between the pretrain and finetune stages.
Our approach could narrow the cross-lingual sentence representation distance and improve low-frequency word translation with trivial computational cost.
arXiv Detail & Related papers (2022-04-16T16:08:38Z)
- Matching Tweets With Applicable Fact-Checks Across Languages [27.762055254009017]
We focus on automatically finding existing fact-checks for claims made in social media posts (tweets).
We conduct both classification and retrieval experiments, in monolingual (English only), multilingual (Spanish, Portuguese), and cross-lingual (Hindi-English) settings.
We present promising results for "match" classification (93% average accuracy) in four language pairs.
arXiv Detail & Related papers (2022-02-14T23:33:02Z)
- IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages [87.5457337866383]
We introduce the Image-Grounded Language Understanding Evaluation benchmark.
IGLUE brings together visual question answering, cross-modal retrieval, grounded reasoning, and grounded entailment tasks across 20 diverse languages.
We find that translate-test transfer is superior to zero-shot transfer and that few-shot learning is hard to harness for many tasks.
arXiv Detail & Related papers (2022-01-27T18:53:22Z)
- A Deep Reinforced Model for Zero-Shot Cross-Lingual Summarization with Bilingual Semantic Similarity Rewards [40.17497211507507]
Cross-lingual text summarization is a practically important but under-explored task.
We propose an end-to-end cross-lingual text summarization model.
arXiv Detail & Related papers (2020-06-27T21:51:38Z)