SemEval-2025 Task 7: Multilingual and Crosslingual Fact-Checked Claim Retrieval
- URL: http://arxiv.org/abs/2505.10740v1
- Date: Thu, 15 May 2025 23:04:46 GMT
- Title: SemEval-2025 Task 7: Multilingual and Crosslingual Fact-Checked Claim Retrieval
- Authors: Qiwei Peng, Robert Moro, Michal Gregor, Ivan Srba, Simon Ostermann, Marian Simko, Juraj Podroužek, Matúš Mesarčík, Jaroslav Kopčan, Anders Søgaard,
- Abstract summary: The rapid spread of online disinformation presents a global challenge, and machine learning has been widely explored as a potential solution.<n>To address this gap, we conducted a shared task on multilingual claim retrieval at SemEval 2025.<n>We report the best-performing systems as well as the most common and the most effective approaches across both subtracks.
- Score: 29.85035370846946
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rapid spread of online disinformation presents a global challenge, and machine learning has been widely explored as a potential solution. However, multilingual settings and low-resource languages are often neglected in this field. To address this gap, we conducted a shared task on multilingual claim retrieval at SemEval 2025, aimed at identifying fact-checked claims that match newly encountered claims expressed in social media posts across different languages. The task includes two subtracks: (1) a monolingual track, where social posts and claims are in the same language, and (2) a crosslingual track, where social posts and claims might be in different languages. A total of 179 participants registered for the task contributing to 52 test submissions. 23 out of 31 teams have submitted their system papers. In this paper, we report the best-performing systems as well as the most common and the most effective approaches across both subtracks. This shared task, along with its dataset and participating systems, provides valuable insights into multilingual claim retrieval and automated fact-checking, supporting future research in this field.
Related papers
- Multilingual vs Crosslingual Retrieval of Fact-Checked Claims: A Tale of Two Approaches [5.850200023135349]
We examine strategies to improve the multilingual and crosslingual performance.<n>We evaluate approaches on a dataset containing posts and claims in 47 languages.<n>Most importantly, we show that crosslinguality is a setup with its own unique characteristics compared to the multilingual setup.
arXiv Detail & Related papers (2025-05-28T08:47:10Z) - Multilingual Retrieval-Augmented Generation for Knowledge-Intensive Task [73.35882908048423]
Retrieval-augmented generation (RAG) has become a cornerstone of contemporary NLP.<n>This paper investigates the effectiveness of RAG across multiple languages by proposing novel approaches for multilingual open-domain question-answering.
arXiv Detail & Related papers (2025-04-04T17:35:43Z) - Entity-aware Cross-lingual Claim Detection for Automated Fact-checking [7.242609314791262]
We introduce EX-Claim, an entity-aware cross-lingual claim detection model that generalizes well to handle claims written in any language.<n>Our proposed model significantly outperforms the baselines, across 27 languages, and achieves the highest rate of knowledge transfer, even with limited training data.
arXiv Detail & Related papers (2025-03-19T14:00:55Z) - Word2winners at SemEval-2025 Task 7: Multilingual and Crosslingual Fact-Checked Claim Retrieval [0.7874708385247352]
This paper describes our system for SemEval 2025 Task 7: Previously Fact-Checked Claim Retrieval.<n>The task requires retrieving relevant fact-checks for a given input claim from the extensive, multilingual MultiClaim dataset.<n>Our best model achieved an accuracy of 85% on crosslingual data and 92% on monolingual data.
arXiv Detail & Related papers (2025-03-12T02:59:41Z) - GenAI Content Detection Task 1: English and Multilingual Machine-Generated Text Detection: AI vs. Human [71.42669028683741]
We present a shared task on binary machine generated text detection conducted as a part of the GenAI workshop at COLING 2025.<n>The task consists of two subtasks: Monolingual (English) and Multilingual.<n>We provide a comprehensive overview of the data, a summary of the results, detailed descriptions of the participating systems, and an in-depth analysis of submissions.
arXiv Detail & Related papers (2025-01-19T11:11:55Z) - Findings of the IWSLT 2024 Evaluation Campaign [102.7608597658451]
The paper reports on the shared tasks organized by the 21st IWSLT Conference.
The shared tasks address 7 scientific challenges in spoken language translation.
arXiv Detail & Related papers (2024-11-07T19:11:55Z) - Summary of the DISPLACE Challenge 2023 -- DIarization of SPeaker and
LAnguage in Conversational Environments [28.618333018398122]
In multi-lingual societies, where multiple languages are spoken in a small geographic vicinity, informal conversations often involve mix of languages.
Existing speech technologies may be inefficient in extracting information from such conversations, where the speech data is rich in diversity with multiple languages and speakers.
The DISPLACE challenge constitutes an open-call for evaluating and bench-marking the speaker and language diarization technologies on this challenging condition.
arXiv Detail & Related papers (2023-11-21T12:23:58Z) - Multi-EuP: The Multilingual European Parliament Dataset for Analysis of
Bias in Information Retrieval [62.82448161570428]
This dataset is designed to investigate fairness in a multilingual information retrieval context.
It boasts an authentic multilingual corpus, featuring topics translated into all 24 languages.
It offers rich demographic information associated with its documents, facilitating the study of demographic bias.
arXiv Detail & Related papers (2023-11-03T12:29:11Z) - Bridging Cross-Lingual Gaps During Leveraging the Multilingual
Sequence-to-Sequence Pretraining for Text Generation [80.16548523140025]
We extend the vanilla pretrain-finetune pipeline with extra code-switching restore task to bridge the gap between the pretrain and finetune stages.
Our approach could narrow the cross-lingual sentence representation distance and improve low-frequency word translation with trivial computational cost.
arXiv Detail & Related papers (2022-04-16T16:08:38Z) - Matching Tweets With Applicable Fact-Checks Across Languages [27.762055254009017]
We focus on automatically finding existing fact-checks for claims made in social media posts (tweets)
We conduct both classification and retrieval experiments, in monolingual (English only), multilingual (Spanish, Portuguese), and cross-lingual (Hindi-English) settings.
We present promising results for "match" classification (93% average accuracy) in four language pairs.
arXiv Detail & Related papers (2022-02-14T23:33:02Z) - UPV at CheckThat! 2021: Mitigating Cultural Differences for Identifying
Multilingual Check-worthy Claims [6.167830237917659]
In this paper, we propose a language identification task as an auxiliary task to mitigate unintended bias.
Our results show that joint training of language identification and check-worthy claim detection tasks can provide performance gains for some of the selected languages.
arXiv Detail & Related papers (2021-09-19T21:46:16Z) - XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating
Cross-lingual Generalization [128.37244072182506]
Cross-lingual TRansfer Evaluation of Multilinguals XTREME is a benchmark for evaluating the cross-lingual generalization capabilities of multilingual representations across 40 languages and 9 tasks.
We demonstrate that while models tested on English reach human performance on many tasks, there is still a sizable gap in the performance of cross-lingually transferred models.
arXiv Detail & Related papers (2020-03-24T19:09:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.