TIFIN India at SemEval-2025: Harnessing Translation to Overcome Multilingual IR Challenges in Fact-Checked Claim Retrieval
- URL: http://arxiv.org/abs/2504.16627v1
- Date: Wed, 23 Apr 2025 11:34:35 GMT
- Title: TIFIN India at SemEval-2025: Harnessing Translation to Overcome Multilingual IR Challenges in Fact-Checked Claim Retrieval
- Authors: Prasanna Devadiga, Arya Suneesh, Pawan Kumar Rajpoot, Bharatdeep Hazarika, Aditya U Baliga
- Abstract summary: We address the challenge of retrieving previously fact-checked claims in monolingual and crosslingual settings. Our approach follows a two-stage strategy: a reliable baseline retrieval system using a fine-tuned embedding model and an LLM-based reranker.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We address the challenge of retrieving previously fact-checked claims in monolingual and crosslingual settings - a critical task given the global prevalence of disinformation. Our approach follows a two-stage strategy: a reliable baseline retrieval system using a fine-tuned embedding model and an LLM-based reranker. Our key contribution is demonstrating how LLM-based translation can overcome the hurdles of multilingual information retrieval. Additionally, we focus on ensuring that the bulk of the pipeline can be replicated on a consumer GPU. Our final integrated system achieved a success@10 score of 0.938 and 0.81025 on the monolingual and crosslingual test sets, respectively.
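The retrieve-then-rerank pipeline described in the abstract can be sketched in miniature. This is an illustrative sketch, not the authors' implementation: the cosine-similarity retriever stands in for their fine-tuned embedding model, the `score_fn` callback stands in for their LLM-based reranker, and `success_at_k` implements the success@k metric used in the evaluation. All function and variable names here are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def retrieve_top_k(query_vec, corpus_vecs, k=10):
    """Stage 1: embedding-based retrieval of the k nearest fact-checks."""
    scored = sorted(corpus_vecs.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

def rerank(query_text, candidates, score_fn):
    """Stage 2: reorder the shortlist; score_fn stands in for an LLM reranker."""
    return sorted(candidates, key=lambda c: score_fn(query_text, c), reverse=True)

def success_at_k(ranked_ids, gold_ids, k=10):
    """success@k: 1 if any relevant fact-check appears in the top k, else 0."""
    return 1.0 if any(d in gold_ids for d in ranked_ids[:k]) else 0.0
```

In the full system, crosslingual queries would first be translated into the fact-check corpus language by an LLM before stage 1, which is the paper's key contribution for overcoming multilingual retrieval hurdles.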
Related papers
- Understanding LLMs' Cross-Lingual Context Retrieval: How Good It Is And Where It Comes From [61.63091726904068]
We evaluate the cross-lingual context retrieval ability of over 40 large language models (LLMs) across 12 languages. Several small, post-trained open LLMs show strong cross-lingual context retrieval ability. Our results also indicate that larger-scale pretraining cannot improve xMRC performance.
arXiv Detail & Related papers (2025-04-15T06:35:27Z) - Word2winners at SemEval-2025 Task 7: Multilingual and Crosslingual Fact-Checked Claim Retrieval [0.7874708385247352]
This paper describes our system for SemEval 2025 Task 7: Previously Fact-Checked Claim Retrieval. The task requires retrieving relevant fact-checks for a given input claim from the extensive, multilingual MultiClaim dataset. Our best model achieved an accuracy of 85% on crosslingual data and 92% on monolingual data.
arXiv Detail & Related papers (2025-03-12T02:59:41Z) - AILS-NTUA at SemEval-2025 Task 3: Leveraging Large Language Models and Translation Strategies for Multilingual Hallucination Detection [4.8858843645116945]
We propose an efficient, training-free LLM prompting strategy that enhances hallucination detection by translating multilingual text spans into English. Our approach achieves competitive rankings across multiple languages, securing two first positions in low-resource languages.
arXiv Detail & Related papers (2025-03-04T09:38:57Z) - Demystifying Multilingual Chain-of-Thought in Process Reward Modeling [71.12193680015622]
We tackle the challenge of extending process reward models (PRMs) to multilingual settings. We train multilingual PRMs on a dataset spanning seven languages, which is translated from English. Our results highlight the sensitivity of multilingual PRMs to both the number of training languages and the volume of English data.
arXiv Detail & Related papers (2025-02-18T09:11:44Z) - Franken-Adapter: Cross-Lingual Adaptation of LLMs by Embedding Surgery [31.516243610548635]
We present Franken-Adapter, a modular language adaptation approach for decoder-only Large Language Models. Our method begins by creating customized vocabularies for target languages and performing language adaptation through embedding tuning on multilingual data. Experiments on Gemma2 models with up to 27B parameters demonstrate improvements of up to 20% across 96 languages, spanning both discriminative and generative tasks.
arXiv Detail & Related papers (2025-02-12T00:38:11Z) - USTCCTSU at SemEval-2024 Task 1: Reducing Anisotropy for Cross-lingual Semantic Textual Relatedness Task [17.905282052666333]
Cross-lingual semantic textual relatedness is an important research task that addresses challenges in cross-lingual communication and text understanding. It helps establish semantic connections between different languages, which is crucial for downstream tasks like machine translation, multilingual information retrieval, and cross-lingual text understanding. With our approach, we place 2nd in Spanish, 3rd in Indonesian, and within the top ten in multiple languages in the competition's track C.
arXiv Detail & Related papers (2024-11-28T08:40:14Z) - CONCRETE: Improving Cross-lingual Fact-checking with Cross-lingual Retrieval [73.48591773882052]
Most fact-checking approaches focus on English only due to the data scarcity issue in other languages.
We present the first fact-checking framework augmented with crosslingual retrieval.
We train the retriever with our proposed Crosslingual Inverse Cloze Task (XICT).
arXiv Detail & Related papers (2022-09-05T17:36:14Z) - Few-Shot Cross-lingual Transfer for Coarse-grained De-identification of Code-Mixed Clinical Texts [56.72488923420374]
Pre-trained language models (LMs) have shown great potential for cross-lingual transfer in low-resource settings.
We show the few-shot cross-lingual transfer property of LMs for named entity recognition (NER) and apply it to solve a low-resource, real-world challenge: de-identification of code-mixed (Spanish-Catalan) clinical notes in the stroke domain.
arXiv Detail & Related papers (2022-04-10T21:46:52Z) - Matching Tweets With Applicable Fact-Checks Across Languages [27.762055254009017]
We focus on automatically finding existing fact-checks for claims made in social media posts (tweets).
We conduct both classification and retrieval experiments, in monolingual (English only), multilingual (Spanish, Portuguese), and cross-lingual (Hindi-English) settings.
We present promising results for "match" classification (93% average accuracy) in four language pairs.
arXiv Detail & Related papers (2022-02-14T23:33:02Z) - Towards Lifelong Learning of Multilingual Text-To-Speech Synthesis [87.75833205560406]
This work presents a lifelong learning approach to train a multilingual Text-To-Speech (TTS) system.
It does not require pooled data from all languages altogether, and thus alleviates the storage and computation burden.
arXiv Detail & Related papers (2021-10-09T07:00:38Z) - Mixed-Lingual Pre-training for Cross-lingual Summarization [54.4823498438831]
Cross-lingual Summarization aims at producing a summary in the target language for an article in the source language.
We propose a solution based on mixed-lingual pre-training that leverages both cross-lingual tasks like translation and monolingual tasks like masked language models.
Our model achieves an improvement of 2.82 (English to Chinese) and 1.15 (Chinese to English) ROUGE-1 scores over state-of-the-art results.
arXiv Detail & Related papers (2020-10-18T00:21:53Z)