TIFIN India at SemEval-2025: Harnessing Translation to Overcome Multilingual IR Challenges in Fact-Checked Claim Retrieval
- URL: http://arxiv.org/abs/2504.16627v1
- Date: Wed, 23 Apr 2025 11:34:35 GMT
- Title: TIFIN India at SemEval-2025: Harnessing Translation to Overcome Multilingual IR Challenges in Fact-Checked Claim Retrieval
- Authors: Prasanna Devadiga, Arya Suneesh, Pawan Kumar Rajpoot, Bharatdeep Hazarika, Aditya U Baliga
- Abstract summary: We address the challenge of retrieving previously fact-checked claims in monolingual and crosslingual settings. Our approach follows a two-stage strategy: a reliable baseline retrieval system using a fine-tuned embedding model and an LLM-based reranker.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We address the challenge of retrieving previously fact-checked claims in monolingual and crosslingual settings - a critical task given the global prevalence of disinformation. Our approach follows a two-stage strategy: a reliable baseline retrieval system using a fine-tuned embedding model and an LLM-based reranker. Our key contribution is demonstrating how LLM-based translation can overcome the hurdles of multilingual information retrieval. Additionally, we focus on ensuring that the bulk of the pipeline can be replicated on a consumer GPU. Our final integrated system achieved a success@10 score of 0.938 and 0.81025 on the monolingual and crosslingual test sets, respectively.
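The retrieve-then-rerank pipeline described in the abstract can be sketched in miniature. This is an illustrative sketch, not the authors' implementation: the cosine-similarity retriever stands in for their fine-tuned embedding model, the `score_fn` callback stands in for their LLM-based reranker, and `success_at_k` implements the success@k metric used in the evaluation. All function and variable names here are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def retrieve_top_k(query_vec, corpus_vecs, k=10):
    """Stage 1: embedding-based retrieval of the k nearest fact-checks."""
    scored = sorted(corpus_vecs.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

def rerank(query_text, candidates, score_fn):
    """Stage 2: reorder the shortlist; score_fn stands in for an LLM reranker."""
    return sorted(candidates, key=lambda c: score_fn(query_text, c), reverse=True)

def success_at_k(ranked_ids, gold_ids, k=10):
    """success@k: 1 if any relevant fact-check appears in the top k, else 0."""
    return 1.0 if any(d in gold_ids for d in ranked_ids[:k]) else 0.0
```

In the full system, crosslingual queries would first be translated into the fact-check corpus language by an LLM before stage 1, which is the paper's key contribution for overcoming multilingual retrieval hurdles.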
Related papers
- Understanding LLMs' Cross-Lingual Context Retrieval: How Good It Is And Where It Comes From [61.63091726904068]
We evaluate the cross-lingual context retrieval ability of over 40 large language models (LLMs) across 12 languages. Several small, post-trained open LLMs show strong cross-lingual context retrieval ability. Our results also indicate that larger-scale pretraining cannot improve xMRC performance.
arXiv Detail & Related papers (2025-04-15T06:35:27Z) - Word2winners at SemEval-2025 Task 7: Multilingual and Crosslingual Fact-Checked Claim Retrieval [0.7874708385247352]
This paper describes our system for SemEval 2025 Task 7: Previously Fact-Checked Claim Retrieval. The task requires retrieving relevant fact-checks for a given input claim from the extensive, multilingual MultiClaim dataset. Our best model achieved an accuracy of 85% on crosslingual data and 92% on monolingual data.
arXiv Detail & Related papers (2025-03-12T02:59:41Z) - AILS-NTUA at SemEval-2025 Task 3: Leveraging Large Language Models and Translation Strategies for Multilingual Hallucination Detection [4.8858843645116945]
We propose an efficient, training-free LLM prompting strategy that enhances hallucination detection by translating multilingual text spans into English. Our approach achieves competitive rankings across multiple languages, securing two first positions in low-resource languages.
arXiv Detail & Related papers (2025-03-04T09:38:57Z) - Demystifying Multilingual Chain-of-Thought in Process Reward Modeling [71.12193680015622]
We tackle the challenge of extending process reward models (PRMs) to multilingual settings. We train multilingual PRMs on a dataset spanning seven languages, which is translated from English. Our results highlight the sensitivity of multilingual PRMs to both the number of training languages and the volume of English data.
arXiv Detail & Related papers (2025-02-18T09:11:44Z) - Franken-Adapter: Cross-Lingual Adaptation of LLMs by Embedding Surgery [31.516243610548635]
We present Franken-Adapter, a modular language adaptation approach for decoder-only Large Language Models. Our method begins by creating customized vocabularies for target languages and performing language adaptation through embedding tuning on multilingual data. Experiments on Gemma2 models with up to 27B parameters demonstrate improvements of up to 20% across 96 languages, spanning both discriminative and generative tasks.
arXiv Detail & Related papers (2025-02-12T00:38:11Z) - USTCCTSU at SemEval-2024 Task 1: Reducing Anisotropy for Cross-lingual Semantic Textual Relatedness Task [17.905282052666333]
Cross-lingual semantic textual relatedness is an important research task that addresses challenges in cross-lingual communication and text understanding. It helps establish semantic connections between different languages, which is crucial for downstream tasks like machine translation, multilingual information retrieval, and cross-lingual text understanding. With our approach, we place 2nd in Spanish, 3rd in Indonesian, and within the top ten in multiple languages in the competition's track C.
arXiv Detail & Related papers (2024-11-28T08:40:14Z) - CONCRETE: Improving Cross-lingual Fact-checking with Cross-lingual Retrieval [73.48591773882052]
Most fact-checking approaches focus on English only due to the data scarcity issue in other languages.
We present the first fact-checking framework augmented with crosslingual retrieval.
We train the retriever with our proposed Crosslingual Inverse Cloze Task (XICT).
arXiv Detail & Related papers (2022-09-05T17:36:14Z) - Few-Shot Cross-lingual Transfer for Coarse-grained De-identification of Code-Mixed Clinical Texts [56.72488923420374]
Pre-trained language models (LMs) have shown great potential for cross-lingual transfer in low-resource settings.
We show the few-shot cross-lingual transfer property of LMs for named entity recognition (NER) and apply it to solve a low-resource, real-world challenge: de-identification of code-mixed (Spanish-Catalan) clinical notes in the stroke domain.
arXiv Detail & Related papers (2022-04-10T21:46:52Z) - Matching Tweets With Applicable Fact-Checks Across Languages [27.762055254009017]
We focus on automatically finding existing fact-checks for claims made in social media posts (tweets).
We conduct both classification and retrieval experiments, in monolingual (English only), multilingual (Spanish, Portuguese), and cross-lingual (Hindi-English) settings.
We present promising results for "match" classification (93% average accuracy) in four language pairs.
arXiv Detail & Related papers (2022-02-14T23:33:02Z) - Towards Lifelong Learning of Multilingual Text-To-Speech Synthesis [87.75833205560406]
This work presents a lifelong learning approach to train a multilingual Text-To-Speech (TTS) system.
It does not require pooled data from all languages altogether, and thus alleviates the storage and computation burden.
arXiv Detail & Related papers (2021-10-09T07:00:38Z) - Mixed-Lingual Pre-training for Cross-lingual Summarization [54.4823498438831]
Cross-lingual Summarization aims at producing a summary in the target language for an article in the source language.
We propose a solution based on mixed-lingual pre-training that leverages both cross-lingual tasks like translation and monolingual tasks like masked language models.
Our model achieves an improvement of 2.82 (English to Chinese) and 1.15 (Chinese to English) ROUGE-1 scores over state-of-the-art results.
arXiv Detail & Related papers (2020-10-18T00:21:53Z)