DS@GT at CheckThat! 2025: A Simple Retrieval-First, LLM-Backed Framework for Claim Normalization
- URL: http://arxiv.org/abs/2508.17402v1
- Date: Sun, 24 Aug 2025 15:19:58 GMT
- Title: DS@GT at CheckThat! 2025: A Simple Retrieval-First, LLM-Backed Framework for Claim Normalization
- Authors: Aleksandar Pramov, Jiangqin Ma, Bina Patel
- Abstract summary: Claim normalization is an integral part of any automatic fact-check verification system. The CheckThat! 2025 Task 2 focuses specifically on claim normalization and spans 20 languages. Our proposed solution consists of a lightweight retrieval-first, LLM-backed pipeline.
- Score: 41.99844472131922
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Claim normalization is an integral part of any automatic fact-check verification system. It parses typically noisy claim data, such as social media posts, into normalized claims, which are then fed into downstream veracity classification tasks. The CheckThat! 2025 Task 2 focuses specifically on claim normalization and spans 20 languages under monolingual and zero-shot conditions. Our proposed solution consists of a lightweight retrieval-first, LLM-backed pipeline, in which we either dynamically prompt GPT-4o-mini with in-context examples or retrieve the closest normalization from the training dataset directly. On the official test set, the system ranks near the top for most monolingual tracks, achieving first place in 7 out of the 13 languages. In contrast, the system underperforms in the zero-shot setting, highlighting the limitation of the proposed solution.
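The retrieve-or-prompt decision described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' implementation: the bag-of-words cosine similarity, the `threshold` and `k` parameters, the prompt wording, and the helper names (`rank_neighbors`, `build_prompt`, `normalize_claim`) are hypothetical. The sketch stops at building the few-shot prompt and does not call any LLM.

```python
import math
import re
from collections import Counter

# Toy training pairs (noisy post, normalized claim); invented for illustration.
TRAIN = [
    ("BREAKING!!! drinking hot water cures the flu, share now",
     "Drinking hot water cures the flu."),
    ("they say the moon landing was filmed in a studio lol",
     "The moon landing was filmed in a studio."),
]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_neighbors(post, train):
    """Score every training post against the query, most similar first."""
    query = Counter(tokenize(post))
    scored = [(cosine(query, Counter(tokenize(p))), p, c) for p, c in train]
    return sorted(scored, key=lambda s: s[0], reverse=True)


def build_prompt(post, shots):
    """Assemble a few-shot prompt from the retrieved (score, post, claim) triples."""
    lines = ["Rewrite the post as a concise, normalized claim.", ""]
    for _, ex_post, ex_claim in shots:
        lines += [f"Post: {ex_post}", f"Claim: {ex_claim}", ""]
    lines += [f"Post: {post}", "Claim:"]
    return "\n".join(lines)


def normalize_claim(post, train, threshold=0.9, k=2):
    """Retrieval first: reuse the nearest normalization if it is close enough
    (hypothetical threshold); otherwise fall back to a few-shot prompt that
    would be sent to an LLM."""
    ranked = rank_neighbors(post, train)
    if ranked and ranked[0][0] >= threshold:
        return {"mode": "retrieved", "output": ranked[0][2]}
    return {"mode": "prompt", "output": build_prompt(post, ranked[:k])}
```

In this sketch, a post that nearly duplicates a training post returns the stored normalization directly, while an unseen post yields a prompt whose in-context examples are its nearest training neighbors. A real system would more plausibly use dense multilingual embeddings for the similarity step; that substitution is also an assumption.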
Related papers
- Do Large Language Models Understand Data Visualization Rules? [2.3332469289621787]
Large language models (LLMs) can generate charts or flag misleading figures, but it remains unclear whether they can reason about and enforce visualization rules directly. We present the first systematic evaluation of LLMs against visualization rules using hard-verification ground truth derived from Answer Set Programming (ASP). Results show that frontier models achieve high adherence (Gemma 3 4B / 27B: 100%, GPT-oss 20B: 98%) and reliably detect common violations (F1 up to 0.82), yet performance drops for subtler perceptual rules (F1 0.15 for some categories) and for outputs generated from technical
arXiv Detail & Related papers (2026-02-23T18:47:51Z)
- MultiCW: A Large-Scale Balanced Benchmark Dataset for Training Robust Check-Worthiness Detection Models [6.382707047064603]
Multi-Check-Worthy dataset spans 16 languages, 7 topical domains, and 2 writing styles. It consists of 123,722 samples, evenly distributed between noisy (informal) and structured (formal) texts, with balanced representation of check-worthy and non-check-worthy classes across all languages.
arXiv Detail & Related papers (2026-02-18T09:28:53Z)
- AKCIT-FN at CheckThat! 2025: Switching Fine-Tuned SLMs and LLM Prompting for Multilingual Claim Normalization [0.5274891943689054]
Claim normalization is a crucial step in automated fact-checking pipelines. This paper details our submission to the CLEF-2025 CheckThat! Task 2, which challenges systems to perform claim normalization across twenty languages.
arXiv Detail & Related papers (2025-09-15T01:19:49Z)
- Multilingual vs Crosslingual Retrieval of Fact-Checked Claims: A Tale of Two Approaches [5.850200023135349]
We examine strategies to improve the multilingual and crosslingual performance. We evaluate approaches on a dataset containing posts and claims in 47 languages. Most importantly, we show that crosslinguality is a setup with its own unique characteristics compared to the multilingual setup.
arXiv Detail & Related papers (2025-05-28T08:47:10Z)
- TIFIN India at SemEval-2025: Harnessing Translation to Overcome Multilingual IR Challenges in Fact-Checked Claim Retrieval [0.10417205448468168]
We address the challenge of retrieving previously fact-checked claims in monolingual and crosslingual settings. Our approach follows a two-stage strategy: a reliable baseline retrieval system using a fine-tuned embedding model and an LLM-based reranker.
arXiv Detail & Related papers (2025-04-23T11:34:35Z)
- Blending LLMs into Cascaded Speech Translation: KIT's Offline Speech Translation System for IWSLT 2024 [61.189875635090225]
Large Language Models (LLMs) are currently under exploration for various tasks, including Automatic Speech Recognition (ASR), Machine Translation (MT), and even End-to-End Speech Translation (ST).
arXiv Detail & Related papers (2024-06-24T16:38:17Z)
- HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models [81.56455625624041]
We introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction.
The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses.
LLMs with a reasonable prompt and their generative capability can even correct tokens that are missing from the N-best list.
arXiv Detail & Related papers (2023-09-27T14:44:10Z)
- A Chat About Boring Problems: Studying GPT-based text normalization [22.64840464909988]
We show the capacity of Large-Language Models for text normalization in few-shot scenarios.
We find that LLM-based text normalization achieves error rates around 40% lower than top normalization systems.
We create a new taxonomy of text normalization errors and apply it to results from GPT-3.5-Turbo and GPT-4.0.
arXiv Detail & Related papers (2023-09-23T16:32:59Z)
- $k$NN Prompting: Beyond-Context Learning with Calibration-Free Nearest Neighbor Inference [75.08572535009276]
In-Context Learning (ICL) formulates target tasks as prompt completion conditioned on in-context demonstrations.
$k$NN Prompting first queries the LLM with training data for distributed representations, then predicts test instances by simply referring to nearest neighbors.
It significantly outperforms state-of-the-art calibration-based methods under comparable few-shot scenarios.
arXiv Detail & Related papers (2023-03-24T06:16:29Z)
- HIT-SCIR at MMNLU-22: Consistency Regularization for Multilingual Spoken Language Understanding [56.756090143062536]
We propose to use consistency regularization based on a hybrid data augmentation strategy.
We conduct experiments on the MASSIVE dataset under both full-dataset and zero-shot settings.
Our proposed method improves the performance on both intent detection and slot filling tasks.
arXiv Detail & Related papers (2023-01-05T11:21:15Z)
- Bridging Cross-Lingual Gaps During Leveraging the Multilingual Sequence-to-Sequence Pretraining for Text Generation [80.16548523140025]
We extend the vanilla pretrain-finetune pipeline with an extra code-switching restore task to bridge the gap between the pretrain and finetune stages.
Our approach could narrow the cross-lingual sentence representation distance and improve low-frequency word translation with trivial computational cost.
arXiv Detail & Related papers (2022-04-16T16:08:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.