AIRwaves at CheckThat! 2025: Retrieving Scientific Sources for Implicit Claims on Social Media with Dual Encoders and Neural Re-Ranking
- URL: http://arxiv.org/abs/2509.19509v1
- Date: Tue, 23 Sep 2025 19:26:31 GMT
- Title: AIRwaves at CheckThat! 2025: Retrieving Scientific Sources for Implicit Claims on Social Media with Dual Encoders and Neural Re-Ranking
- Authors: Cem Ashbaugh, Leon Baumgärtner, Tim Gress, Nikita Sidorov, Daniel Werner
- Abstract summary: Team AIRwaves ranked second in Subtask 4b of the CLEF-2025 CheckThat! Lab with an evidence-retrieval approach that markedly outperforms the competition baseline. To surpass this baseline, a two-stage retrieval pipeline is introduced: (i) a first stage that uses a dual encoder based on E5-large, fine-tuned using in-batch and mined hard negatives and enhanced through chunked tokenization and rich document metadata; and (ii) a neural re-ranking stage using a SciBERT cross-encoder.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Linking implicit scientific claims made on social media to their original publications is crucial for evidence-based fact-checking and scholarly discourse, yet it is hindered by lexical sparsity, very short queries, and domain-specific language. Team AIRwaves ranked second in Subtask 4b of the CLEF-2025 CheckThat! Lab with an evidence-retrieval approach that markedly outperforms the competition baseline. The optimized sparse-retrieval baseline (BM25) achieves MRR@5 = 0.5025 on the gold-label blind test set. To surpass this baseline, a two-stage retrieval pipeline is introduced: (i) a first stage that uses a dual encoder based on E5-large, fine-tuned using in-batch and mined hard negatives and enhanced through chunked tokenization and rich document metadata; and (ii) a neural re-ranking stage using a SciBERT cross-encoder. Replacing purely lexical matching with neural representations lifts performance to MRR@5 = 0.6174, and the complete pipeline further improves to MRR@5 = 0.6828. The findings demonstrate that coupling dense retrieval with neural re-rankers delivers a powerful and efficient solution for tweet-to-study matching and provides a practical blueprint for future evidence-retrieval pipelines.
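The abstract does not include code, but the retrieve-then-rerank pattern it describes is straightforward to sketch. Below is a minimal, illustrative version using the sentence-transformers library; the checkpoint names, the top-k cutoff, and the off-the-shelf MS MARCO cross-encoder stand-in are assumptions, not the authors' actual configuration, which additionally involves fine-tuning on mined hard negatives, chunked tokenization, and metadata enrichment.

```python
# Illustrative two-stage retrieve-then-rerank sketch (not the authors' exact
# setup): a bi-encoder retrieves candidates, a cross-encoder re-ranks them.
from sentence_transformers import SentenceTransformer, CrossEncoder, util

# Stage 1: dual encoder. E5 models expect "query: " / "passage: " prefixes.
# Checkpoint name is an assumption; the paper fine-tunes E5-large with
# in-batch and mined hard negatives, which this sketch omits.
bi_encoder = SentenceTransformer("intfloat/e5-large-v2")

# Stage 2: cross-encoder re-ranker. The paper uses a fine-tuned SciBERT
# cross-encoder; this generic MS MARCO checkpoint is only a stand-in.
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

papers = [
    "Title: ... Abstract: ...",  # one string per candidate (title, abstract, metadata)
]
tweet = "an implicit scientific claim posted on social media"

# Dense first-stage retrieval: embed corpus and query, take top-k by cosine similarity.
doc_emb = bi_encoder.encode([f"passage: {p}" for p in papers], convert_to_tensor=True)
query_emb = bi_encoder.encode(f"query: {tweet}", convert_to_tensor=True)
hits = util.semantic_search(query_emb, doc_emb, top_k=50)[0]

# Neural re-ranking: score each (tweet, candidate) pair jointly, then re-sort.
pairs = [(tweet, papers[h["corpus_id"]]) for h in hits]
scores = cross_encoder.predict(pairs)
ranked = [h["corpus_id"] for h, _ in sorted(zip(hits, scores), key=lambda x: x[1], reverse=True)]

def mrr_at_5(ranked_ids, gold_id):
    """Reciprocal rank of the gold paper within the top 5, else 0 (per-query MRR@5)."""
    for rank, pid in enumerate(ranked_ids[:5], start=1):
        if pid == gold_id:
            return 1.0 / rank
    return 0.0
```

The reported scores (0.5025 for BM25, 0.6174 for the dual encoder alone, 0.6828 for the full pipeline) are the mean of this per-query reciprocal rank over the blind test set.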
Related papers
- NERFIFY: A Multi-Agent Framework for Turning NeRF Papers into Code [49.610331036334316]
We introduce NERFIFY, a framework that reliably converts NeRF research papers into trainable Nerfstudio plugins. Code, data and implementations will be publicly released.
arXiv Detail & Related papers (2026-02-28T20:57:32Z) - CVeDRL: An Efficient Code Verifier via Difficulty-aware Reinforcement Learning [57.24524263804788]
Code verifiers play a critical role in post-verification for LLM-based code generation. Existing supervised fine-tuning methods suffer from data scarcity, high failure rates, and poor inference efficiency. We show that naive RL with only functionality rewards fails to generate effective unit tests for difficult branches and samples.
arXiv Detail & Related papers (2026-01-30T10:33:29Z) - RubricHub: A Comprehensive and Highly Discriminative Rubric Dataset via Automated Coarse-to-Fine Generation [11.664443383764448]
Reinforcement Learning with Verifiable Rewards (RLVR) has driven substantial progress in reasoning-intensive domains like mathematics. Existing methods suffer from scalability bottlenecks and coarse criteria, resulting in a supervision ceiling effect. We propose an automated Coarse-to-Fine Generation framework that produces comprehensive and highly discriminative criteria.
arXiv Detail & Related papers (2026-01-13T10:56:39Z) - TeaRAG: A Token-Efficient Agentic Retrieval-Augmented Generation Framework [62.66056331998838]
TeaRAG is a token-efficient agentic RAG framework capable of compressing both retrieval content and reasoning steps. Our reward function evaluates knowledge sufficiency via a knowledge-matching mechanism while penalizing excessive reasoning steps.
arXiv Detail & Related papers (2025-11-07T16:08:34Z) - ModernBERT + ColBERT: Enhancing biomedical RAG through an advanced re-ranking retriever [0.5371337604556311]
We pair a lightweight ModernBERT bidirectional encoder for efficient initial candidate retrieval with a ColBERTv2 late-interaction model for fine-grained re-ranking. Our analysis of the retriever module confirmed the positive impact of the ColBERT re-ranker, which improved Recall@3 by up to 4.2 percentage points. Our ablation studies reveal that this performance is critically dependent on a joint fine-tuning process that aligns the retriever and re-ranker.
arXiv Detail & Related papers (2025-10-06T12:34:55Z) - When Retriever Meets Generator: A Joint Model for Code Comment Generation [3.6781644685120924]
RAGSum fuses retrieval and generation on top of a single CodeT5 backbone. A contrastive pre-training phase shapes code embeddings for nearest-neighbor search. A lightweight self-refinement loop is deployed to polish the final output.
arXiv Detail & Related papers (2025-07-16T18:12:27Z) - Deep Retrieval at CheckThat! 2025: Identifying Scientific Papers from Implicit Social Media Mentions via Hybrid Retrieval and Re-Ranking [4.275139302875217]
We present the methodology and results of the Deep Retrieval team for subtask 4b of the CLEF CheckThat! 2025 competition. We propose a hybrid retrieval pipeline that combines lexical precision, semantic generalization, and deep contextual re-ranking. Our approach achieves a mean reciprocal rank at 5 (MRR@5) of 76.46% on the development set and 66.43% on the hidden test set.
arXiv Detail & Related papers (2025-05-29T08:55:39Z) - Scalable Unit Harmonization in Medical Informatics via Bayesian-Optimized Retrieval and Transformer-Based Re-ranking [0.0]
We develop a scalable methodology for harmonizing inconsistent units in large-scale clinical datasets. We implement a multi-stage pipeline: filtering, identification, harmonization proposal generation, automated re-ranking, and manual validation. The system achieved 83.39% precision at rank 1 and 94.66% recall at rank 5.
arXiv Detail & Related papers (2025-05-01T19:09:15Z) - Can Query Expansion Improve Generalization of Strong Cross-Encoder Rankers? [72.42500059688396]
We show that it is possible to improve the generalization of a strong neural ranker by prompt engineering and by aggregating the ranking results of each expanded query via fusion.
Experiments on BEIR and TREC Deep Learning show that the nDCG@10 scores of both MonoT5 and RankT5 following these steps are improved.
arXiv Detail & Related papers (2023-11-15T18:11:41Z) - HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models [81.56455625624041]
We introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction.
The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses.
With a reasonable prompt, the generative capability of LLMs can even correct tokens that are missing from the N-best list.
arXiv Detail & Related papers (2023-09-27T14:44:10Z) - Noise-Robust Dense Retrieval via Contrastive Alignment Post Training [89.29256833403167]
Contrastive Alignment POst Training (CAPOT) is a highly efficient fine-tuning method that improves model robustness without requiring index regeneration.
CAPOT enables robust retrieval by freezing the document encoder while the query encoder learns to align noisy queries with their unaltered root.
We evaluate CAPOT on noisy variants of MSMARCO, Natural Questions, and Trivia QA passage retrieval, finding that CAPOT has a similar impact to data augmentation with none of its overhead.
arXiv Detail & Related papers (2023-04-06T22:16:53Z) - KG-FiD: Infusing Knowledge Graph in Fusion-in-Decoder for Open-Domain Question Answering [68.00631278030627]
We propose a novel method KG-FiD, which filters noisy passages by leveraging the structural relationship among the retrieved passages with a knowledge graph.
We show that KG-FiD improves vanilla FiD by up to 1.5% on answer exact-match score and achieves comparable performance to FiD at only 40% of the computation cost.
arXiv Detail & Related papers (2021-10-08T18:39:59Z) - Disentangle Your Dense Object Detector [82.22771433419727]
Deep learning-based dense object detectors have achieved great success in the past few years and have been applied to numerous multimedia applications such as video understanding.
However, the current training pipeline for dense detectors relies on many conjunctions that may not hold.
We propose Disentangled Dense Object Detector (DDOD), in which simple and effective disentanglement mechanisms are designed and integrated into the current state-of-the-art detectors.
arXiv Detail & Related papers (2021-07-07T00:52:16Z)