Transformer Tafsir at QIAS 2025 Shared Task: Hybrid Retrieval-Augmented Generation for Islamic Knowledge Question Answering
- URL: http://arxiv.org/abs/2509.23793v1
- Date: Sun, 28 Sep 2025 10:27:08 GMT
- Title: Transformer Tafsir at QIAS 2025 Shared Task: Hybrid Retrieval-Augmented Generation for Islamic Knowledge Question Answering
- Authors: Muhammad Abu Ahmad, Mohamad Ballout, Raia Abu Ahmad, Elia Bruni
- Abstract summary: This paper presents our submission to the QIAS 2025 shared task on Islamic knowledge understanding and reasoning.
We developed a hybrid retrieval-augmented generation (RAG) system that combines sparse and dense retrieval methods with cross-encoder reranking to improve large language model (LLM) performance.
Our three-stage pipeline incorporates BM25 for initial retrieval, a dense embedding retrieval model for semantic matching, and cross-encoder reranking for precise content retrieval.
- Score: 4.3834796470965385
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents our submission to the QIAS 2025 shared task on Islamic knowledge understanding and reasoning. We developed a hybrid retrieval-augmented generation (RAG) system that combines sparse and dense retrieval methods with cross-encoder reranking to improve large language model (LLM) performance. Our three-stage pipeline incorporates BM25 for initial retrieval, a dense embedding retrieval model for semantic matching, and cross-encoder reranking for precise content retrieval. We evaluate our approach on both subtasks using two LLMs, Fanar and Mistral, demonstrating that the proposed RAG pipeline enhances performance across both, with accuracy improvements up to 25%, depending on the task and model configuration. Our best configuration is achieved with Fanar, yielding accuracy scores of 45% in Subtask 1 and 80% in Subtask 2.
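The three-stage pipeline described in the abstract (BM25 retrieval, dense semantic matching, cross-encoder reranking) is straightforward to prototype. The following is a minimal sketch, assuming the rank_bm25 and sentence-transformers libraries; the model names, candidate-pool sizes, and toy corpus are illustrative assumptions, not the authors' reported configuration.

```python
# Hedged sketch of a three-stage hybrid retriever: BM25 -> dense retrieval
# -> cross-encoder reranking. Model names and parameters are assumptions.
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, CrossEncoder, util

corpus = [
    "First source passage ...",
    "Second source passage ...",
    "Third source passage ...",
]

# Stage 1: sparse index over a whitespace-tokenized corpus.
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])

# Stage 2: dense embeddings for semantic matching.
embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
doc_emb = embedder.encode(corpus, convert_to_tensor=True)

# Stage 3: cross-encoder for precise reranking of the merged candidate pool.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def retrieve(query: str, k_sparse: int = 10, k_dense: int = 10, k_final: int = 3):
    # Top candidates by BM25 score.
    s_scores = bm25.get_scores(query.lower().split())
    sparse_ids = sorted(range(len(corpus)), key=lambda i: -s_scores[i])[:k_sparse]
    # Top candidates by cosine similarity of embeddings.
    q_emb = embedder.encode(query, convert_to_tensor=True)
    d_scores = util.cos_sim(q_emb, doc_emb)[0]
    dense_ids = d_scores.argsort(descending=True)[:k_dense].tolist()
    # Union of both pools, deduplicated, then reranked jointly.
    pool = list(dict.fromkeys(sparse_ids + dense_ids))
    ce_scores = reranker.predict([(query, corpus[i]) for i in pool])
    ranked = sorted(zip(pool, ce_scores), key=lambda x: -x[1])
    return [corpus[i] for i, _ in ranked[:k_final]]
```

In the full system, the top-ranked passages would then be placed in the prompt of the downstream LLM (Fanar or Mistral in this paper) before answer generation.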
Related papers
- MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization [103.74675519953898]
Long-chain reflective reasoning is a prerequisite for solving complex real-world problems.
We build a benchmark consisting of 1,260 samples across 42 challenging synthetic tasks.
We generate post-training data and explore learning paradigms for exploiting such data.
arXiv Detail & Related papers (2025-10-09T17:53:58Z)
- ImpRAG: Retrieval-Augmented Generation with Implicit Queries [34.72864597562907]
ImpRAG is a query-free RAG system that integrates retrieval and generation into a unified model.
We show that ImpRAG achieves improvements of 3.6 to 11.5 points in exact match scores on unseen tasks with diverse formats.
arXiv Detail & Related papers (2025-06-02T21:38:21Z)
- Deep Retrieval at CheckThat! 2025: Identifying Scientific Papers from Implicit Social Media Mentions via Hybrid Retrieval and Re-Ranking [4.275139302875217]
We present the methodology and results of the Deep Retrieval team for subtask 4b of the CLEF CheckThat! 2025 competition.
We propose a hybrid retrieval pipeline that combines lexical precision, semantic generalization, and deep contextual re-ranking.
Our approach achieves a mean reciprocal rank at 5 (MRR@5) of 76.46% on the development set and 66.43% on the hidden test set.
arXiv Detail & Related papers (2025-05-29T08:55:39Z)
- Reinforced Model Merging [53.84354455400038]
We present an innovative framework termed Reinforced Model Merging (RMM), which encompasses an environment and agent tailored for merging tasks.
By utilizing data subsets during the evaluation process, we addressed the bottleneck in the reward feedback phase, thereby accelerating RMM by up to 100 times.
arXiv Detail & Related papers (2025-03-27T08:52:41Z)
- From Retrieval to Generation: Comparing Different Approaches [15.31883349259767]
We evaluate retrieval-based, generation-based, and hybrid models for knowledge-intensive tasks.
We show that dense retrievers, particularly DPR, achieve strong performance in ODQA with a top-1 accuracy of 50.17% on NQ.
We also analyze language modeling tasks using WikiText-103, showing that retrieval-based approaches like BM25 achieve lower perplexity compared to generative and hybrid methods.
arXiv Detail & Related papers (2025-02-27T16:29:14Z)
- Telco-DPR: A Hybrid Dataset for Evaluating Retrieval Models of 3GPP Technical Specifications [0.8999666725996975]
This paper proposes a Question-Answering (QA) system for the telecom domain using 3rd Generation Partnership Project technical documents.
A hybrid dataset, Telco-DPR, is presented, combining text and tables, and includes a set of synthetic question/answer pairs.
The retrieval models are evaluated and compared using top-K accuracy and Mean Reciprocal Rank (MRR); a minimal sketch of MRR@k appears after this list.
The proposed QA system, using the developed RAG model and the Generative Pretrained Transformer (GPT)-4, achieves a 14% improvement in answer accuracy.
arXiv Detail & Related papers (2024-10-15T16:37:18Z)
- List-aware Reranking-Truncation Joint Model for Search and Retrieval-augmented Generation [80.12531449946655]
We propose a Reranking-Truncation joint model (GenRT) that can perform the two tasks concurrently.
GenRT integrates reranking and truncation via a generative paradigm based on an encoder-decoder architecture.
Our method achieves SOTA performance on both reranking and truncation tasks for web search and retrieval-augmented LLMs.
arXiv Detail & Related papers (2024-02-05T06:52:53Z)
- TCE at Qur'an QA 2023 Shared Task: Low Resource Enhanced Transformer-based Ensemble Approach for Qur'anic QA [0.0]
We present our approach to tackle Qur'an QA 2023 shared tasks A and B.
To address the challenge of low-resourced training data, we rely on transfer learning together with a voting ensemble.
We employ different architectures and learning mechanisms for a range of Arabic pre-trained transformer-based models for both tasks.
arXiv Detail & Related papers (2024-01-23T19:32:54Z)
- Sequencing Matters: A Generate-Retrieve-Generate Model for Building Conversational Agents [9.191944519634111]
This paper describes the work the Georgetown InfoSense group has done to address the challenges presented by TREC iKAT 2023.
Our submitted runs outperform the median runs by a significant margin, exhibiting superior performance in nDCG across various cut numbers and in overall success rate.
Our solution involves the use of Large Language Models (LLMs) for initial answers, answer grounding by BM25, passage quality filtering by logistic regression, and answer generation by LLMs again.
arXiv Detail & Related papers (2023-11-16T02:37:58Z)
- An Empirical Study of Multimodal Model Merging [148.48412442848795]
Model merging is a technique that fuses multiple models trained on different tasks to generate a multi-task solution.
We conduct our study toward a novel goal: merging vision, language, and cross-modal transformers of a modality-specific architecture.
We propose two metrics that assess the distance between weights to be merged and can serve as an indicator of the merging outcomes.
arXiv Detail & Related papers (2023-04-28T15:43:21Z)
- Pre-training Tasks for Embedding-based Large-scale Retrieval [68.01167604281578]
We consider the large-scale query-document retrieval problem.
Given a query (e.g., a question), the goal is to return the set of relevant documents from a large document corpus.
We show that the key ingredient of learning a strong embedding-based Transformer model is the set of pre-training tasks.
arXiv Detail & Related papers (2020-02-10T16:44:00Z)
- Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies [57.27944046925876]
We propose and address a novel few-shot RL problem, where a task is characterized by a subtask graph.
Instead of directly learning a meta-policy, we develop a Meta-learner with Subtask Graph Inference.
Our experiment results on two grid-world domains and StarCraft II environments show that the proposed method is able to accurately infer the latent task parameter.
arXiv Detail & Related papers (2020-01-01T17:34:00Z)
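Several entries above report retrieval quality as Mean Reciprocal Rank (e.g., MRR@5 for Deep Retrieval, MRR and top-K accuracy for Telco-DPR). As a reference for those numbers, here is a small self-contained sketch of MRR@k; the document IDs and rankings are toy data made up for illustration.

```python
# Hedged sketch of MRR@k on toy data; not taken from any paper above.
def mrr_at_k(rankings: list[list[str]], gold: list[str], k: int = 5) -> float:
    """Mean reciprocal rank truncated at k: 1/rank of the first relevant
    document if it appears in the top k, else 0, averaged over queries."""
    total = 0.0
    for ranked_docs, relevant in zip(rankings, gold):
        for rank, doc_id in enumerate(ranked_docs[:k], start=1):
            if doc_id == relevant:
                total += 1.0 / rank
                break
    return total / len(rankings)

# Three queries: gold document at rank 1, rank 3, and absent from the top 5.
rankings = [["d1", "d2"], ["d4", "d9", "d7"], ["d5", "d6", "d8", "d2", "d3"]]
gold = ["d1", "d7", "d0"]
print(mrr_at_k(rankings, gold))  # (1/1 + 1/3 + 0) / 3 ~= 0.444
```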