A Comprehensive Evaluation of Transformer-Based Question Answering Models and RAG-Enhanced Design
- URL: http://arxiv.org/abs/2509.21845v1
- Date: Fri, 26 Sep 2025 04:11:10 GMT
- Title: A Comprehensive Evaluation of Transformer-Based Question Answering Models and RAG-Enhanced Design
- Authors: Zichen Zhang, Kunlong Zhang, Hongwei Ruan, Yiming Luo
- Abstract summary: Transformer-based models have advanced the field of question answering, but multi-hop reasoning, where answers require combining evidence across multiple passages, remains difficult. This paper presents a comprehensive evaluation of retrieval strategies for multi-hop question answering within a retrieval-augmented generation framework. Experiments on the HotpotQA dataset show that the hybrid approach substantially outperforms baseline methods.
- Score: 3.974953368025457
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformer-based models have advanced the field of question answering, but multi-hop reasoning, where answers require combining evidence across multiple passages, remains difficult. This paper presents a comprehensive evaluation of retrieval strategies for multi-hop question answering within a retrieval-augmented generation framework. We compare cosine similarity, maximal marginal relevance, and a hybrid method that integrates dense embeddings with lexical overlap and re-ranking. To further improve retrieval, we adapt the EfficientRAG pipeline for query optimization, introducing token labeling and iterative refinement while maintaining efficiency. Experiments on the HotpotQA dataset show that the hybrid approach substantially outperforms baseline methods, achieving a relative improvement of 50 percent in exact match and 47 percent in F1 score compared to cosine similarity. Error analysis reveals that hybrid retrieval improves entity recall and evidence complementarity, while remaining limited in handling distractors and temporal reasoning. Overall, the results suggest that hybrid retrieval-augmented generation provides a practical zero-shot solution for multi-hop question answering, balancing accuracy, efficiency, and interpretability.
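The three retrieval strategies named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the `lam` and `alpha` weights, the whitespace tokenizer, and the toy vectors are all assumptions made for illustration, and the abstract does not specify how the paper combines dense and lexical scores or how it re-ranks. Here the hybrid score is assumed to be a simple weighted sum of cosine similarity and query-token overlap.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k_cosine(query: Sequence[float], docs: List[Sequence[float]], k: int) -> List[int]:
    """Baseline: rank passages by cosine similarity to the query alone."""
    order = sorted(range(len(docs)), key=lambda i: cosine(query, docs[i]), reverse=True)
    return order[:k]


def mmr(query: Sequence[float], docs: List[Sequence[float]], k: int, lam: float = 0.5) -> List[int]:
    """Maximal marginal relevance: greedily trade relevance against redundancy."""
    selected: List[int] = []
    candidates = list(range(len(docs)))
    while candidates and len(selected) < k:
        def score(i: int) -> float:
            relevance = cosine(query, docs[i])
            # Redundancy is the highest similarity to any already chosen passage.
            redundancy = max((cosine(docs[i], docs[j]) for j in selected), default=0.0)
            return lam * relevance - (1.0 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


def lexical_overlap(query_text: str, passage_text: str) -> float:
    """Fraction of distinct query tokens that also occur in the passage."""
    q = set(query_text.lower().split())
    p = set(passage_text.lower().split())
    return len(q & p) / len(q) if q else 0.0


def hybrid_rank(query_vec, query_text, doc_vecs, doc_texts, k: int, alpha: float = 0.5) -> List[int]:
    """Assumed hybrid: weighted sum of dense similarity and lexical overlap."""
    def score(i: int) -> float:
        dense = cosine(query_vec, doc_vecs[i])
        lexical = lexical_overlap(query_text, doc_texts[i])
        return alpha * dense + (1.0 - alpha) * lexical
    return sorted(range(len(doc_vecs)), key=score, reverse=True)[:k]


# Toy data (hypothetical): passages 0 and 1 are near-duplicates in embedding space,
# while passage 2 is less similar but shares the query's key terms.
query_vec = [1.0, 0.3]
query_text = "who directed inception"
doc_vecs = [[1.0, 0.0], [0.99, 0.1], [0.6, 0.8]]
doc_texts = [
    "the film was released in 2010",
    "a film about dreams",
    "christopher nolan directed the film inception",
]
```

On this toy data, plain cosine ranking returns the two near-duplicate passages, MMR swaps one of them for the more distinct passage, and the hybrid score moves the passage with matching query terms to the top. That mirrors the abstract's claim about evidence complementarity only in spirit; the reported 50 percent and 47 percent relative gains come from the paper's own pipeline, not from this sketch.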
Related papers
- Hybrid Retrieval-Augmented Generation for Robust Multilingual Document Question Answering [0.3376269351435395]
Large-scale digitization initiatives have unlocked massive collections of historical newspapers. We develop and evaluate a multilingual Retrieval-Augmented Generation pipeline specifically designed for question answering on noisy historical documents.
arXiv Detail & Related papers (2025-12-14T13:57:05Z)
- RouteRAG: Efficient Retrieval-Augmented Generation from Text and Graph via Reinforcement Learning [69.87510139069218]
Retrieval-Augmented Generation (RAG) integrates non-parametric knowledge into Large Language Models (LLMs). Recent progress has advanced text-based RAG to multi-turn reasoning through Reinforcement Learning (RL). We introduce RouteRAG, an RL-based framework that enables LLMs to perform multi-turn and adaptive graph-text hybrid RAG.
arXiv Detail & Related papers (2025-12-10T10:05:31Z)
- DAT: Dynamic Alpha Tuning for Hybrid Retrieval in Retrieval-Augmented Generation [0.0]
DAT (Dynamic Alpha Tuning) is a novel hybrid retrieval framework that balances dense retrieval and BM25 for each query. It consistently outperforms fixed-weighting hybrid retrieval methods across various evaluation metrics. Even on smaller models, DAT delivers strong performance, highlighting its efficiency and adaptability.
arXiv Detail & Related papers (2025-03-29T08:35:01Z)
- GRITHopper: Decomposition-Free Multi-Hop Dense Retrieval [52.47514434103737]
We introduce GRITHopper-7B, a novel multi-hop dense retrieval model that achieves state-of-the-art performance. GRITHopper combines generative and representational instruction tuning by integrating causal language modeling with dense retrieval training. We find that incorporating additional context after the retrieval process, referred to as post-retrieval language modeling, enhances dense retrieval performance.
arXiv Detail & Related papers (2025-03-10T16:42:48Z)
- From Retrieval to Generation: Comparing Different Approaches [15.31883349259767]
We evaluate retrieval-based, generation-based, and hybrid models for knowledge-intensive tasks. We show that dense retrievers, particularly DPR, achieve strong performance in ODQA with a top-1 accuracy of 50.17% on NQ. We also analyze language modeling tasks using WikiText-103, showing that retrieval-based approaches like BM25 achieve lower perplexity compared to generative and hybrid methods.
arXiv Detail & Related papers (2025-02-27T16:29:14Z)
- LevelRAG: Enhancing Retrieval-Augmented Generation with Multi-hop Logic Planning over Rewriting Augmented Searchers [24.01783076521377]
Retrieval-Augmented Generation (RAG) is a crucial method for mitigating hallucinations in Large Language Models (LLMs). Existing RAG methods typically employ query rewriting to clarify the user intent and manage multi-hop logic, while using hybrid retrieval to expand search scope. We introduce a high-level searcher that decomposes complex queries into atomic queries, independent of any retriever-specific optimizations. To harness the strengths of sparse retrievers for precise keyword retrieval, we have developed a new sparse searcher that employs Lucene syntax to enhance retrieval accuracy.
arXiv Detail & Related papers (2025-02-25T12:09:16Z)
- Chain-of-Retrieval Augmented Generation [72.06205327186069]
This paper introduces an approach for training o1-like RAG models that retrieve and reason over relevant information step by step before generating the final answer. Our proposed method, CoRAG, allows the model to dynamically reformulate the query based on the evolving state.
arXiv Detail & Related papers (2025-01-24T09:12:52Z)
- SiReRAG: Indexing Similar and Related Information for Multihop Reasoning [96.60045548116584]
SiReRAG is a novel RAG indexing approach that explicitly considers both similar and related information. SiReRAG consistently outperforms state-of-the-art indexing methods on three multihop datasets.
arXiv Detail & Related papers (2024-12-09T04:56:43Z)
- Domain-specific Question Answering with Hybrid Search [39.85176264551715]
We show that a hybrid approach combining a fine-tuned dense retriever with keyword-based sparse search methods significantly enhances performance. Experimental results indicate that this hybrid method outperforms our single-retriever system.
arXiv Detail & Related papers (2024-12-04T22:04:13Z)
- Hybrid and Collaborative Passage Reranking [144.83902343298112]
We propose a Hybrid and Collaborative Passage Reranking (HybRank) method.
It incorporates the lexical and semantic properties of sparse and dense retrievers for reranking.
Built on off-the-shelf retriever features, HybRank is a plug-in reranker capable of enhancing arbitrary passage lists.
arXiv Detail & Related papers (2023-05-16T09:38:52Z)
- ECO-TR: Efficient Correspondences Finding Via Coarse-to-Fine Refinement [80.94378602238432]
We propose an efficient structure named Correspondence Efficient Transformer (ECO-TR) by finding correspondences in a coarse-to-fine manner.
To achieve this, multiple transformer blocks are stage-wisely connected to gradually refine the predicted coordinates.
Experiments on various sparse and dense matching tasks demonstrate the superiority of our method in both efficiency and effectiveness against existing state-of-the-arts.
arXiv Detail & Related papers (2022-09-25T13:05:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.