Never Come Up Empty: Adaptive HyDE Retrieval for Improving LLM Developer Support
- URL: http://arxiv.org/abs/2507.16754v1
- Date: Tue, 22 Jul 2025 16:46:00 GMT
- Title: Never Come Up Empty: Adaptive HyDE Retrieval for Improving LLM Developer Support
- Authors: Fangjian Lei, Mariam El Mezouar, Shayan Noei, Ying Zou
- Abstract summary: We construct a retrieval corpus of over 3 million Java- and Python-related Stack Overflow posts with accepted answers. We evaluate various RAG pipeline designs for answering developer questions and assess their effectiveness in generating accurate and reliable responses.
- Score: 0.5356944479760103
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) have shown promise in assisting developers with code-related questions; however, LLMs carry the risk of generating unreliable answers. To address this, Retrieval-Augmented Generation (RAG) has been proposed to reduce the unreliability (i.e., hallucinations) of LLMs. However, designing effective pipelines remains challenging due to numerous design choices. In this paper, we construct a retrieval corpus of over 3 million Java- and Python-related Stack Overflow posts with accepted answers, and explore various RAG pipeline designs to answer developer questions, evaluating their effectiveness in generating accurate and reliable responses. More specifically, we (1) design and evaluate 7 different RAG pipelines and 63 pipeline variants to answer questions that have historically similar matches, and (2) address new questions without any close prior matches by automatically lowering the similarity threshold during retrieval, thereby increasing the chance of finding partially relevant context and improving coverage for unseen cases. We find that a RAG pipeline combining hypothetical document embedding (HyDE) with the full-answer context performs best at retrieving and answering similar content for Stack Overflow questions. Finally, we apply our optimal RAG pipeline to 4 open-source LLMs and compare the results to their zero-shot performance. Our findings show that our optimal RAG pipeline consistently outperforms zero-shot baselines across models, achieving higher scores for helpfulness, correctness, and detail with LLM-as-a-judge. These findings demonstrate that our optimal RAG pipeline robustly enhances answer quality for a wide range of developer queries, including both previously seen and novel questions, across different LLMs.
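The adaptive retrieval mechanism described above (embed a hypothetical answer rather than the raw question, then relax the similarity threshold when no close match exists) can be sketched roughly as follows. The embedding model, threshold schedule, and the `generate_hypothetical_answer` helper are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of HyDE retrieval with an adaptive similarity threshold.
# Assumptions (not from the paper): sentence-transformers for embeddings, an
# in-memory corpus, and a placeholder LLM call for the hypothetical answer.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # any embedding model works

def embed_corpus(corpus_texts):
    # L2-normalized embeddings so that a dot product equals cosine similarity.
    return embedder.encode(corpus_texts, normalize_embeddings=True)

def generate_hypothetical_answer(question: str) -> str:
    """Placeholder for an LLM call that drafts a hypothetical answer (HyDE)."""
    return f"A plausible answer to: {question}"

def retrieve(question, corpus_texts, corpus_embs, k=5,
             start_threshold=0.8, min_threshold=0.3, step=0.1):
    # Embed the hypothetical answer instead of the raw question (HyDE).
    query_emb = embedder.encode(generate_hypothetical_answer(question),
                                normalize_embeddings=True)
    sims = corpus_embs @ query_emb  # cosine similarities
    threshold = start_threshold
    while threshold >= min_threshold:
        hits = np.where(sims >= threshold)[0]
        if len(hits) > 0:
            top = hits[np.argsort(-sims[hits])][:k]
            return [(corpus_texts[i], float(sims[i])) for i in top]
        threshold -= step  # no close match: relax the threshold and retry
    return []  # empty only if nothing clears even the lowest threshold
```

In the paper's best-performing variant, the retrieved posts are paired with their full accepted answers as generation context; the threshold values above are placeholders rather than the values tuned in the study.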
Related papers
- DeepSieve: Information Sieving via LLM-as-a-Knowledge-Router [57.28685457991806]
DeepSieve is an agentic RAG framework that incorporates information sieving via LLM-as-a-knowledge-router. Our design emphasizes modularity, transparency, and adaptability, leveraging recent advances in agentic system design.
arXiv Detail & Related papers (2025-07-29T17:55:23Z) - Question Decomposition for Retrieval-Augmented Generation [2.6409776648054764]
We propose a RAG pipeline that incorporates question decomposition into sub-questions. We show that question decomposition effectively assembles complementary documents, while reranking reduces noise. Although reranking itself is standard, we show that pairing an off-the-shelf cross-encoder reranker with LLM-driven question decomposition bridges the retrieval gap on multi-hop questions.
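A minimal sketch of the decomposition-plus-reranking pattern described in this entry, assuming a sentence-transformers cross-encoder and a placeholder LLM decomposition step; the model name and helper functions are illustrative, not the paper's exact setup.

```python
# Sketch: decompose a multi-hop question, pool retrieved candidates per
# sub-question, then rerank the pool against the original question.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # common public checkpoint

def decompose(question: str) -> list[str]:
    """Placeholder for an LLM prompt that splits a question into sub-questions."""
    return [question]  # in practice, return the LLM's sub-questions

def retrieve_with_decomposition(question, retriever, k=5):
    # Retrieve candidates for each sub-question, pooling complementary documents.
    candidates = []
    for sub_q in decompose(question):
        candidates.extend(retriever(sub_q, k))
    # Rerank the pooled candidates against the original question to cut noise.
    scores = reranker.predict([(question, doc) for doc in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda x: -x[1])
    return [doc for doc, _ in ranked[:k]]
```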
arXiv Detail & Related papers (2025-07-01T01:01:54Z) - LLM-Assisted Question-Answering on Technical Documents Using Structured Data-Aware Retrieval Augmented Generation [0.432776344138537]
Large Language Models (LLMs) are capable of natural language understanding and generation. Fine-tuning is one possible solution, but it is resource-intensive and must be repeated with every data update. Retrieval-Augmented Generation (RAG) offers an efficient solution by allowing LLMs to access external knowledge sources.
arXiv Detail & Related papers (2025-06-29T08:22:03Z) - LaRA: Benchmarking Retrieval-Augmented Generation and Long-Context LLMs -- No Silver Bullet for LC or RAG Routing [70.35888047551643]
We present LaRA, a novel benchmark specifically designed to rigorously compare RAG and LC LLMs. LaRA encompasses 2326 test cases across four practical QA task categories and three types of naturally occurring long texts. We find that the optimal choice between RAG and LC depends on a complex interplay of factors, including the model's parameter size, long-text capabilities, context length, task type, and the characteristics of the retrieved chunks.
arXiv Detail & Related papers (2025-02-14T08:04:22Z) - Toward Optimal Search and Retrieval for RAG [39.69494982983534]
Retrieval-augmented generation (RAG) is a promising method for addressing some of the memory-related challenges associated with Large Language Models (LLMs).
Here, we work towards the goal of understanding how retrievers can be optimized for RAG pipelines for common tasks such as Question Answering (QA).
arXiv Detail & Related papers (2024-11-11T22:06:51Z) - EfficientRAG: Efficient Retriever for Multi-Hop Question Answering [52.64500643247252]
We introduce EfficientRAG, an efficient retriever for multi-hop question answering.
Experimental results demonstrate that EfficientRAG surpasses existing RAG methods on three open-domain multi-hop question-answering datasets.
arXiv Detail & Related papers (2024-08-08T06:57:49Z) - RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework [66.93260816493553]
This paper introduces RAGEval, a framework designed to assess RAG systems across diverse scenarios. With a focus on factual accuracy, we propose three novel metrics: Completeness, Hallucination, and Irrelevance. Experimental results show that RAGEval outperforms zero-shot and one-shot methods in terms of clarity, safety, conformity, and richness of generated samples.
arXiv Detail & Related papers (2024-08-02T13:35:11Z) - RAG-QA Arena: Evaluating Domain Robustness for Long-form Retrieval Augmented Question Answering [61.19126689470398]
Long-form RobustQA (LFRQA) is a new dataset covering 26K queries and large corpora across seven different domains.
We show via experiments that RAG-QA Arena and human judgments on answer quality are highly correlated.
Only 41.3% of the most competitive LLM's answers are preferred to LFRQA's answers, demonstrating RAG-QA Arena as a challenging evaluation platform for future research.
arXiv Detail & Related papers (2024-07-19T03:02:51Z) - BERGEN: A Benchmarking Library for Retrieval-Augmented Generation [26.158785168036662]
Retrieval-Augmented Generation allows enhancing Large Language Models with external knowledge.
Inconsistent benchmarking poses a major challenge in comparing approaches and understanding the impact of each component in the pipeline.
In this work, we study best practices that lay the groundwork for a systematic evaluation of RAG and present BERGEN, an end-to-end library for reproducible research standardizing RAG experiments.
arXiv Detail & Related papers (2024-07-01T09:09:27Z) - Multi-Head RAG: Solving Multi-Aspect Problems with LLMs [18.581518952488093]
Multi-Head RAG (MRAG) is a novel scheme for fetching multi-aspect documents. We show MRAG's design advantages over 18 RAG baselines, with empirical improvements of up to 20% in retrieval success ratios.
arXiv Detail & Related papers (2024-06-07T16:59:38Z) - SuRe: Summarizing Retrievals using Answer Candidates for Open-domain QA of LLMs [85.54906813106683]
We propose a simple yet effective framework to enhance open-domain question answering (ODQA) with large language models (LLMs).
SuRe helps LLMs predict more accurate answers for a given question, which are well supported by the summarized retrievals.
Experimental results on diverse ODQA benchmarks demonstrate the superiority of SuRe, with improvements of up to 4.6% in exact match (EM) and 4.0% in F1 score over standard prompting approaches.
arXiv Detail & Related papers (2024-04-17T01:15:54Z) - Allies: Prompting Large Language Model with Beam Search [107.38790111856761]
In this work, we propose a novel method called ALLIES.
Given an input query, ALLIES leverages LLMs to iteratively generate new queries related to the original query.
By iteratively refining and expanding the scope of the original query, ALLIES captures and utilizes hidden knowledge that may not be directly obtainable through retrieval.
arXiv Detail & Related papers (2023-05-24T06:16:44Z)
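A rough sketch of the ALLIES-style iterative query expansion described above, with placeholder expansion and retrieval functions; the original method scores candidate queries with LLM-driven beam search, which is not reproduced here.

```python
# Sketch of iterative query expansion: an LLM proposes related queries, each
# round's retrieval feeds the next expansion, and the gathered evidence is
# merged. The expand/retrieve calls are placeholders, not the ALLIES prompts.
def expand_queries(query: str, evidence: list[str]) -> list[str]:
    """Placeholder for an LLM prompt that proposes follow-up queries."""
    return [f"{query} (refined)"]

def iterative_retrieve(question, retriever, rounds=3, k=3):
    queries, evidence = [question], []
    for _ in range(rounds):
        next_queries = []
        for q in queries:
            docs = retriever(q, k)
            evidence.extend(docs)
            next_queries.extend(expand_queries(q, evidence))
        queries = next_queries  # widen the search with the refined queries
    return evidence  # context that direct retrieval on the question may miss
```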