Exploring Demonstration Retrievers in RAG for Coding Tasks: Yeas and Nays!
- URL: http://arxiv.org/abs/2410.09662v1
- Date: Sat, 12 Oct 2024 22:31:01 GMT
- Title: Exploring Demonstration Retrievers in RAG for Coding Tasks: Yeas and Nays!
- Authors: Pengfei He, Shaowei Wang, Shaiful Chowdhury, Tse-Hsun Chen
- Abstract summary: This paper systematically evaluates the efficiency-effectiveness trade-off of retrievers across three coding tasks.
We show that while BM25 excels in effectiveness, it suffers in efficiency as the knowledge base grows beyond 1000 entries.
In large-scale retrieval, efficiency differences become more pronounced, with approximate dense retrievers offering the greatest gains.
- Score: 6.34946724864899
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by integrating external knowledge bases, achieving state-of-the-art results in various coding tasks. The core of RAG is retrieving demonstration examples, which is essential to balance effectiveness (generation quality) and efficiency (retrieval time) for optimal performance. However, the high-dimensional nature of code representations and large knowledge bases often create efficiency bottlenecks, which are overlooked in previous research. This paper systematically evaluates the efficiency-effectiveness trade-off of retrievers across three coding tasks: Program Synthesis, Commit Message Generation, and Assertion Generation. We examined six retrievers: two sparse (BM25 and BM25L) and four dense retrievers, including one exhaustive dense retriever (SBERT's Semantic Search) and three approximate dense retrievers (ANNOY, LSH, and HNSW). Our findings show that while BM25 excels in effectiveness, it suffers in efficiency as the knowledge base grows beyond 1000 entries. In large-scale retrieval, efficiency differences become more pronounced, with approximate dense retrievers offering the greatest gains. For instance, on the Commit Message Generation task, HNSW achieves a 44x speedup with only a 1.74% drop in RougeL compared with BM25. Our results also show that increasing the number of demonstrations in the prompt does not always improve effectiveness and can increase latency and lead to incorrect outputs. Our findings provide valuable insights for practitioners aiming to build efficient and effective RAG systems for coding tasks.
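To make the compared retrieval styles concrete, below is a minimal sketch of sparse (BM25) versus approximate dense (SBERT embeddings + HNSW) retrieval of demonstrations for a RAG prompt. The libraries (rank_bm25, sentence-transformers, hnswlib), the embedding model, and the index parameters are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch: sparse (BM25) vs. approximate dense (HNSW) retrieval of
# demonstration examples for a RAG prompt. Library and model choices are
# illustrative stand-ins, not the paper's exact implementations.
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer
import hnswlib

# Toy knowledge base of demonstrations (e.g., for assertion generation).
knowledge_base = [
    "def add(a, b): return a + b",
    "def is_even(n): return n % 2 == 0",
    "def reverse(s): return s[::-1]",
]
query = "def multiply(a, b): return a * b"
k = 2

# --- Sparse retrieval: BM25 over whitespace-tokenized code ---
bm25 = BM25Okapi([doc.split() for doc in knowledge_base])
bm25_top = bm25.get_top_n(query.split(), knowledge_base, n=k)

# --- Approximate dense retrieval: SBERT embeddings + HNSW index ---
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice
embeddings = encoder.encode(knowledge_base)
index = hnswlib.Index(space="cosine", dim=embeddings.shape[1])
index.init_index(max_elements=len(knowledge_base), ef_construction=200, M=16)
index.add_items(embeddings, list(range(len(knowledge_base))))
index.set_ef(50)  # query-time accuracy/speed trade-off
labels, _ = index.knn_query(encoder.encode([query]), k=k)
hnsw_top = [knowledge_base[i] for i in labels[0]]

# Retrieved demonstrations are then prepended to the prompt sent to the LLM.
prompt = "\n\n".join(hnsw_top + [query])
print(prompt)
```

In such a setup, the HNSW parameters (M, ef_construction, ef) trade retrieval recall against index build and query time, which is the efficiency-effectiveness axis the paper measures.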
Related papers
- CodeXEmbed: A Generalist Embedding Model Family for Multilingual and Multi-task Code Retrieval [103.116634967815]
We introduce CodeXEmbed, a family of large-scale code embedding models ranging from 400M to 7B parameters.
Our novel training pipeline unifies multiple programming languages and transforms various code-related tasks into a common retrieval framework.
Our 7B model sets a new state-of-the-art (SOTA) in code retrieval, outperforming the previous leading model, Voyage-Code, by over 20% on the CoIR benchmark.
arXiv Detail & Related papers (2024-11-19T16:54:45Z)
- Toward Optimal Search and Retrieval for RAG [39.69494982983534]
Retrieval-augmented generation (RAG) is a promising method for addressing some of the memory-related challenges associated with Large Language Models (LLMs).
Here, we work towards the goal of understanding how retrievers can be optimized for RAG pipelines for common tasks such as Question Answering (QA).
arXiv Detail & Related papers (2024-11-11T22:06:51Z)
- Towards Competitive Search Relevance For Inference-Free Learned Sparse Retrievers [6.773411876899064]
Inference-free sparse models lag far behind both sparse and dense siamese models in terms of search relevance.
We propose two different approaches for performance improvement. First, we introduce the IDF-aware FLOPS loss, which incorporates Inverted Document Frequency (IDF) into the sparsification of representations.
We find that it mitigates the negative impact of the FLOPS regularization on search relevance, allowing the model to achieve a better balance between accuracy and efficiency.
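As a concrete illustration: the standard FLOPS regularizer penalizes the squared mean activation of every vocabulary term over a batch, and one plausible way to make it IDF-aware is to scale each term's penalty by the inverse of its IDF so that rare, informative terms are sparsified less. The sketch below implements that assumed weighting; it is not necessarily the paper's exact formulation.

```python
import torch

def idf_aware_flops_loss(activations: torch.Tensor, idf: torch.Tensor) -> torch.Tensor:
    """Sketch of an IDF-aware FLOPS regularizer.

    activations: (batch, vocab) non-negative sparse term weights.
    idf: (vocab,) inverse document frequency of each vocabulary term.
    The plain FLOPS loss is sum_j (mean_i a_ij)^2; here each term's penalty
    is scaled by 1/idf_j, so frequent (low-IDF) terms are pushed toward zero
    harder than rare, informative ones. The exact weighting in the paper may
    differ -- this is an assumed form for illustration.
    """
    mean_activation = activations.mean(dim=0)      # (vocab,)
    weights = 1.0 / idf.clamp(min=1e-6)            # assumed IDF-based weighting
    return (weights * mean_activation.pow(2)).sum()

# Usage: add lambda * idf_aware_flops_loss(term_weights, idf) to the ranking loss.
```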
arXiv Detail & Related papers (2024-11-07T03:46:43Z) - Instructive Code Retriever: Learn from Large Language Model's Feedback for Code Intelligence Tasks [10.867880635762395]
We introduce a novel approach named Instructive Code Retriever (ICR).
ICR is designed to retrieve examples that enhance model inference across various code intelligence tasks and datasets.
We evaluate our model's effectiveness on various tasks, i.e., code summarization, program synthesis, and bug fixing.
arXiv Detail & Related papers (2024-10-15T05:44:00Z) - FunnelRAG: A Coarse-to-Fine Progressive Retrieval Paradigm for RAG [22.4664221738095]
Retrieval-Augmented Generation (RAG) prevails in applications of Large Language Models.
We propose a progressive retrieval paradigm with coarse-to-fine granularity for RAG.
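The general shape of a coarse-to-fine pipeline can be sketched as a cheap first stage that narrows a large corpus to a small candidate pool, followed by a more expensive stage that re-scores only that pool. The stages, models, and granularity below are generic stand-ins rather than FunnelRAG's actual pipeline.

```python
# Generic coarse-to-fine retrieval sketch (not FunnelRAG's exact pipeline):
# stage 1 uses cheap lexical scoring to shrink the corpus, stage 2 re-scores
# only the surviving candidates with a dense encoder.
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, util

def coarse_to_fine_retrieve(query, corpus, coarse_k=100, fine_k=5):
    # Coarse stage: BM25 over the full corpus.
    bm25 = BM25Okapi([d.split() for d in corpus])
    candidates = bm25.get_top_n(query.split(), corpus, n=min(coarse_k, len(corpus)))

    # Fine stage: dense cosine similarity over the small candidate pool only.
    encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice
    scores = util.cos_sim(encoder.encode(query, convert_to_tensor=True),
                          encoder.encode(candidates, convert_to_tensor=True))[0]
    ranked = sorted(zip(candidates, scores.tolist()), key=lambda x: -x[1])
    return [doc for doc, _ in ranked[:fine_k]]
```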
arXiv Detail & Related papers (2024-10-14T08:47:21Z)
- Accelerating LLaMA Inference by Enabling Intermediate Layer Decoding via Instruction Tuning with LITE [62.13435256279566]
Large Language Models (LLMs) have achieved remarkable performance across a wide variety of natural language tasks.
However, their large size makes their inference slow and computationally expensive.
We show that instruction tuning with LITE enables these intermediate layers to acquire 'good' generation ability without affecting the generation ability of the final layer.
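A rough sketch of the underlying idea, adding language-modeling losses at intermediate layers during instruction tuning so those layers can later decode (enabling early exit), is shown below. The layer choices, loss weighting, and reuse of the final LM head are assumptions, not LITE's exact recipe, and GPT-2 stands in for LLaMA.

```python
# Rough sketch of intermediate-layer supervision during instruction tuning:
# the final LM head is applied to selected intermediate hidden states and
# their losses are added to the usual last-layer loss, so those layers can
# later be used for early-exit decoding.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")   # stand-in for LLaMA
tokenizer = AutoTokenizer.from_pretrained("gpt2")
supervised_layers = [4, 8]                             # assumed intermediate layers

batch = tokenizer("Write a haiku about retrieval.", return_tensors="pt")
labels = batch["input_ids"]
out = model(**batch, labels=labels, output_hidden_states=True)

def shifted_lm_loss(hidden, labels):
    logits = model.lm_head(hidden)                     # reuse the tied LM head
    return F.cross_entropy(logits[:, :-1].reshape(-1, logits.size(-1)),
                           labels[:, 1:].reshape(-1))

loss = out.loss                                        # standard last-layer loss
for layer in supervised_layers:
    loss = loss + shifted_lm_loss(out.hidden_states[layer], labels)
loss.backward()                                        # one instruction-tuning step
```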
arXiv Detail & Related papers (2023-10-28T04:07:58Z)
- MASTER: Multi-task Pre-trained Bottlenecked Masked Autoencoders are Better Dense Retrievers [140.0479479231558]
In this work, we aim to unify a variety of pre-training tasks into a multi-task pre-trained model, namely MASTER.
MASTER utilizes a shared-encoder multi-decoder architecture that can construct a representation bottleneck to compress the abundant semantic information across tasks into dense vectors.
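A minimal sketch of the shared-encoder, multi-decoder bottleneck idea follows: a single encoder vector is the only bridge to several shallow task-specific decoders, so the semantics needed by every task must be compressed into that vector. The dimensions, decoder design, and task set below are assumptions, not MASTER's exact architecture.

```python
# Minimal sketch of a shared-encoder / multi-decoder bottleneck (not MASTER's
# exact architecture): one dense vector from the shared encoder is prepended
# to each shallow decoder's input, forcing semantics into that vector.
import torch
import torch.nn as nn

class BottleneckedMultiDecoder(nn.Module):
    def __init__(self, vocab_size=30522, dim=256, num_tasks=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=6)
        # One shallow decoder per pre-training task, all fed only the bottleneck.
        self.decoders = nn.ModuleList([
            nn.TransformerEncoder(
                nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=1)
            for _ in range(num_tasks)])
        self.lm_head = nn.Linear(dim, vocab_size)

    def forward(self, input_ids, task_id):
        hidden = self.encoder(self.embed(input_ids))
        bottleneck = hidden[:, :1, :]                  # [CLS]-style dense vector
        # Each decoder sees the bottleneck plus (masked) token embeddings only.
        dec_in = torch.cat([bottleneck, self.embed(input_ids)], dim=1)
        return self.lm_head(self.decoders[task_id](dec_in))

model = BottleneckedMultiDecoder()
logits = model(torch.randint(0, 30522, (2, 16)), task_id=0)  # (2, 17, vocab)
```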
arXiv Detail & Related papers (2022-12-15T13:57:07Z)
- Training Data is More Valuable than You Think: A Simple and Effective Method by Retrieving from Training Data [82.92758444543689]
Retrieval-based methods have been shown to be effective in NLP tasks by introducing external knowledge.
Surprisingly, we found that REtrieving from the traINing datA (REINA) alone can lead to significant gains on multiple NLG and NLU tasks.
Experimental results show that this simple method can achieve significantly better performance on a variety of NLU and NLG tasks.
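The core recipe is simple enough to sketch: index the training set itself with a lexical retriever and concatenate the retrieved (input, output) pairs to each new input before generation. The library and prompt format below are illustrative assumptions, not the paper's exact setup.

```python
# Sketch of retrieving from the training data itself: index training pairs
# with BM25 and append the nearest ones to each new input.
from rank_bm25 import BM25Okapi

train_pairs = [
    ("fix typo in readme", "docs: correct spelling in README"),
    ("add null check to parser", "fix: guard against null input in parser"),
]
bm25 = BM25Okapi([inp.split() for inp, _ in train_pairs])

def augment(test_input: str, k: int = 1) -> str:
    scores = bm25.get_scores(test_input.split())
    top = sorted(range(len(train_pairs)), key=lambda i: -scores[i])[:k]
    retrieved = [f"{train_pairs[i][0]} => {train_pairs[i][1]}" for i in top]
    return "\n".join(retrieved + [test_input])   # retrieved examples + new input

print(augment("add bounds check to parser"))
```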
arXiv Detail & Related papers (2022-03-16T17:37:27Z)
- LaPraDoR: Unsupervised Pretrained Dense Retriever for Zero-Shot Text Retrieval [55.097573036580066]
Experimental results show that LaPraDoR achieves state-of-the-art performance compared with supervised dense retrieval models.
Compared to re-ranking, our lexicon-enhanced approach can be run in milliseconds (22.5x faster) while achieving superior performance.
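The lexicon-enhanced idea can be sketched as combining each document's dense similarity with its BM25 lexical score at query time, avoiding a separate re-ranking pass over (query, document) pairs. Multiplying the two scores below is an assumed combination rule, and the embedding model is a stand-in.

```python
# Sketch of lexicon-enhanced dense retrieval: the dense similarity of each
# document is combined with its BM25 score at query time (a cheap elementwise
# operation) instead of running a slow cross-encoder re-ranker.
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, util

corpus = ["the cat sat on the mat", "stock markets fell sharply", "a dog chased the cat"]
query = "cat on a mat"

encoder = SentenceTransformer("all-MiniLM-L6-v2")      # assumed model choice
dense_scores = util.cos_sim(encoder.encode(query, convert_to_tensor=True),
                            encoder.encode(corpus, convert_to_tensor=True))[0].cpu().numpy()
lexical_scores = np.array(BM25Okapi([d.split() for d in corpus]).get_scores(query.split()))

combined = dense_scores * lexical_scores               # assumed combination rule
print(corpus[int(combined.argmax())])
```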
arXiv Detail & Related papers (2022-03-11T18:53:12Z)
- Adversarial Retriever-Ranker for dense text retrieval [51.87158529880056]
We present Adversarial Retriever-Ranker (AR2), which consists of a dual-encoder retriever plus a cross-encoder ranker.
AR2 consistently and significantly outperforms existing dense retriever methods.
This includes improvements on Natural Questions R@5 to 77.9% (+2.1%), TriviaQA R@5 to 78.2% (+1.4%), and MS-MARCO MRR@10 to 39.5% (+1.3%).
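At inference time such a system is a standard retrieve-then-rerank pipeline: a dual-encoder retriever fetches candidates cheaply and a cross-encoder ranker re-scores them jointly with the query. The sketch below shows only that pipeline with off-the-shelf stand-in models; AR2's adversarial training loop that couples the two models is not shown.

```python
# Inference-time sketch of a dual-encoder retriever plus cross-encoder ranker.
# Model names are off-the-shelf stand-ins, not AR2's trained checkpoints.
from sentence_transformers import SentenceTransformer, CrossEncoder, util

corpus = ["Paris is the capital of France.",
          "The Louvre is a museum in Paris.",
          "Mount Everest is the highest mountain."]
query = "What is the capital of France?"

# Dual-encoder retriever: cheap, scores the whole corpus independently.
retriever = SentenceTransformer("multi-qa-MiniLM-L6-cos-v1")
hits = util.semantic_search(retriever.encode(query, convert_to_tensor=True),
                            retriever.encode(corpus, convert_to_tensor=True),
                            top_k=2)[0]
candidates = [corpus[h["corpus_id"]] for h in hits]

# Cross-encoder ranker: expensive, jointly encodes each (query, candidate) pair.
ranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = ranker.predict([(query, c) for c in candidates])
print(candidates[int(scores.argmax())])
```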
arXiv Detail & Related papers (2021-10-07T16:41:15Z)