Related papers: Improving Retrieval-Augmented Deep Assertion Generation via Joint Training

Improving Retrieval-Augmented Deep Assertion Generation via Joint Training

URL: http://arxiv.org/abs/2502.10696v2
Date: Mon, 24 Feb 2025 09:00:10 GMT
Title: Improving Retrieval-Augmented Deep Assertion Generation via Joint Training
Authors: Quanjun Zhang, Chunrong Fang, Yi Zheng, Ruixiang Qian, Shengcheng Yu, Yuan Zhao, Jianyi Zhou, Yun Yang, Tao Zheng, Zhenyu Chen,
Abstract summary: We propose AG-RAG, a retrieval-augmented automated assertion generation approach.<n>AG-RAG builds a dense retriever to search for relevant test-assert pairs (TAPs) with semantic matching.<n>We extensively evaluate AG-RAG against six state-of-the-art AG approaches on two benchmarks and three metrics.
Score: 21.2001651233287
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Unit testing attempts to validate the correctness of basic units of the software system under test and has a crucial role in software development and testing. Very recent work proposes a retrieve-and-edit approach to generate unit test oracles, i.e., assertions. Despite being promising, it is still far from perfect due to some limitations, such as splitting assertion retrieval and generation into two separate components without benefiting each other. In this paper, we propose AG-RAG, a retrieval-augmented automated assertion generation approach that leverages external codebases and joint training to address various technical limitations of prior work. Inspired by the plastic surgery hypothesis, AG-RAG attempts to combine relevant unit tests and advanced pre-trained language models (PLMs) with retrieval-augmented fine-tuning. AG-RAG builds a dense retriever to search for relevant test-assert pairs (TAPs) with semantic matching and a retrieval-augmented generator to synthesize accurate assertions with the focal-test and retrieved TAPs as input. Besides, AG-RAG leverages a code-aware language model CodeT5 as the cornerstone to facilitate both assertion retrieval and generation tasks. Furthermore, the retriever is optimized in conjunction with the generator as a whole pipeline with a joint training strategy. This unified design fully adapts both components specifically for retrieving more useful TAPs, thereby generating accurate assertions. We extensively evaluate AG-RAG against six state-of-the-art AG approaches on two benchmarks and three metrics. Experimental results show that AG-RAG significantly outperforms previous AG approaches on all benchmarks and metrics, e.g., improving the most recent baseline EditAS by 20.82% and 26.98% in terms of accuracy. AG-RAG also correctly generates 1739 and 2866 unique assertions that all baselines fail to generate, 3.45X and 9.20X more than EditAS.

Related papers

A-RAG: Scaling Agentic Retrieval-Augmented Generation via Hierarchical Retrieval Interfaces [34.59674580962045]
We introduce A-RAG, an Agentic RAG framework that exposes hierarchical retrieval interfaces directly to the model.<n>A-RAG provides three retrieval tools: keyword search, semantic search, and chunk read, enabling the agent to adaptively search and retrieve information across multiple granularities.<n> Experiments on multiple open-domain QA benchmarks show that A-RAG consistently outperforms existing approaches with comparable or lower retrieved tokens.
arXiv Detail & Related papers (2026-02-03T12:07:21Z)
TeaRAG: A Token-Efficient Agentic Retrieval-Augmented Generation Framework [62.66056331998838]
TeaRAG is a token-efficient agentic RAG framework capable of compressing both retrieval content and reasoning steps.<n>Our reward function evaluates the knowledge sufficiency by a knowledge matching mechanism, while penalizing excessive reasoning steps.
arXiv Detail & Related papers (2025-11-07T16:08:34Z)
Alita-G: Self-Evolving Generative Agent for Agent Generation [54.49365835457433]
We present ALITA-G, a framework that transforms a general-purpose agent into a domain expert.<n>In this framework, a generalist agent executes a curated suite of target-domain tasks.<n>It attains strong gains while reducing computation costs.
arXiv Detail & Related papers (2025-10-27T17:59:14Z)
Do Large Language Models Respect Contracts? Evaluating and Enforcing Contract-Adherence in Code Generation [11.445615378917578]
PACT is a program assessment and contract-adherence evaluation framework.<n>It provides a comprehensive test-suite corpus focused on contract violations.<n>It enables a systematic analysis of code generation under varied prompting conditions.
arXiv Detail & Related papers (2025-10-14T01:12:37Z)
Eigen-1: Adaptive Multi-Agent Refinement with Monitor-Based RAG for Scientific Reasoning [53.45095336430027]
We develop a unified framework that combines implicit retrieval and structured collaboration.<n>On Humanity's Last Exam (HLE) Bio/Chem Gold, our framework achieves 48.3% accuracy.<n>Results on SuperGPQA and TRQA confirm robustness across domains.
arXiv Detail & Related papers (2025-09-25T14:05:55Z)
BifrostRAG: Bridging Dual Knowledge Graphs for Multi-Hop Question Answering in Construction Safety [11.079426930790458]
Many compliance-related queries are multi-hop, requiring synthesis of information across interlinked clauses.<n>This poses a challenge for traditional retrieval-augmented generation (RAG) systems.<n>We introduce BifrostRAG: a dual-graph RAG-integrated system that explicitly models both linguistic relationships and document structure.
arXiv Detail & Related papers (2025-07-18T03:39:14Z)
ImpRAG: Retrieval-Augmented Generation with Implicit Queries [49.510101132093396]
ImpRAG is a query-free RAG system that integrates retrieval and generation into a unified model.<n>We show that ImpRAG achieves 3.6-11.5 improvements in exact match scores on unseen tasks with diverse formats.
arXiv Detail & Related papers (2025-06-02T21:38:21Z)
Accelerating Adaptive Retrieval Augmented Generation via Instruction-Driven Representation Reduction of Retrieval Overlaps [16.84310001807895]
This paper introduces a model-agnostic approach that can be applied to A-RAG methods.<n>Specifically, we use cache access and parallel generation to speed up the prefilling and decoding stages respectively.
arXiv Detail & Related papers (2025-05-19T05:39:38Z)
Improving Deep Assertion Generation via Fine-Tuning Retrieval-Augmented Pre-trained Language Models [20.71745514142851]
RetriGen is a retrieval-augmented deep assertion generation approach. We conduct experiments to evaluate RetriGen against six state-of-the-art approaches.
arXiv Detail & Related papers (2025-02-22T04:17:04Z)
Improving Retrieval-Augmented Generation through Multi-Agent Reinforcement Learning [51.54046200512198]
Retrieval-augmented generation (RAG) is extensively utilized to incorporate external, current knowledge into large language models.<n>A standard RAG pipeline may comprise several components, such as query rewriting, document retrieval, document filtering, and answer generation.<n>To overcome these challenges, we propose treating the RAG pipeline as a multi-agent cooperative task, with each component regarded as an RL agent.
arXiv Detail & Related papers (2025-01-25T14:24:50Z)
Chain-of-Retrieval Augmented Generation [72.06205327186069]
This paper introduces an approach for training o1-like RAG models that retrieve and reason over relevant information step by step before generating the final answer.<n>Our proposed method, CoRAG, allows the model to dynamically reformulate the query based on the evolving state.
arXiv Detail & Related papers (2025-01-24T09:12:52Z)
Toward General Instruction-Following Alignment for Retrieval-Augmented Generation [63.611024451010316]
Following natural instructions is crucial for the effective application of Retrieval-Augmented Generation (RAG) systems. We propose VIF-RAG, the first automated, scalable, and verifiable synthetic pipeline for instruction-following alignment in RAG systems.
arXiv Detail & Related papers (2024-10-12T16:30:51Z)
SFR-RAG: Towards Contextually Faithful LLMs [57.666165819196486]
Retrieval Augmented Generation (RAG) is a paradigm that integrates external contextual information with large language models (LLMs) to enhance factual accuracy and relevance. We introduce SFR-RAG, a small LLM that is instruction-textual with an emphasis on context-grounded generation and hallucination. We also present ConBench, a new evaluation framework compiling multiple popular and diverse RAG benchmarks.
arXiv Detail & Related papers (2024-09-16T01:08:18Z)
xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token [108.7069350303884]
xRAG is an innovative context compression method tailored for retrieval-augmented generation.<n>xRAG seamlessly integrates document embeddings into the language model representation space.<n> Experimental results demonstrate that xRAG achieves an average improvement of over 10% across six knowledge-intensive tasks.
arXiv Detail & Related papers (2024-05-22T16:15:17Z)
Evaluating Retrieval Quality in Retrieval-Augmented Generation [21.115495457454365]
Traditional end-to-end evaluation methods are computationally expensive. We propose eRAG, where each document in the retrieval list is individually utilized by the large language model within the RAG system. eRAG offers significant computational advantages, improving runtime and consuming up to 50 times less GPU memory than end-to-end evaluation.
arXiv Detail & Related papers (2024-04-21T21:22:28Z)
Unit Test Generation using Generative AI : A Comparative Performance Analysis of Autogeneration Tools [2.0686733932673604]
This research aims to experimentally investigate the effectiveness of Large Language Models (LLMs) for generating unit test scripts for Python programs. For experiments, we consider three types of code units: 1) Procedural scripts, 2) Function-based modular code, and 3) Class-based code. Our results show that ChatGPT's performance is comparable with Pynguin in terms of coverage, though for some cases its performance is superior to Pynguin.
arXiv Detail & Related papers (2023-12-17T06:38:11Z)
Revisiting and Improving Retrieval-Augmented Deep Assertion Generation [13.373681113601982]
Unit testing has become an essential activity in software development process. Yu et al. proposed an integrated approach (integration for short) to generate assertions for a unit test. Despite promising, there is still a knowledge gap as to why or where integration works or does not work.
arXiv Detail & Related papers (2023-09-19T02:39:02Z)
RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic Program Repair [75.40584530380589]
We propose a novel Retrieval-Augmented Patch Generation framework (RAP-Gen) RAP-Gen explicitly leveraging relevant fix patterns retrieved from a list of previous bug-fix pairs. We evaluate RAP-Gen on three benchmarks in two programming languages, including the TFix benchmark in JavaScript, and Code Refinement and Defects4J benchmarks in Java.
arXiv Detail & Related papers (2023-09-12T08:52:56Z)

This list is automatically generated from the titles and abstracts of the papers in this site.