Wired for Reuse: Automating Context-Aware Code Adaptation in IDEs via LLM-Based Agent
- URL: http://arxiv.org/abs/2507.01315v2
- Date: Sat, 04 Oct 2025 01:30:58 GMT
- Title: Wired for Reuse: Automating Context-Aware Code Adaptation in IDEs via LLM-Based Agent
- Authors: Taiming Wang, Yanjie Jiang, Chunhao Dong, Yuxia Zhang, Hui Liu,
- Abstract summary: We introduce WIRL, an agent for code wiring framed as a Retrieval-Augmented Generation (RAG) infilling task.<n>We evaluate WIRL on a carefully curated, high-quality dataset consisting of real-world code adaptation scenarios.
- Score: 7.631972490231939
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Copy-paste-modify is a widespread and pragmatic practice in software development, where developers adapt reused code snippets, sourced from platforms such as Stack Overflow, GitHub, or LLM outputs, into their local codebase. A critical yet underexplored aspect of this adaptation is code wiring: the context-aware process of substituting unresolved variables in pasted code with suitable variables or expressions from the surrounding context. Existing solutions either rely on heuristic rules or historical templates, often failing to effectively utilize contextual information, despite studies showing that over half of adaptation cases are context-dependent. In this paper, we introduce WIRL, an LLM-based agent for code wiring framed as a Retrieval-Augmented Generation (RAG) infilling task. WIRL combines an LLM, a customized toolkit, and an orchestration module to identify unresolved variables, retrieve context, and perform context-aware substitutions. To balance efficiency and autonomy, the agent adopts a mixed strategy: deterministic rule-based steps for common patterns, and a state-machine-guided decision process for intelligent exploration. We evaluate WIRL on a carefully curated, high-quality dataset consisting of real-world code adaptation scenarios. Our approach achieves an exact match precision of 91.7% and a recall of 90.0%, outperforming advanced LLMs by 22.6 and 13.7 percentage points in precision and recall, respectively, and surpassing IntelliJ IDEA by 54.3 and 49.9 percentage points. These results underscore its practical utility, particularly in contexts with complex variable dependencies or multiple unresolved variables. We believe WIRL paves the way for more intelligent and context-aware developer assistance in modern IDEs.
Related papers
- From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence [150.3696990310269]
Large language models (LLMs) have transformed automated software development by enabling direct translation of natural language descriptions into functional code.<n>We provide a comprehensive synthesis and practical guide (a series of analytic and probing experiments) about code LLMs.<n>We analyze the code capability of the general LLMs (GPT-4, Claude, LLaMA) and code-specialized LLMs (StarCoder, Code LLaMA, DeepSeek-Coder, and QwenCoder)
arXiv Detail & Related papers (2025-11-23T17:09:34Z) - LM4Opt-RA: A Multi-Candidate LLM Framework with Structured Ranking for Automating Network Resource Allocation [0.7933039558471408]
We tackle complex analytical and mathematical reasoning tasks requiring nuanced contextual understanding.<n>Existing benchmarks datasets cannot address the complexities of such problems with dynamic environments, variables, and heterogeneous constraints.<n>We introduce NL4RA, a curated dataset comprising 50 resource allocation optimization problems formulated as LP, ILP, and MILP.<n>We then evaluate the performance of well-known open-source LLMs with varying parameter counts.
arXiv Detail & Related papers (2025-11-13T23:19:43Z) - EDIT-Bench: Evaluating LLM Abilities to Perform Real-World Instructed Code Edits [72.23150343093447]
We introduce EDIT-Bench, a benchmark for evaluatingstructed code editing capabilities grounded in real-world usage.<n>EDIT-Bench comprises of 545 problems, multiple natural and programming languages, and a diverse set of real-world use cases.<n>We find that model performance varies across different categories of user instructions.
arXiv Detail & Related papers (2025-11-06T16:05:28Z) - How Can Input Reformulation Improve Tool Usage Accuracy in a Complex Dynamic Environment? A Study on $τ$-bench [58.114899897566964]
In a multi-turn conversational environment, large language models (LLMs) often struggle with consistent reasoning and adherence to domain-specific policies.<n>We propose the Input-Reformulation Multi-Agent (IRMA) framework, which automatically reformulates user queries augmented with relevant domain rules.<n>IRMA significantly outperforms ReAct, Function Calling, and Self-Reflection by 16.1%, 12.7%, and 19.1%, respectively.
arXiv Detail & Related papers (2025-08-28T15:57:33Z) - cMALC-D: Contextual Multi-Agent LLM-Guided Curriculum Learning with Diversity-Based Context Blending [3.939989712024268]
Contextual MARL (cMARL) addresses this by parameterizing environments with context variables and training a context-agnostic policy.<n>Existing cMARL methods attempt to use curriculum learning to help train and evaluate context-agnostic policies.<n>We propose Contextual Multi-Agent LLM-Guided Curriculum Learning with Diversity-Based Context Blending (cMALC-D)
arXiv Detail & Related papers (2025-08-28T14:16:17Z) - Meta-RAG on Large Codebases Using Code Summarization [11.415083231118142]
Large Language Model (LLM) systems have been at the forefront of applied Artificial Intelligence (AI) research in a multitude of domains.<n>We propose a multi-agent system to localize bugs in large pre-existings using information retrieval and LLMs.<n>Our system introduces a novel Retrieval Augmented Generation (RAG) approach, Meta-RAG, where we utilize summaries to condenses by an average of 79.8%, into a compact, structured, natural language representation.
arXiv Detail & Related papers (2025-08-04T17:01:10Z) - CAAD: Context-Aware Adaptive Decoding for Truthful Text Generation [31.469511576774252]
We propose a context-aware adaptive decoding method for large language models.<n>Our approach achieves a 2.8 percent average improvement on TruthfulQA.<n>Our model-agnostic, scalable, and efficient method requires only a single generation pass.
arXiv Detail & Related papers (2025-08-04T08:28:25Z) - Evaluating Large Language Models on Non-Code Software Engineering Tasks [4.381476817430934]
Large Language Models (LLMs) have demonstrated remarkable capabilities in code understanding and generation.<n>We present the first comprehensive benchmark, which we name Software Engineering Language Understanding' (SELU)<n>SELU covers classification, regression, Named Entity Recognition (NER) and Masked Language Modeling (MLM) targets, with data drawn from diverse sources.
arXiv Detail & Related papers (2025-06-12T15:52:32Z) - MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering [57.156093929365255]
Gym-style framework for systematically reinforcement learning, evaluating, and improving autonomous large language model (LLM) agents.<n>MLE-Dojo covers diverse, open-ended MLE tasks carefully curated to reflect realistic engineering scenarios.<n>Its fully executable environment supports comprehensive agent training via both supervised fine-tuning and reinforcement learning.
arXiv Detail & Related papers (2025-05-12T17:35:43Z) - SweRank: Software Issue Localization with Code Ranking [109.3289316191729]
SweRank is an efficient retrieve-and-rerank framework for software issue localization.<n>We construct SweLoc, a large-scale dataset curated from public GitHub repositories.<n>We show that SweRank achieves state-of-the-art performance, outperforming both prior ranking models and costly agent-based systems.
arXiv Detail & Related papers (2025-05-07T19:44:09Z) - Program Semantic Inequivalence Game with Large Language Models [10.358176296850639]
Large Language Models (LLMs) can achieve strong performance on everyday coding tasks, but they can fail on complex tasks that require non-trivial reasoning about program semantics.<n>In this work, we explore a method to synthetically generate code reasoning training data based on a semantic inequivalence game SInQ.<n>We prove that this setup enables theoretically unlimited improvement through self-play in the limit of infinite computational resources.
arXiv Detail & Related papers (2025-05-02T20:03:35Z) - Collab: Controlled Decoding using Mixture of Agents for LLM Alignment [90.6117569025754]
Reinforcement learning from human feedback has emerged as an effective technique to align Large Language models.<n>Controlled Decoding provides a mechanism for aligning a model at inference time without retraining.<n>We propose a mixture of agent-based decoding strategies leveraging the existing off-the-shelf aligned LLM policies.
arXiv Detail & Related papers (2025-03-27T17:34:25Z) - Scaling Autonomous Agents via Automatic Reward Modeling And Planning [52.39395405893965]
Large language models (LLMs) have demonstrated remarkable capabilities across a range of tasks.<n>However, they still struggle with problems requiring multi-step decision-making and environmental feedback.<n>We propose a framework that can automatically learn a reward model from the environment without human annotations.
arXiv Detail & Related papers (2025-02-17T18:49:25Z) - Instruct or Interact? Exploring and Eliciting LLMs' Capability in Code Snippet Adaptation Through Prompt Engineering [19.019004855931676]
Large language models (LLMs) have confirmed their effectiveness in the code generation task with promising results.
Their performance on adaptation, a reuse-oriented and context-dependent code change prediction task, is still unclear.
We propose an interactive prompting approach to elicit LLMs' adaptation ability.
arXiv Detail & Related papers (2024-11-23T09:40:36Z) - On the Impacts of Contexts on Repository-Level Code Generation [5.641402231731082]
We present RepoExec, a novel benchmark designed to evaluate repository-level code generation.<n>We focus on three key aspects: executability, functional correctness through comprehensive test case generation, and accurate utilization of cross-file contexts.
arXiv Detail & Related papers (2024-06-17T10:45:22Z) - Robust and Scalable Model Editing for Large Language Models [75.95623066605259]
We propose EREN (Edit models by REading Notes) to improve the scalability and robustness of LLM editing.
Unlike existing techniques, it can integrate knowledge from multiple edits, and correctly respond to syntactically similar but semantically unrelated inputs.
arXiv Detail & Related papers (2024-03-26T06:57:23Z) - How Can LLM Guide RL? A Value-Based Approach [68.55316627400683]
Reinforcement learning (RL) has become the de facto standard practice for sequential decision-making problems by improving future acting policies with feedback.
Recent developments in large language models (LLMs) have showcased impressive capabilities in language understanding and generation, yet they fall short in exploration and self-improvement capabilities.
We develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning.
arXiv Detail & Related papers (2024-02-25T20:07:13Z) - ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code [76.84199699772903]
ML-Bench is a benchmark rooted in real-world programming applications that leverage existing code repositories to perform tasks.
To evaluate both Large Language Models (LLMs) and AI agents, two setups are employed: ML-LLM-Bench for assessing LLMs' text-to-code conversion within a predefined deployment environment, and ML-Agent-Bench for testing autonomous agents in an end-to-end task execution within a Linux sandbox environment.
arXiv Detail & Related papers (2023-11-16T12:03:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.