Tracing Errors, Constructing Fixes: Repository-Level Memory Error Repair via Typestate-Guided Context Retrieval
- URL: http://arxiv.org/abs/2506.18394v1
- Date: Mon, 23 Jun 2025 08:30:00 GMT
- Title: Tracing Errors, Constructing Fixes: Repository-Level Memory Error Repair via Typestate-Guided Context Retrieval
- Authors: Xiao Cheng, Zhihao Guo, Huan Huo, Yulei Sui
- Abstract summary: This paper introduces LTFix, a novel approach that harnesses the potential of Large Language Models (LLMs) for automated memory error repair. Our approach utilizes a finite typestate automaton to guide the tracking of error-propagation paths and context trace, capturing both spatial (memory states) and temporal (execution history) dimensions of error behavior.
- Score: 9.737526306478598
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Memory-related errors in C programming continue to pose significant challenges in software development, primarily due to the complexities of manual memory management inherent in the language. These errors frequently serve as vectors for severe vulnerabilities, while their repair requires extensive knowledge of program logic and C's memory model. Automated Program Repair (APR) has emerged as a critical research area to address these challenges. Traditional APR approaches rely on expert-designed strategies and predefined templates, which are labor-intensive and constrained by the effectiveness of manual specifications. Deep learning techniques offer a promising alternative by automatically extracting repair patterns, but they require substantial training datasets and often lack interpretability. This paper introduces LTFix, a novel approach that harnesses the potential of Large Language Models (LLMs) for automated memory error repair, especially for complex repository-level errors that span multiple functions and files. We address two fundamental challenges in LLM-based memory error repair: a limited understanding of interprocedural memory management patterns and context window limitations for repository-wide analysis. Our approach utilizes a finite typestate automaton to guide the tracking of error-propagation paths and context trace, capturing both spatial (memory states) and temporal (execution history) dimensions of error behavior. This typestate-guided context retrieval strategy provides the LLM with concise yet semantically rich information relevant to erroneous memory management, effectively addressing the token limitation of LLMs.
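To make the idea of a finite typestate automaton concrete, the following is a minimal sketch in Python. It is not LTFix's implementation: the state names, the event names (`malloc`, `use`, `free`), the error labels, the helper `trace_object`, and the file locations are all hypothetical, chosen only to illustrate how an automaton can follow one heap object through a sequence of events and keep just the statements that matter for the error. Each trace entry pairs a location (the temporal dimension) with the object's memory state at that point (the spatial dimension).

```python
from enum import Enum


class State(Enum):
    UNALLOCATED = "unallocated"
    ALLOCATED = "allocated"
    FREED = "freed"


# Legal transitions of the (hypothetical) typestate automaton.
TRANSITIONS = {
    (State.UNALLOCATED, "malloc"): State.ALLOCATED,
    (State.ALLOCATED, "use"): State.ALLOCATED,
    (State.ALLOCATED, "free"): State.FREED,
    (State.FREED, "malloc"): State.ALLOCATED,
}

# Illegal (state, event) pairs and the error label assumed for each.
ERROR_KINDS = {
    (State.FREED, "use"): "use-after-free",
    (State.FREED, "free"): "double-free",
    (State.UNALLOCATED, "use"): "use-before-alloc",
    (State.UNALLOCATED, "free"): "free-before-alloc",
    (State.ALLOCATED, "malloc"): "leak-on-reassign",
}


def trace_object(events, obj):
    """Follow one object through the events.

    Returns (error_kind or None, trace), where trace lists
    (location, event, state_before_event) for every event that
    touches obj, up to and including the erroneous one.
    """
    state = State.UNALLOCATED
    trace = []
    for name, target, location in events:
        if target != obj:
            continue  # unrelated statements are left out of the context
        trace.append((location, name, state.value))
        next_state = TRANSITIONS.get((state, name))
        if next_state is None:
            return ERROR_KINDS[(state, name)], trace
        state = next_state
    if state is State.ALLOCATED:
        return "memory-leak", trace
    return None, trace


# Hypothetical event sequence spanning two files.
events = [
    ("malloc", "buf", "parser.c:12"),
    ("use", "tmp", "parser.c:15"),
    ("free", "buf", "util.c:40"),
    ("use", "buf", "parser.c:30"),
]
kind, trace = trace_object(events, "buf")
print(kind)
for entry in trace:
    print(entry)
```

In this sketch the statement touching `tmp` never enters the trace, which mirrors the goal stated in the abstract: the retrieved context stays short while still covering the allocation, the free in another file, and the offending use.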
Related papers
- MemOS: A Memory OS for AI System [116.87568350346537]
Large Language Models (LLMs) have become an essential infrastructure for Artificial General Intelligence (AGI). Existing models mainly rely on static parameters and short-lived contextual states, limiting their ability to track user preferences or update knowledge over extended periods. MemOS is a memory operating system that treats memory as a manageable system resource.
arXiv Detail & Related papers (2025-07-04T17:21:46Z)
- Specification-Guided Repair of Arithmetic Errors in Dafny Programs using LLMs [84.30534714651093]
We present an innovative APR tool for Dafny, a verification-aware programming language. We localize faults through a series of steps, which include using Hoare Logic to determine the state of each statement within the program. We evaluate our approach using DafnyBench, a benchmark of real-world Dafny programs.
arXiv Detail & Related papers (2025-07-04T15:36:12Z)
- MEMOIR: Lifelong Model Editing with Minimal Overwrite and Informed Retention for LLMs [82.34547399693966]
Existing methods for lifelong model editing compromise generalization, interfere with past edits, or fail to scale to long editing sequences. We propose MEMOIR, a novel scalable framework that injects knowledge through a residual memory. MEMOIR confines each edit to a distinct subset of the memory parameters, minimizing interference among edits.
arXiv Detail & Related papers (2025-06-09T16:16:42Z)
- ReF Decompile: Relabeling and Function Call Enhanced Decompile [50.86228893636785]
The goal of decompilation is to convert compiled low-level code (e.g., assembly code) back into high-level programming languages. This task supports various reverse engineering applications, such as vulnerability identification, malware analysis, and legacy software migration.
arXiv Detail & Related papers (2025-02-17T12:38:57Z)
- Data Wrangling Task Automation Using Code-Generating Language Models [1.629488438606726]
We present an automated system that generates executable code for tasks like missing value imputation, error detection, and error correction. Our system aims to identify inherent patterns in the data while leveraging external knowledge, effectively addressing both memory-dependent and memory-independent tasks.
arXiv Detail & Related papers (2025-02-05T03:36:29Z)
- CodeUnlearn: Amortized Zero-Shot Machine Unlearning in Language Models Using Discrete Concept [5.345828824625758]
We propose a novel amortized unlearning approach using codebook features and Sparse Autoencoders (SAEs).
By leveraging a bottleneck to decompose the activation space and regulate information flow, our method efficiently unlearns targeted information while preserving the model's performance on unrelated data.
arXiv Detail & Related papers (2024-10-08T10:26:22Z)
- Revisiting Evolutionary Program Repair via Code Language Model [11.711739409758476]
This paper introduces ARJA-CLM, which integrates the multiobjective evolutionary algorithm with CLM to fix multilocation bugs in Java projects.
We also propose a context-aware prompt construction strategy, which enriches the prompt with additional information about accessible fields and methods so that the CLM can generate candidate statements.
arXiv Detail & Related papers (2024-08-20T01:57:45Z)
- MemLLM: Finetuning LLMs to Use An Explicit Read-Write Memory [49.96019697955383]
We introduce MemLLM, a novel method of enhancing large language models (LLMs) by integrating a structured and explicit read-and-write memory module. Our experiments indicate that MemLLM enhances the LLM's performance and interpretability, in language modeling in general and knowledge-intensive tasks in particular.
arXiv Detail & Related papers (2024-04-17T18:13:16Z)
- A Deep Dive into Large Language Models for Automated Bug Localization and Repair [12.756202755547024]
Large language models (LLMs) have shown impressive effectiveness in various software engineering tasks, including automated program repair (APR).
In this study, we take a deep dive into automated bug fixing utilizing LLMs.
This methodological separation of bug localization and fixing using different LLMs enables effective integration of diverse contextual information.
Toggle achieves the new state-of-the-art (SOTA) performance on the CodeXGLUE code refinement benchmark.
arXiv Detail & Related papers (2024-04-17T17:48:18Z)
- Online Adaptation of Language Models with a Memory of Amortized Contexts [82.02369596879817]
Memory of Amortized Contexts (MAC) is an efficient and effective online adaptation framework for large language models.
We show how MAC can be combined with, and improve the performance of, popular alternatives such as retrieval-augmented generation.
arXiv Detail & Related papers (2024-03-07T08:34:57Z)
- RET-LLM: Towards a General Read-Write Memory for Large Language Models [53.288356721954514]
RET-LLM is a novel framework that equips large language models with a general write-read memory unit.
Inspired by Davidsonian semantics theory, we extract and save knowledge in the form of triplets.
Our framework exhibits robust performance in handling temporal-based question answering tasks.
arXiv Detail & Related papers (2023-05-23T17:53:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.