Repairing Regex Vulnerabilities via Localization-Guided Instructions
- URL: http://arxiv.org/abs/2510.09037v1
- Date: Fri, 10 Oct 2025 06:15:43 GMT
- Title: Repairing Regex Vulnerabilities via Localization-Guided Instructions
- Authors: Sicheol Sung, Joonghyuk Hahn, Yo-Sub Han
- Abstract summary: Regular expressions (regexes) expose systems to regular expression denial of service (ReDoS). Current approaches, however, are hampered by a trade-off. We introduce a hybrid framework, localized regex repair (LRR), designed to harness LLM generalization while enforcing reliability.
- Score: 6.033257307910245
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Regular expressions (regexes) are foundational to modern computing for critical tasks like input validation and data parsing, yet their ubiquity exposes systems to regular expression denial of service (ReDoS), a vulnerability requiring automated repair methods. Current approaches, however, are hampered by a trade-off. Symbolic, rule-based systems are precise but fail to repair unseen or complex vulnerability patterns. Conversely, large language models (LLMs) possess the necessary generalizability but are unreliable for tasks demanding strict syntactic and semantic correctness. We resolve this impasse by introducing a hybrid framework, localized regex repair (LRR), designed to harness LLM generalization while enforcing reliability. Our core insight is to decouple problem identification from the repair process. First, a deterministic, symbolic module localizes the precise vulnerable subpattern, creating a constrained and tractable problem space. Then, the LLM is invoked to generate a semantically equivalent fix for this isolated segment. This combined architecture successfully resolves complex repair cases intractable for rule-based repair while avoiding the semantic errors of LLM-only approaches. Our work provides a validated methodology for solving such problems in automated repair, improving the repair rate by 15.4 percentage points over the state-of-the-art. Our code is available at https://github.com/cdltlehf/LRR.
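The localize-then-repair idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`localize`, `repair_segment`, `repair`, `equivalent_up_to`) are hypothetical, the localizer only recognizes the toy shape `(X+)+` where `X` is a single word character or bracket class, and a hard-coded rewrite stands in for the LLM step. The final check is a bounded equivalence test over short strings, which is weaker than a proof of semantic equivalence.

```python
import itertools
import re

# Toy localizer (assumption): only detects (X+)+ where X is one word
# character or one bracket class such as [a-z].
NESTED_PLUS = re.compile(r"\((\w|\[[^\]]+\])\+\)\+")


def localize(pattern):
    """Return (start, end, atom) for the first (X+)+ subpattern, else None."""
    m = NESTED_PLUS.search(pattern)
    if m is None:
        return None
    return m.start(), m.end(), m.group(1)


def repair_segment(atom):
    """Stand-in for the LLM step: (X+)+ accepts the same strings as X+.

    Note that the capture group is dropped, so group numbering changes.
    """
    return atom + "+"


def repair(pattern):
    """Splice the repaired segment back into the untouched remainder."""
    loc = localize(pattern)
    if loc is None:
        return pattern
    start, end, atom = loc
    return pattern[:start] + repair_segment(atom) + pattern[end:]


def equivalent_up_to(p, q, alphabet="ab", max_len=6):
    """Check that p and q accept the same strings up to max_len characters."""
    for n in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=n):
            s = "".join(chars)
            if bool(re.fullmatch(p, s)) != bool(re.fullmatch(q, s)):
                return False
    return True


vulnerable = r"^(a+)+$"   # nested quantifiers: exponential backtracking on "aaa...ab"
fixed = repair(vulnerable)
print(fixed)                                # ^a+$
print(equivalent_up_to(vulnerable, fixed))  # True
```

Keeping the rewrite confined to the localized span means the rest of the pattern is copied verbatim, so only the small segment needs to be checked for equivalence.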
Related papers
- RealSec-bench: A Benchmark for Evaluating Secure Code Generation in Real-World Repositories [58.32028251925354]
Large Language Models (LLMs) have demonstrated remarkable capabilities in code generation, but their proficiency in producing secure code remains a critical, under-explored area. We introduce RealSec-bench, a new benchmark for secure code generation meticulously constructed from real-world, high-risk Java repositories.
arXiv Detail & Related papers (2026-01-30T08:29:01Z)
- Semantics-Aligned, Curriculum-Driven, and Reasoning-Enhanced Vulnerability Repair Framework [15.17681731375364]
SeCuRepair is a semantics-aligned, curriculum-driven, and reasoning-enhanced framework for vulnerability repair. At its core, SeCuRepair adopts a reason-then-edit paradigm, requiring the model to articulate why and how a vulnerability should be fixed. SeCuRepair also moves beyond traditional supervised fine-tuning and employs semantics-aware reinforcement learning.
arXiv Detail & Related papers (2025-10-01T15:09:27Z)
- Scalable and Robust LLM Unlearning by Correcting Responses with Retrieved Exclusions [49.55618517046225]
Language models trained on web-scale corpora risk memorizing and exposing sensitive information. We propose Corrective Unlearning with Retrieved Exclusions (CURE), a novel unlearning framework. CURE verifies model outputs for leakage and revises them into safe responses.
arXiv Detail & Related papers (2025-09-30T09:07:45Z)
- CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward [50.97588334916863]
We develop CompassVerifier, an accurate and robust lightweight verifier model for evaluation and outcome reward. It demonstrates multi-domain competency spanning math, knowledge, and diverse reasoning tasks, with the capability to process various answer types. We introduce the VerifierBench benchmark, comprising model outputs collected from multiple data sources, augmented through manual analysis of metaerror patterns to enhance CompassVerifier.
arXiv Detail & Related papers (2025-08-05T17:55:24Z)
- Repairing vulnerabilities without invisible hands. A differentiated replication study on LLMs [5.10123605644148]
Automated Vulnerability Repair (AVR) is a fast-growing branch of program repair. Recent studies show that large language models (LLMs) outperform traditional techniques.
arXiv Detail & Related papers (2025-07-28T16:39:16Z)
- Specification-Guided Repair of Arithmetic Errors in Dafny Programs using LLMs [79.74676890436174]
We present an APR tool for Dafny that uses formal specifications as oracles for fault localization and repair. We localize faults through a series of steps, which include using Hoare logic to determine the state of each statement within the program. Our tool achieves 89.6% fault localization coverage and GPT-4o mini yields the highest repair success rate of 74.18%.
arXiv Detail & Related papers (2025-07-04T15:36:12Z)
- Tracing Errors, Constructing Fixes: Repository-Level Memory Error Repair via Typestate-Guided Context Retrieval [9.737526306478598]
This paper introduces LTFix, a novel approach that harnesses the potential of Large Language Models (LLMs) for automated memory error repair. Our approach utilizes a finite typestate automaton to guide the tracking of error-propagation paths and context trace, capturing both spatial (memory states) and temporal (execution history) dimensions of error behavior.
arXiv Detail & Related papers (2025-06-23T08:30:00Z)
- SemAgent: A Semantics Aware Program Repair Agent [14.80363334219173]
SemAgent is a novel workflow-based procedure that leverages issue, code, and execution semantics to generate patches that are complete. We achieve this through a novel pipeline that (a) leverages execution semantics to retrieve relevant context, (b) comprehends issue-semantics via generalized abstraction, and (c) isolates code-semantics within the context of this abstraction. Our evaluations show that our methodology achieves a solve rate of 44.66% on the SWEBench-Lite benchmark, beating all other workflow-based approaches, and an absolute improvement of 7.66% compared to our baseline.
arXiv Detail & Related papers (2025-06-19T23:27:58Z)
- Repairs in a Block World: A New Benchmark for Handling User Corrections with Multi-Modal Language Models [48.42142115255159]
We release BlockWorld-Repairs: a dataset of multi-modal TPR sequences in an instruction-following manipulation task.
We evaluate several state-of-the-art Vision and Language Models (VLM) across multiple settings, focusing on their capability to process and accurately respond to TPRs.
Our results suggest that these models are not yet ready to be deployed in multi-modal collaborative settings.
arXiv Detail & Related papers (2024-09-21T21:06:25Z)
- NAVRepair: Node-type Aware C/C++ Code Vulnerability Repair [14.152755184229374]
NAVRepair is a novel framework that combines the node-type information extracted from ASTs with error types, specifically targeting C/C++ vulnerabilities.
We achieve a 26% higher accuracy compared to an existing LLM-based C/C++ vulnerability repair method.
arXiv Detail & Related papers (2024-05-08T11:58:55Z)
- RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic Program Repair [75.40584530380589]
We propose a novel Retrieval-Augmented Patch Generation framework (RAP-Gen).
RAP-Gen explicitly leverages relevant fix patterns retrieved from a list of previous bug-fix pairs.
We evaluate RAP-Gen on three benchmarks in two programming languages, including the TFix benchmark in JavaScript, and Code Refinement and Defects4J benchmarks in Java.
arXiv Detail & Related papers (2023-09-12T08:52:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.