Semantics-Aligned, Curriculum-Driven, and Reasoning-Enhanced Vulnerability Repair Framework
- URL: http://arxiv.org/abs/2510.01002v1
- Date: Wed, 01 Oct 2025 15:09:27 GMT
- Title: Semantics-Aligned, Curriculum-Driven, and Reasoning-Enhanced Vulnerability Repair Framework
- Authors: Chengran Yang, Ting Zhang, Jinfeng Jiang, Xin Zhou, Haoye Tian, Jieke Shi, Junkai Chen, Yikun Li, Eng Lieh Ouh, Lwin Khin Shar, David Lo
- Abstract summary: SeCuRepair is a semantics-aligned, curriculum-driven, and reasoning-enhanced framework for vulnerability repair. At its core, SeCuRepair adopts a reason-then-edit paradigm, requiring the model to articulate why and how a vulnerability should be fixed. SeCuRepair also moves beyond traditional supervised fine-tuning and employs semantics-aware reinforcement learning.
- Score: 15.17681731375364
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current learning-based Automated Vulnerability Repair (AVR) approaches, while promising, often fail to generalize effectively in real-world scenarios. Our diagnostic analysis reveals three fundamental weaknesses in state-of-the-art AVR approaches: (1) limited cross-repository generalization, with performance drops on unseen codebases; (2) inability to capture long-range dependencies, causing performance degradation on complex, multi-hunk repairs; and (3) over-reliance on superficial lexical patterns, leading to significant performance drops on vulnerabilities with minor syntactic variations such as variable renaming. To address these limitations, we propose SeCuRepair, a semantics-aligned, curriculum-driven, and reasoning-enhanced framework for vulnerability repair. At its core, SeCuRepair adopts a reason-then-edit paradigm, requiring the model to articulate why and how a vulnerability should be fixed before generating the patch. This explicit reasoning enforces a genuine understanding of repair logic rather than superficial memorization of lexical patterns. SeCuRepair also moves beyond traditional supervised fine-tuning and employs semantics-aware reinforcement learning, rewarding patches for their syntactic and semantic alignment with the oracle patch rather than mere token overlap. Complementing this, a difficulty-aware curriculum progressively trains the model, starting with simple fixes and advancing to complex, multi-hunk coordinated edits. We evaluate SeCuRepair on strict, repository-level splits of the BigVul and newly crafted PrimeVul_AVR datasets. SeCuRepair significantly outperforms all baselines, surpassing the best-performing baselines by 34.52% on BigVul and 31.52% on PrimeVul_AVR, respectively, in terms of CodeBLEU. Comprehensive ablation studies further confirm that each component of our framework contributes to its final performance.
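The abstract points to two trainable ingredients: a reward that scores a candidate patch by its syntactic and semantic alignment with the oracle patch (CodeBLEU-style alignment rather than raw token overlap), and a difficulty-aware curriculum that orders training samples from simple fixes to multi-hunk edits. The sketch below is a minimal, illustrative rendering of how such a reward and curriculum ordering could be wired together; the `difflib`-based similarity stand-in for CodeBLEU, the hunk-count difficulty proxy, and all helper names are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): a semantics-alignment reward and a
# difficulty-aware curriculum ordering. A difflib-based similarity stands in
# for the CodeBLEU-style metric; hunk count serves as a difficulty proxy.
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List


@dataclass
class RepairSample:
    vulnerable_code: str   # code containing the vulnerability
    oracle_patch: str      # ground-truth fixed code
    num_hunks: int         # number of edited hunks (difficulty proxy)


def alignment_reward(candidate_patch: str, oracle_patch: str) -> float:
    """Score a candidate patch by its similarity to the oracle patch.

    Stand-in for the syntactic/semantic alignment reward described in the
    abstract: token-sequence similarity plus a crude structural signal
    (line-level similarity), averaged. Real CodeBLEU additionally matches
    AST subtrees and data-flow edges.
    """
    token_sim = SequenceMatcher(None,
                                candidate_patch.split(),
                                oracle_patch.split()).ratio()
    line_sim = SequenceMatcher(None,
                               candidate_patch.splitlines(),
                               oracle_patch.splitlines()).ratio()
    return 0.5 * token_sim + 0.5 * line_sim


def curriculum_order(samples: List[RepairSample]) -> List[RepairSample]:
    """Order samples from easy (few hunks, short patch) to hard (multi-hunk)."""
    return sorted(samples, key=lambda s: (s.num_hunks, len(s.oracle_patch)))


if __name__ == "__main__":
    oracle = "if (len < MAX_LEN) { memcpy(dst, src, len); }"
    good = "if (len < MAX_LEN) { memcpy(dst, src, len); }"
    bad = "memcpy(dst, src, len);"
    print(alignment_reward(good, oracle))  # 1.0: identical to the oracle fix
    print(alignment_reward(bad, oracle))   # lower: bounds check dropped
```

In an RL fine-tuning loop, a score like `alignment_reward` would replace an exact-match or token-overlap reward, while `curriculum_order` would decide the order in which batches are drawn as training progresses.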
Related papers
- Architecture-Aware Multi-Design Generation for Repository-Level Feature Addition [53.50448142467294]
RAIM is a multi-design and architecture-aware framework for repository-level feature addition. It shifts away from linear patching by generating multiple diverse implementation designs. Experiments on the NoCode-bench Verified dataset demonstrate that RAIM establishes a new state-of-the-art performance.
arXiv Detail & Related papers (2026-03-02T12:50:40Z) - Rethinking the Capability of Fine-Tuned Language Models for Automated Vulnerability Repair [5.847724760751716]
Learning-based automated vulnerability repair (AVR) techniques that utilize fine-tuned language models have shown promise in generating vulnerability patches. Our empirical study reveals that state-of-the-art models often overfit to the training set and are evaluated using training, validation, and test sets that are not mutually exclusive. We introduce L-AVRBench, a test-based benchmark tailored for learning-based AVR, to overcome the limitations of match-based metrics and examine the models' true repair capabilities.
arXiv Detail & Related papers (2025-12-27T16:12:43Z) - Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding [54.05243949024302]
Existing robust MLLMs rely on implicit training/adaptation that focuses solely on visual encoder generalization. We propose Robust-R1, a novel framework that explicitly models visual degradations through structured reasoning chains. Our approach integrates: (i) supervised fine-tuning for degradation-aware reasoning foundations, (ii) reward-driven alignment for accurately perceiving degradation parameters, and (iii) dynamic reasoning depth scaling adapted to degradation intensity.
arXiv Detail & Related papers (2025-12-19T12:56:17Z) - Unleashing Degradation-Carrying Features in Symmetric U-Net: Simpler and Stronger Baselines for All-in-One Image Restoration [52.82397287366076]
All-in-one image restoration aims to handle diverse degradations (e.g., noise, blur, adverse weather) within a unified framework. In this work, we reveal a critical insight: well-crafted feature extraction inherently encodes degradation-carrying information. Our symmetric design preserves intrinsic degradation signals robustly, rendering simple additive fusion in skip connections sufficient.
arXiv Detail & Related papers (2025-12-11T12:20:31Z) - UniSER: A Foundation Model for Unified Soft Effects Removal [72.60782767314713]
We introduce UniSER, capable of addressing diverse degradations caused by soft effects within a single framework. Our methodology centers on curating a massive 3.8M-pair dataset to ensure robustness and generalization. This synergistic approach allows UniSER to significantly outperform both specialist and generalist models.
arXiv Detail & Related papers (2025-11-18T06:39:39Z) - Repairing Regex Vulnerabilities via Localization-Guided Instructions [6.033257307910245]
Regular expressions (regexes) expose systems to regular expression denial of service (ReDoS). Current approaches, however, are hampered by a trade-off. We introduce a hybrid framework, localized repair (LRR), designed to harness generalization while enforcing reliability.
arXiv Detail & Related papers (2025-10-10T06:15:43Z) - Scalable and Robust LLM Unlearning by Correcting Responses with Retrieved Exclusions [49.55618517046225]
Language models trained on web-scale corpora risk memorizing and exposing sensitive information. We propose Corrective Unlearning with Retrieved Exclusions (CURE), a novel unlearning framework. CURE verifies model outputs for leakage and revises them into safe responses.
arXiv Detail & Related papers (2025-09-30T09:07:45Z) - $φ^{\infty}$: Clause Purification, Embedding Realignment, and the Total Suppression of the Em Dash in Autoregressive Language Models [0.0]
We identify a critical vulnerability in autoregressive transformer language models where the em dash token induces semantic drift. We propose a novel solution combining symbolic clause purification via the phi-infinity operator with targeted embedding matrix realignment.
arXiv Detail & Related papers (2025-06-22T18:27:39Z) - Tady: A Neural Disassembler without Structural Constraint Violations [14.794789423601552]
We introduce Tady, a novel neural disassembler featuring an improved model architecture and a dedicated post-processing algorithm. We show that Tady effectively eliminates structural constraint violations and functions with high efficiency, while maintaining instruction-level accuracy.
arXiv Detail & Related papers (2025-06-16T10:11:43Z) - Towards Understanding Bugs in Distributed Training and Inference Frameworks for Large Language Models [7.486731499255164]
This paper conducts the first large-scale empirical analysis of 308 fixed bugs across three popular distributed training/inference frameworks: DeepSpeed, Megatron-LM, and Colossal-AI. We examine bug symptoms, root causes, bug identification and fixing efforts, and common low-effort fixing strategies.
arXiv Detail & Related papers (2025-06-12T07:24:59Z) - AlignRAG: Leveraging Critique Learning for Evidence-Sensitive Retrieval-Augmented Reasoning [61.28113271728859]
RAG has become a widely adopted paradigm for enabling knowledge-grounded large language models (LLMs). Standard RAG pipelines often fail to ensure that model reasoning remains consistent with the evidence retrieved, leading to factual inconsistencies or unsupported conclusions. In this work, we reinterpret RAG as Retrieval-Augmented Reasoning and identify a central but underexplored problem: Reasoning Misalignment.
arXiv Detail & Related papers (2025-04-21T04:56:47Z) - Beyond Degradation Redundancy: Contrastive Prompt Learning for All-in-One Image Restoration [109.38288333994407]
Contrastive Prompt Learning (CPL) is a novel framework that fundamentally enhances prompt-task alignment. Our framework establishes new state-of-the-art performance while maintaining parameter efficiency, offering a principled solution for unified image restoration.
arXiv Detail & Related papers (2025-04-14T08:24:57Z) - Framework for Progressive Knowledge Fusion in Large Language Models Through Structured Conceptual Redundancy Analysis [0.0]
The organization of latent knowledge within large-scale models poses unique challenges when addressing overlapping representations and optimizing contextual accuracy. A framework was proposed to restructure these redundancies through advanced clustering techniques and dynamic thresholding. Evaluations revealed improved memory efficiency and faster inference times, alongside better alignment in latent knowledge clusters that enhanced interpretability.
arXiv Detail & Related papers (2025-01-23T11:34:04Z) - SINDER: Repairing the Singular Defects of DINOv2 [61.98878352956125]
Vision Transformer models trained on large-scale datasets often exhibit artifacts in the patch tokens they extract.
We propose a novel fine-tuning smooth regularization that rectifies structural deficiencies using only a small dataset.
arXiv Detail & Related papers (2024-07-23T20:34:23Z) - Scalable Federated Unlearning via Isolated and Coded Sharding [76.12847512410767]
Federated unlearning has emerged as a promising paradigm to erase the client-level data effect.
This paper proposes a scalable federated unlearning framework based on isolated sharding and coded computing.
arXiv Detail & Related papers (2024-01-29T08:41:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.