CCrepairBench: A High-Fidelity Benchmark and Reinforcement Learning Framework for C++ Compilation Repair
- URL: http://arxiv.org/abs/2509.15690v1
- Date: Fri, 19 Sep 2025 07:06:27 GMT
- Title: CCrepairBench: A High-Fidelity Benchmark and Reinforcement Learning Framework for C++ Compilation Repair
- Authors: Weixuan Sun, Jucai Zhai, Dengfeng Liu, Xin Zhang, Xiaojun Wu, Qiaobo Hao, AIMgroup, Yang Fang, Jiuyang Tang,
- Abstract summary: We present CCrepair, a novel, large-scale C++ compilation error dataset constructed through a sophisticated generate-and-verify pipeline.<n>Second, we propose a Reinforcement Learning paradigm guided by a hybrid reward signal, shifting the focus from mere compilability to the semantic quality of the fix.
- Score: 18.624106902572155
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The automated repair of C++ compilation errors presents a significant challenge, the resolution of which is critical for developer productivity. Progress in this domain is constrained by two primary factors: the scarcity of large-scale, high-fidelity datasets and the limitations of conventional supervised methods, which often fail to generate semantically correct patches.This paper addresses these gaps by introducing a comprehensive framework with three core contributions. First, we present CCrepair, a novel, large-scale C++ compilation error dataset constructed through a sophisticated generate-and-verify pipeline. Second, we propose a Reinforcement Learning (RL) paradigm guided by a hybrid reward signal, shifting the focus from mere compilability to the semantic quality of the fix. Finally, we establish the robust, two-stage evaluation system providing this signal, centered on an LLM-as-a-Judge whose reliability has been rigorously validated against the collective judgments of a panel of human experts. This integrated approach aligns the training objective with generating high-quality, non-trivial patches that are both syntactically and semantically correct. The effectiveness of our approach was demonstrated experimentally. Our RL-trained Qwen2.5-1.5B-Instruct model achieved performance comparable to a Qwen2.5-14B-Instruct model, validating the efficiency of our training paradigm. Our work provides the research community with a valuable new dataset and a more effective paradigm for training and evaluating robust compilation repair models, paving the way for more practical and reliable automated programming assistants.
Related papers
- Search-R2: Enhancing Search-Integrated Reasoning via Actor-Refiner Collaboration [49.9937230730202]
We propose Search-R2, a novel Actor-Refiner collaboration framework that enhances reasoning through targeted intervention.<n>Our approach decomposes the generation process into an Actor, which produces initial reasoning trajectories.<n>We show that Search-R2 consistently outperforms strong RAG and RL-based baselines across model scales.
arXiv Detail & Related papers (2026-02-03T15:32:09Z) - LLMs in Code Vulnerability Analysis: A Proof of Concept [0.3441021278275805]
Traditional software security analysis methods struggle to keep pace with the scale and complexity of moderns.<n>This paper explores the incorporation of code-specific and general-purpose Large Language Models to automate critical software security tasks.
arXiv Detail & Related papers (2026-01-13T16:16:11Z) - Rewarding the Journey, Not Just the Destination: A Composite Path and Answer Self-Scoring Reward Mechanism for Test-Time Reinforcement Learning [29.778703252962092]
Reinforcement Learning (RL) has emerged as a powerful paradigm for advancing Large Language Models (LLMs)<n>We develop a novel test-time reward mechanism that operates without external supervision.
arXiv Detail & Related papers (2025-10-20T07:53:51Z) - Automatic Failure Attribution and Critical Step Prediction Method for Multi-Agent Systems Based on Causal Inference [8.823529310904162]
Multi-agent systems (MAS) are critical for automating complex tasks, yet their practical deployment is hampered by the challenge of failure attribution.<n>We introduce the first failure attribution framework for MAS grounded in multi-granularity causal inference.
arXiv Detail & Related papers (2025-09-10T15:22:00Z) - Feedback-Driven Tool-Use Improvements in Large Language Models via Automated Build Environments [70.42705564227548]
We propose an automated environment construction pipeline for large language models (LLMs)<n>This enables the creation of high-quality training environments that provide detailed and measurable feedback without relying on external tools.<n>We also introduce a verifiable reward mechanism that evaluates both the precision of tool use and the completeness of task execution.
arXiv Detail & Related papers (2025-08-12T09:45:19Z) - NatureGAIA: Pushing the Frontiers of GUI Agents with a Challenging Benchmark and High-Quality Trajectory Dataset [16.676904484703]
We introduce NaturalGAIA, a novel benchmark engineered on the principle of Causal Pathways.<n>This paradigm structures complex tasks into a series of verifiable atomic steps, ensuring rigorous, fully automated, and reproducible standard for assessment.<n>We then utilize this dataset to perform Reinforcement FineTuning (RFT) on the Q2.5-VL-7B model.
arXiv Detail & Related papers (2025-08-02T11:53:41Z) - SoftPipe: A Soft-Guided Reinforcement Learning Framework for Automated Data Preparation [10.764970149373845]
We introduce SoftPipe, a novel RL framework that replaces rigid hard constraints'' with a flexible soft guidance'' paradigm.<n>We demonstrate that SoftPipe achieves up to a 13.9% improvement in pipeline quality and 2.8$times$ faster convergence compared to existing methods.
arXiv Detail & Related papers (2025-07-18T07:43:22Z) - RoHOI: Robustness Benchmark for Human-Object Interaction Detection [78.18946529195254]
Human-Object Interaction (HOI) detection is crucial for robot-human assistance, enabling context-aware support.<n>We introduce the first benchmark for HOI detection, evaluating model resilience under diverse challenges.<n>Our benchmark, RoHOI, includes 20 corruption types based on the HICO-DET and V-COCO datasets and a new robustness-focused metric.
arXiv Detail & Related papers (2025-07-12T01:58:04Z) - Training Language Models to Generate Quality Code with Program Analysis Feedback [66.0854002147103]
Code generation with large language models (LLMs) is increasingly adopted in production but fails to ensure code quality.<n>We propose REAL, a reinforcement learning framework that incentivizes LLMs to generate production-quality code.
arXiv Detail & Related papers (2025-05-28T17:57:47Z) - Synthetic Code Surgery: Repairing Bugs and Vulnerabilities with LLMs and Synthetic Data [0.0]
This paper presents a novel methodology for enhancing Automated Program Repair (APR) through synthetic data generation utilizing Large Language Models (LLMs)<n>The proposed approach addresses this limitation through a two-phase process: a synthetic sample generation followed by a rigorous quality assessment.<n> Experimental evaluation on the VulRepair test set dataset showed statistically significant improvements in Perfect Prediction rates.
arXiv Detail & Related papers (2025-05-12T09:14:20Z) - Learning to Solve and Verify: A Self-Play Framework for Code and Test Generation [69.62857948698436]
Recent advances in large language models (LLMs) have improved their performance on coding benchmarks.<n>However, improvement is plateauing due to the exhaustion of readily available high-quality data.<n>We propose Sol-Ver, a self-play solver-verifier framework that jointly improves a single model's code and test generation capacity.
arXiv Detail & Related papers (2025-02-20T18:32:19Z) - SDPERL: A Framework for Software Defect Prediction Using Ensemble Feature Extraction and Reinforcement Learning [0.0]
This paper proposes an innovative framework for software defect prediction.<n>It combines ensemble feature extraction with reinforcement learning (RL)--based feature selection.<n>We claim that this work is among the first in recent efforts to address this challenge at the file-level granularity.
arXiv Detail & Related papers (2024-12-10T21:16:05Z) - End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes [52.818579746354665]
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures.
We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.