SeRe: A Security-Related Code Review Dataset Aligned with Real-World Review Activities
- URL: http://arxiv.org/abs/2601.01042v1
- Date: Sat, 03 Jan 2026 02:39:53 GMT
- Title: SeRe: A Security-Related Code Review Dataset Aligned with Real-World Review Activities
- Authors: Zixiao Zhao, Yanjie Jiang, Hui Liu, Kui Liu, Lu Zhang
- Abstract summary: Existing datasets and studies primarily focus on general-purpose code review comments. We introduce SeRe, a security-related code review dataset, constructed using an active learning-based ensemble classification approach. We extracted 6,732 security-related reviews from 373,824 raw review instances, ensuring representativeness across multiple programming languages.
- Score: 8.215547096412346
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Software security vulnerabilities can lead to severe consequences, making early detection essential. Although code review serves as a critical defense mechanism against security flaws, relevant feedback remains scarce due to limited attention to security issues or a lack of expertise among reviewers. Existing datasets and studies primarily focus on general-purpose code review comments, either lacking security-specific annotations or being too limited in scale to support large-scale research. To bridge this gap, we introduce SeRe, a security-related code review dataset, constructed using an active learning-based ensemble classification approach. The proposed approach iteratively refines model predictions through human annotations, achieving high precision while maintaining reasonable recall. Using the fine-tuned ensemble classifier, we extracted 6,732 security-related reviews from 373,824 raw review instances, ensuring representativeness across multiple programming languages. Statistical analysis indicates that SeRe generally aligns with the real-world distribution of security-related reviews. To assess both the utility of SeRe and the effectiveness of existing code review comment generation approaches, we benchmark state-of-the-art approaches on security-related feedback generation. By releasing SeRe along with our benchmark results, we aim to advance research in automated security-focused code review and contribute to the development of more effective secure software engineering practices.
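The abstract describes an active-learning loop in which an ensemble classifier scores unlabeled reviews and the most uncertain instances are routed to human annotators. A minimal sketch of that selection step is below; the keyword-based scorers, data, and threshold are illustrative assumptions, not the authors' actual models or pipeline.

```python
# Hypothetical sketch of uncertainty sampling with a toy ensemble:
# each "model" is a keyword matcher standing in for a fine-tuned
# classifier; reviews whose ensemble score is closest to 0.5 are
# the most uncertain and are sent for human annotation.

def ensemble_score(text, keyword_sets):
    """Average binary vote of the toy keyword classifiers."""
    votes = [any(k in text.lower() for k in ks) for ks in keyword_sets]
    return sum(votes) / len(votes)

def select_for_annotation(unlabeled, keyword_sets, budget=2):
    """Pick the `budget` reviews whose score is nearest 0.5."""
    scored = [(abs(ensemble_score(t, keyword_sets) - 0.5), t) for t in unlabeled]
    scored.sort(key=lambda pair: pair[0])  # smallest margin = most uncertain
    return [t for _, t in scored[:budget]]

keyword_sets = [
    {"overflow", "injection", "xss"},    # toy model 1: vulnerability terms
    {"sanitize", "validate", "escape"},  # toy model 2: mitigation terms
]
pool = [
    "fix buffer overflow in parser",
    "rename variable for clarity",
    "please sanitize user input here",
    "update changelog entry",
]
to_annotate = select_for_annotation(pool, keyword_sets)
print(to_annotate)
```

In the dataset construction the abstract describes, the human labels collected for such uncertain instances would then be added to the training pool and the ensemble refit, iterating until precision is acceptable.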
Related papers
- From Detection to Prevention: Explaining Security-Critical Code to Avoid Vulnerabilities [2.490168997159702]
This work explores a proactive strategy to prevent vulnerabilities by highlighting code regions that implement security-critical functionality. We present an IntelliJ IDEA plugin prototype that uses code-level software metrics to identify potentially security-critical methods.
arXiv Detail & Related papers (2026-01-31T13:16:01Z) - SafeRBench: A Comprehensive Benchmark for Safety Assessment in Large Reasoning Models [60.8821834954637]
We present SafeRBench, the first benchmark that assesses LRM safety end-to-end. We pioneer the incorporation of risk categories and levels into input design. We introduce a micro-thought chunking mechanism to segment long reasoning traces into semantically coherent units.
arXiv Detail & Related papers (2025-11-19T06:46:33Z) - SecureReviewer: Enhancing Large Language Models for Secure Code Review through Secure-aware Fine-tuning [8.229920162000369]
We propose SecureReviewer to identify and resolve security-related issues during code review. We first construct a dataset tailored for training and evaluating secure code review capabilities. We integrate the RAG technique, which grounds the generated comments in domain-specific security knowledge.
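The summary above mentions grounding generated review comments in retrieved security knowledge via RAG. A minimal sketch of that retrieval-and-prompt step is shown below; the word-overlap retriever, knowledge entries, and prompt format are invented for illustration and are not SecureReviewer's actual implementation.

```python
# Hypothetical RAG grounding step: rank a toy security knowledge base
# by word overlap with the change under review, then prepend the best
# match to the comment-generation prompt.

def retrieve(query, knowledge, k=1):
    """Return the k snippets sharing the most words with the query."""
    qwords = set(query.lower().split())
    ranked = sorted(
        knowledge,
        key=lambda s: len(qwords & set(s.lower().split())),
        reverse=True,
    )
    return ranked[:k]

knowledge = [
    "CWE-89: build SQL with parameterized queries, never string concatenation",
    "CWE-79: escape untrusted output before rendering it in HTML",
]
diff = "query = 'SELECT * FROM users WHERE id=' + user_id"
context = retrieve("sql query string concatenation", knowledge)[0]
prompt = f"Security context: {context}\nReview this change: {diff}"
print(context)
```

A production system would replace the overlap score with dense embeddings and pass the assembled prompt to an LLM; the structure of the step stays the same.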
arXiv Detail & Related papers (2025-10-30T13:06:11Z) - iCodeReviewer: Improving Secure Code Review with Mixture of Prompts [5.322602557660654]
iCodeReviewer is an automated secure code review approach based on large language models (LLMs). Experiment results demonstrate the effectiveness of iCodeReviewer in security issue identification and localization, with an F1 of 63.98%. The review comments generated by iCodeReviewer also achieve an acceptance rate of up to 84% when deployed in production environments.
arXiv Detail & Related papers (2025-10-14T06:30:59Z) - GitHub's Copilot Code Review: Can AI Spot Security Flaws Before You Commit? [0.0]
This study evaluates the effectiveness of GitHub Copilot's recently introduced code review feature in detecting security vulnerabilities. Contrary to expectations, our results reveal that Copilot's code review frequently fails to detect critical vulnerabilities. Our results highlight the continued necessity of dedicated security tools and manual code audits to ensure robust software security.
arXiv Detail & Related papers (2025-09-17T02:56:21Z) - CARE: Decoding Time Safety Alignment via Rollback and Introspection Intervention [68.95008546581339]
Existing decoding-time interventions, such as Contrastive Decoding, often force a severe trade-off between safety and response quality. We propose CARE, a novel framework for decoding-time safety alignment that integrates three key components. The framework achieves a superior balance of safety, quality, and efficiency, attaining a low harmful response rate and minimal disruption to the user experience.
arXiv Detail & Related papers (2025-09-01T04:50:02Z) - CoCoNUTS: Concentrating on Content while Neglecting Uninformative Textual Styles for AI-Generated Peer Review Detection [60.52240468810558]
We introduce CoCoNUTS, a content-oriented benchmark built upon a fine-grained dataset of AI-generated peer reviews. We also develop CoCoDet, an AI review detector via a multi-task learning framework, to achieve more accurate and robust detection of AI involvement in review content.
arXiv Detail & Related papers (2025-08-28T06:03:11Z) - DATABench: Evaluating Dataset Auditing in Deep Learning from an Adversarial Perspective [70.77570343385928]
We introduce a novel taxonomy, classifying existing methods based on their reliance on internal features (IF), inherent to the data, versus external features (EF), artificially introduced for auditing. We formulate two primary attack types: evasion attacks, designed to conceal the use of a dataset, and forgery attacks, intending to falsely implicate an unused dataset. Building on the understanding of existing methods and attack objectives, we further propose systematic attack strategies: decoupling, removal, and detection for evasion; adversarial example-based methods for forgery. Our benchmark, DATABench, comprises 17 evasion attacks, 5 forgery attacks, and 9
arXiv Detail & Related papers (2025-07-08T03:07:15Z) - Safety by Measurement: A Systematic Literature Review of AI Safety Evaluation Methods [0.0]
This literature review consolidates the rapidly evolving field of AI safety evaluations. It proposes a systematic taxonomy around three dimensions: what properties we measure, how we measure them, and how these measurements integrate into frameworks.
arXiv Detail & Related papers (2025-05-08T16:55:07Z) - Improving Automated Secure Code Reviews: A Synthetic Dataset for Code Vulnerability Flaws [0.0]
We propose the creation of a synthetic dataset consisting of vulnerability-focused reviews that specifically comment on security flaws. Our approach leverages Large Language Models (LLMs) to generate human-like code review comments for vulnerabilities.
arXiv Detail & Related papers (2025-04-22T23:07:24Z) - Advancing Embodied Agent Security: From Safety Benchmarks to Input Moderation [52.83870601473094]
Embodied agents exhibit immense potential across a multitude of domains. Existing research predominantly concentrates on the security of general large language models. This paper introduces a novel input moderation framework, meticulously designed to safeguard embodied agents.
arXiv Detail & Related papers (2025-04-22T08:34:35Z) - LLM-Safety Evaluations Lack Robustness [58.334290876531036]
We argue that current safety alignment research efforts for large language models are hindered by many intertwined sources of noise. We propose a set of guidelines for reducing noise and bias in evaluations of future attack and defense papers.
arXiv Detail & Related papers (2025-03-04T12:55:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.