SeRe: A Security-Related Code Review Dataset Aligned with Real-World Review Activities
- URL: http://arxiv.org/abs/2601.01042v1
- Date: Sat, 03 Jan 2026 02:39:53 GMT
- Title: SeRe: A Security-Related Code Review Dataset Aligned with Real-World Review Activities
- Authors: Zixiao Zhao, Yanjie Jiang, Hui Liu, Kui Liu, Lu Zhang
- Abstract summary: Existing datasets and studies primarily focus on general-purpose code review comments. We introduce SeRe, a security-related code review dataset, constructed using an active learning-based ensemble classification approach. We extracted 6,732 security-related reviews from 373,824 raw review instances, ensuring representativeness across multiple programming languages.
- Score: 8.215547096412346
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Software security vulnerabilities can lead to severe consequences, making early detection essential. Although code review serves as a critical defense mechanism against security flaws, relevant feedback remains scarce due to limited attention to security issues or a lack of expertise among reviewers. Existing datasets and studies primarily focus on general-purpose code review comments, either lacking security-specific annotations or being too limited in scale to support large-scale research. To bridge this gap, we introduce SeRe, a security-related code review dataset, constructed using an active learning-based ensemble classification approach. The proposed approach iteratively refines model predictions through human annotations, achieving high precision while maintaining reasonable recall. Using the fine-tuned ensemble classifier, we extracted 6,732 security-related reviews from 373,824 raw review instances, ensuring representativeness across multiple programming languages. Statistical analysis indicates that SeRe generally aligns with the real-world distribution of security-related reviews. To assess both the utility of SeRe and the effectiveness of existing code review comment generation approaches, we benchmark state-of-the-art approaches on security-related feedback generation. By releasing SeRe along with our benchmark results, we aim to advance research in automated security-focused code review and contribute to the development of more effective secure software engineering practices.
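The abstract describes an active-learning loop in which an ensemble classifier scores unlabeled reviews and the most uncertain instances are routed to human annotators. A minimal sketch of that selection step is below; the keyword-based scorers, data, and threshold are illustrative assumptions, not the authors' actual models or pipeline.

```python
# Hypothetical sketch of uncertainty sampling with a toy ensemble:
# each "model" is a keyword matcher standing in for a fine-tuned
# classifier; reviews whose ensemble score is closest to 0.5 are
# the most uncertain and are sent for human annotation.

def ensemble_score(text, keyword_sets):
    """Average binary vote of the toy keyword classifiers."""
    votes = [any(k in text.lower() for k in ks) for ks in keyword_sets]
    return sum(votes) / len(votes)

def select_for_annotation(unlabeled, keyword_sets, budget=2):
    """Pick the `budget` reviews whose score is nearest 0.5."""
    scored = [(abs(ensemble_score(t, keyword_sets) - 0.5), t) for t in unlabeled]
    scored.sort(key=lambda pair: pair[0])  # smallest margin = most uncertain
    return [t for _, t in scored[:budget]]

keyword_sets = [
    {"overflow", "injection", "xss"},    # toy model 1: vulnerability terms
    {"sanitize", "validate", "escape"},  # toy model 2: mitigation terms
]
pool = [
    "fix buffer overflow in parser",
    "rename variable for clarity",
    "please sanitize user input here",
    "update changelog entry",
]
to_annotate = select_for_annotation(pool, keyword_sets)
print(to_annotate)
```

In the dataset construction the abstract describes, the human labels collected for such uncertain instances would then be added to the training pool and the ensemble refit, iterating until precision is acceptable.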
Related papers
- From Detection to Prevention: Explaining Security-Critical Code to Avoid Vulnerabilities [2.490168997159702]
This work explores a proactive strategy to prevent vulnerabilities by highlighting code regions that implement security-critical functionality. We present an IntelliJ IDEA plugin prototype that uses code-level software metrics to identify potentially security-critical methods.
arXiv Detail & Related papers (2026-01-31T13:16:01Z) - SafeRBench: A Comprehensive Benchmark for Safety Assessment in Large Reasoning Models [60.8821834954637]
We present SafeRBench, the first benchmark that assesses LRM safety end-to-end. We pioneer the incorporation of risk categories and levels into input design. We introduce a micro-thought chunking mechanism to segment long reasoning traces into semantically coherent units.
arXiv Detail & Related papers (2025-11-19T06:46:33Z) - SecureReviewer: Enhancing Large Language Models for Secure Code Review through Secure-aware Fine-tuning [8.229920162000369]
We propose SecureReviewer to identify and resolve security-related issues during code review. We first construct a dataset tailored for training and evaluating secure code review capabilities. We integrate the RAG technique, which grounds the generated comments in domain-specific security knowledge.
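The summary above mentions grounding generated review comments in retrieved security knowledge via RAG. A minimal sketch of that retrieval-and-prompt step is shown below; the word-overlap retriever, knowledge entries, and prompt format are invented for illustration and are not SecureReviewer's actual implementation.

```python
# Hypothetical RAG grounding step: rank a toy security knowledge base
# by word overlap with the change under review, then prepend the best
# match to the comment-generation prompt.

def retrieve(query, knowledge, k=1):
    """Return the k snippets sharing the most words with the query."""
    qwords = set(query.lower().split())
    ranked = sorted(
        knowledge,
        key=lambda s: len(qwords & set(s.lower().split())),
        reverse=True,
    )
    return ranked[:k]

knowledge = [
    "CWE-89: build SQL with parameterized queries, never string concatenation",
    "CWE-79: escape untrusted output before rendering it in HTML",
]
diff = "query = 'SELECT * FROM users WHERE id=' + user_id"
context = retrieve("sql query string concatenation", knowledge)[0]
prompt = f"Security context: {context}\nReview this change: {diff}"
print(context)
```

A production system would replace the overlap score with dense embeddings and pass the assembled prompt to an LLM; the structure of the step stays the same.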
arXiv Detail & Related papers (2025-10-30T13:06:11Z) - iCodeReviewer: Improving Secure Code Review with Mixture of Prompts [5.322602557660654]
iCodeReviewer is an automated secure code review approach based on large language models (LLMs). Experiment results demonstrate the effectiveness of iCodeReviewer in security issue identification and localization, with an F1 of 63.98%. The review comments generated by iCodeReviewer also achieve an acceptance rate of up to 84% when deployed in production environments.
arXiv Detail & Related papers (2025-10-14T06:30:59Z) - GitHub's Copilot Code Review: Can AI Spot Security Flaws Before You Commit? [0.0]
This study evaluates the effectiveness of GitHub Copilot's recently introduced code review feature in detecting security vulnerabilities. Contrary to expectations, our results reveal that Copilot's code review frequently fails to detect critical vulnerabilities. Our results highlight the continued necessity of dedicated security tools and manual code audits to ensure robust software security.
arXiv Detail & Related papers (2025-09-17T02:56:21Z) - CARE: Decoding Time Safety Alignment via Rollback and Introspection Intervention [68.95008546581339]
Existing decoding-time interventions, such as Contrastive Decoding, often force a severe trade-off between safety and response quality. We propose CARE, a novel framework for decoding-time safety alignment that integrates three key components. The framework achieves a superior balance of safety, quality, and efficiency, attaining a low harmful response rate and minimal disruption to the user experience.
arXiv Detail & Related papers (2025-09-01T04:50:02Z) - CoCoNUTS: Concentrating on Content while Neglecting Uninformative Textual Styles for AI-Generated Peer Review Detection [60.52240468810558]
We introduce CoCoNUTS, a content-oriented benchmark built upon a fine-grained dataset of AI-generated peer reviews. We also develop CoCoDet, an AI review detector via a multi-task learning framework, to achieve more accurate and robust detection of AI involvement in review content.
arXiv Detail & Related papers (2025-08-28T06:03:11Z) - DATABench: Evaluating Dataset Auditing in Deep Learning from an Adversarial Perspective [70.77570343385928]
We introduce a novel taxonomy, classifying existing methods based on their reliance on internal features (IF), inherent to the data, versus external features (EF), artificially introduced for auditing. We formulate two primary attack types: evasion attacks, designed to conceal the use of a dataset, and forgery attacks, intending to falsely implicate an unused dataset. Building on the understanding of existing methods and attack objectives, we further propose systematic attack strategies: decoupling, removal, and detection for evasion; adversarial example-based methods for forgery. Our benchmark, DATABench, comprises 17 evasion attacks, 5 forgery attacks, and 9
arXiv Detail & Related papers (2025-07-08T03:07:15Z) - Safety by Measurement: A Systematic Literature Review of AI Safety Evaluation Methods [0.0]
This literature review consolidates the rapidly evolving field of AI safety evaluations. It proposes a systematic taxonomy around three dimensions: what properties we measure, how we measure them, and how these measurements integrate into frameworks.
arXiv Detail & Related papers (2025-05-08T16:55:07Z) - Improving Automated Secure Code Reviews: A Synthetic Dataset for Code Vulnerability Flaws [0.0]
We propose the creation of a synthetic dataset consisting of vulnerability-focused reviews that specifically comment on security flaws. Our approach leverages Large Language Models (LLMs) to generate human-like code review comments for vulnerabilities.
arXiv Detail & Related papers (2025-04-22T23:07:24Z) - Advancing Embodied Agent Security: From Safety Benchmarks to Input Moderation [52.83870601473094]
Embodied agents exhibit immense potential across a multitude of domains. Existing research predominantly concentrates on the security of general large language models. This paper introduces a novel input moderation framework, meticulously designed to safeguard embodied agents.
arXiv Detail & Related papers (2025-04-22T08:34:35Z) - LLM-Safety Evaluations Lack Robustness [58.334290876531036]
We argue that current safety alignment research efforts for large language models are hindered by many intertwined sources of noise. We propose a set of guidelines for reducing noise and bias in evaluations of future attack and defense papers.
arXiv Detail & Related papers (2025-03-04T12:55:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.