Related papers: Toward Effective Secure Code Reviews: An Empirical Study of Security-Related Coding Weaknesses

Toward Effective Secure Code Reviews: An Empirical Study of Security-Related Coding Weaknesses

URL: http://arxiv.org/abs/2311.16396v2
Date: Thu, 9 May 2024 01:43:50 GMT
Title: Toward Effective Secure Code Reviews: An Empirical Study of Security-Related Coding Weaknesses
Authors: Wachiraphan Charoenwet, Patanamon Thongtanunam, Van-Thuan Pham, Christoph Treude,
Abstract summary: We conducted an empirical case study in two large open-source projects, OpenSSL and PHP. Based on 135,560 code review comments, we found that reviewers raised security concerns in 35 out of 40 coding weakness categories. Some coding weaknesses related to past vulnerabilities, such as memory errors and resource management, were discussed less often than the vulnerabilities.
Score: 14.134803943492345
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Identifying security issues early is encouraged to reduce the latent negative impacts on software systems. Code review is a widely-used method that allows developers to manually inspect modified code, catching security issues during a software development cycle. However, existing code review studies often focus on known vulnerabilities, neglecting coding weaknesses, which can introduce real-world security issues that are more visible through code review. The practices of code reviews in identifying such coding weaknesses are not yet fully investigated. To better understand this, we conducted an empirical case study in two large open-source projects, OpenSSL and PHP. Based on 135,560 code review comments, we found that reviewers raised security concerns in 35 out of 40 coding weakness categories. Surprisingly, some coding weaknesses related to past vulnerabilities, such as memory errors and resource management, were discussed less often than the vulnerabilities. Developers attempted to address raised security concerns in many cases (39%-41%), but a substantial portion was merely acknowledged (30%-36%), and some went unfixed due to disagreements about solutions (18%-20%). This highlights that coding weaknesses can slip through code review even when identified. Our findings suggest that reviewers can identify various coding weaknesses leading to security issues during code reviews. However, these results also reveal shortcomings in current code review practices, indicating the need for more effective mechanisms or support for increasing awareness of security issue management in code reviews.

Related papers

CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models [97.18215355266143]
We introduce a holistic code critique benchmark for Large Language Models (LLMs) called CodeCriticBench. Specifically, our CodeCriticBench includes two mainstream code tasks (i.e., code generation and code QA) with different difficulties. Besides, the evaluation protocols include basic critique evaluation and advanced critique evaluation for different characteristics.
arXiv Detail & Related papers (2025-02-23T15:36:43Z)
RedCode: Risky Code Execution and Generation Benchmark for Code Agents [50.81206098588923]
RedCode is a benchmark for risky code execution and generation. RedCode-Exec provides challenging prompts that could lead to risky code execution. RedCode-Gen provides 160 prompts with function signatures and docstrings as input to assess whether code agents will follow instructions.
arXiv Detail & Related papers (2024-11-12T13:30:06Z)
Understanding Code Understandability Improvements in Code Reviews [79.16476505761582]
We analyzed 2,401 code review comments from Java open-source projects on GitHub. 83.9% of suggestions for improvement were accepted and integrated, with fewer than 1% later reverted.
arXiv Detail & Related papers (2024-10-29T12:21:23Z)
Unintentional Security Flaws in Code: Automated Defense via Root Cause Analysis [2.899501205987888]
We developed an automated vulnerability root cause (RC) toolkit called T5-RCGCN. It combines T5 language model embeddings with a graph convolutional network (GCN) for vulnerability classification and localization. We tested T5-RCGCN with 56 junior developers across three datasets, showing a 28.9% improvement in code security compared to previous methods.
arXiv Detail & Related papers (2024-08-30T18:26:59Z)
Is Your AI-Generated Code Really Safe? Evaluating Large Language Models on Secure Code Generation with CodeSecEval [20.959848710829878]
Large language models (LLMs) have brought significant advancements to code generation and code repair. However, their training using unsanitized data from open-source repositories, like GitHub, raises the risk of inadvertently propagating security vulnerabilities. We aim to present a comprehensive study aimed at precisely evaluating and enhancing the security aspects of code LLMs.
arXiv Detail & Related papers (2024-07-02T16:13:21Z)
CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion [117.178835165855]
This paper introduces CodeAttack, a framework that transforms natural language inputs into code inputs. Our studies reveal a new and universal safety vulnerability of these models against code input. We find that a larger distribution gap between CodeAttack and natural language leads to weaker safety generalization.
arXiv Detail & Related papers (2024-03-12T17:55:38Z)
LLM-Powered Code Vulnerability Repair with Reinforcement Learning and Semantic Reward [3.729516018513228]
We introduce a multipurpose code vulnerability analysis system textttSecRepair, powered by a large language model, CodeGen2. Inspired by how humans fix code issues, we propose an instruction-based dataset suitable for vulnerability analysis with LLMs. We identify zero-day and N-day vulnerabilities in 6 Open Source IoT Operating Systems on GitHub.
arXiv Detail & Related papers (2024-01-07T02:46:39Z)
Security Defect Detection via Code Review: A Study of the OpenStack and Qt Communities [7.2944322548786715]
Security defects are not prevalently discussed in code review. More than half of the reviewers provided explicit fixing strategies/solutions to help developers fix security defects. Disagreement between the developer and the reviewer are the main causes for not resolving security defects.
arXiv Detail & Related papers (2023-07-05T14:30:41Z)
CodeLMSec Benchmark: Systematically Evaluating and Finding Security Vulnerabilities in Black-Box Code Language Models [58.27254444280376]
Large language models (LLMs) for automatic code generation have achieved breakthroughs in several programming tasks. Training data for these models is usually collected from the Internet (e.g., from open-source repositories) and is likely to contain faults and security vulnerabilities. This unsanitized training data can cause the language models to learn these vulnerabilities and propagate them during the code generation procedure.
arXiv Detail & Related papers (2023-02-08T11:54:07Z)
Dos and Don'ts of Machine Learning in Computer Security [74.1816306998445]
Despite great potential, machine learning in security is prone to subtle pitfalls that undermine its performance. We identify common pitfalls in the design, implementation, and evaluation of learning-based security systems. We propose actionable recommendations to support researchers in avoiding or mitigating the pitfalls where possible.
arXiv Detail & Related papers (2020-10-19T13:09:31Z)
Deep Just-In-Time Inconsistency Detection Between Comments and Source Code [51.00904399653609]
In this paper, we aim to detect whether a comment becomes inconsistent as a result of changes to the corresponding body of code. We develop a deep-learning approach that learns to correlate a comment with code changes. We show the usefulness of our approach by combining it with a comment update model to build a more comprehensive automatic comment maintenance system.
arXiv Detail & Related papers (2020-10-04T16:49:28Z)

This list is automatically generated from the titles and abstracts of the papers in this site.