Related papers: A Comparative Study of Software Secrets Reporting by Secret Detection Tools

URL: http://arxiv.org/abs/2307.00714v1
Date: Mon, 3 Jul 2023 02:32:09 GMT
Title: A Comparative Study of Software Secrets Reporting by Secret Detection Tools
Authors: Setu Kumar Basak, Jamison Cox, Bradley Reaves and Laurie Williams
Abstract summary: According to GitGuardian's monitoring of public GitHub repositories, secrets sprawl continued accelerating in 2022 by 67% compared to 2021. We present an evaluation of five open-source and four proprietary tools against a benchmark dataset. The top three tools based on precision are GitHub Secret Scanner (75%), Gitleaks (46%), and Commercial X (25%); based on recall, they are Gitleaks (88%), SpectralOps (67%), and TruffleHog (52%).
Score: 5.9347272469695245
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Background: According to GitGuardian's monitoring of public GitHub repositories, secrets sprawl continued accelerating in 2022 by 67% compared to 2021, exposing over 10 million secrets (API keys and other credentials). Though many open-source and proprietary secret detection tools are available, these tools output many false positives, making it difficult for developers to take action and teams to choose one tool out of many. To our knowledge, the secret detection tools are not yet compared and evaluated. Aims: The goal of our study is to aid developers in choosing a secret detection tool to reduce the exposure of secrets through an empirical investigation of existing secret detection tools. Method: We present an evaluation of five open-source and four proprietary tools against a benchmark dataset. Results: The top three tools based on precision are: GitHub Secret Scanner (75%), Gitleaks (46%), and Commercial X (25%), and based on recall are: Gitleaks (88%), SpectralOps (67%) and TruffleHog (52%). Our manual analysis of reported secrets reveals that false positives are due to employing generic regular expressions and ineffective entropy calculation. In contrast, false negatives are due to faulty regular expressions, skipping specific file types, and insufficient rulesets. Conclusions: We recommend developers choose tools based on secret types present in their projects to prevent missing secrets. In addition, we recommend tool vendors update detection rules periodically and correctly employ secret verification mechanisms by collaborating with API vendors to improve accuracy.
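Illustration (not from the paper): the abstract attributes false positives to generic regular expressions and ineffective entropy calculation, and ranks tools by precision and recall. The Python sketch below shows how those pieces relate. The rule GENERIC_RULE, the 3.0 bits-per-character threshold, and the helper names are all hypothetical and are not taken from any of the evaluated tools.

```python
import math
import re
from collections import Counter

# Hypothetical generic rule: a quoted value of 8+ characters assigned to a
# name ending in key/token/secret/password. Not taken from any real tool.
GENERIC_RULE = re.compile(
    r"""(?i)(?:key|token|secret|password)\s*[:=]\s*["']([^"']{8,})["']"""
)


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def find_candidates(source: str, min_entropy: float = 3.0) -> list[str]:
    """Return rule matches whose entropy meets the assumed threshold."""
    return [
        value
        for value in GENERIC_RULE.findall(source)
        if shannon_entropy(value) >= min_entropy
    ]


def precision_recall(reported: set[str], actual: set[str]) -> tuple[float, float]:
    """Precision = TP / reported, recall = TP / actual (0.0 if undefined)."""
    true_positives = len(reported & actual)
    precision = true_positives / len(reported) if reported else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up sample: one real-looking key, one dummy password, one filler value.
    sample = (
        'api_key = "q8Zx2LmP0vRt7KcY"\n'
        'password = "changeme123"\n'
        'password = "aaaaaaaaaaaa"\n'
    )
    reported = set(find_candidates(sample))
    actual = {"q8Zx2LmP0vRt7KcY"}
    print(reported, precision_recall(reported, actual))
```

In this made-up sample the entropy filter drops the repeated-character value but still lets the dummy password through, so the generic rule finds the one real key (full recall) while half of what it reports is a false positive.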

Related papers

RiskHarvester: A Risk-based Tool to Prioritize Secret Removal Efforts in Software Artifacts [5.432601851190413]
Since 2020, GitGuardian has been detecting checked-in hard-coded secrets in GitHub repositories. During 2020-2023, GitGuardian has observed a four-fold increase in hard-coded secrets, with 12.8 million exposed in 2023. We present RiskHarvester, a risk-based tool to compute a security risk score based on the value of the asset and ease of attack on a database.
arXiv Detail & Related papers (2025-02-03T03:32:12Z)
Automatically Detecting Checked-In Secrets in Android Apps: How Far Are We? [4.619114660081147]
Developers often overlook the proper storage of secrets, opting to put them directly into their projects. Such checked-in secrets can be easily extracted and exploited by malicious adversaries. Unlike in open-source projects, the lack of direct access to the source code and the presence of obfuscation complicate checked-in secret detection for Android apps.
arXiv Detail & Related papers (2024-12-14T18:14:25Z)
Secret Breach Prevention in Software Issue Reports [2.8747015994080285]
This paper presents a novel technique for secret breach detection in software issue reports. We highlight the challenges posed by noise, such as log files, URLs, commit IDs, stack traces, and dummy passwords. We propose an approach combining the strengths of state-of-the-art techniques with the contextual understanding of language models.
arXiv Detail & Related papers (2024-10-31T06:14:17Z)
Understanding Code Understandability Improvements in Code Reviews [79.16476505761582]
We analyzed 2,401 code review comments from Java open-source projects on GitHub. 83.9% of suggestions for improvement were accepted and integrated, with fewer than 1% later reverted.
arXiv Detail & Related papers (2024-10-29T12:21:23Z)
Open-CD: A Comprehensive Toolbox for Change Detection [59.79011759027916]
Open-CD is a change detection toolbox that contains a rich set of change detection methods as well as related components and modules. It gradually evolves into a unified platform that covers many popular change detection methods and contemporary modules.
arXiv Detail & Related papers (2024-07-22T01:04:16Z)
Static Application Security Testing (SAST) Tools for Smart Contracts: How Far Are We? [14.974832502863526]
In recent years, the importance of smart contract security has been heightened by the increasing number of attacks against them. To address this issue, a multitude of static application security testing (SAST) tools have been proposed for detecting vulnerabilities in smart contracts. In this paper, we propose an up-to-date and fine-grained taxonomy that includes 45 unique vulnerability types for smart contracts.
arXiv Detail & Related papers (2024-04-28T13:40:18Z)
AssetHarvester: A Static Analysis Tool for Detecting Secret-Asset Pairs in Software Artifacts [4.778835435164734]
We present AssetHarvester, a static analysis tool to detect secret-asset pairs in a repository. We curated a benchmark of 1,791 secret-asset pairs of four database types extracted from 188 public repositories to evaluate the performance of AssetHarvester. Our findings indicate that data flow analysis employed in AssetHarvester detects secret-asset pairs with 0% false positives and aids in improving recall of secret detection tools.
arXiv Detail & Related papers (2024-03-28T00:24:49Z)
A Systematic Evaluation of Automated Tools for Side-Channel Vulnerabilities Detection in Cryptographic Libraries [6.826526973994114]
We surveyed the literature to build a classification of 34 side-channel detection frameworks. We then built a benchmark of representative cryptographic operations on a selection of 5 promising detection tools. We offer a classification of recently published side-channel vulnerabilities. We find that existing tools can struggle to find vulnerabilities for a variety of reasons, mainly the lack of support for SIMD instructions, implicit flows, and internal secret generation.
arXiv Detail & Related papers (2023-10-12T09:18:26Z)
Zero-Shot Detection of Machine-Generated Codes [83.0342513054389]
This work proposes a training-free approach for the detection of LLM-generated code. We find that existing training-based or zero-shot text detectors are ineffective in detecting code. Our method exhibits robustness against revision attacks and generalizes well to Java code.
arXiv Detail & Related papers (2023-10-08T10:08:21Z)
On the Security Blind Spots of Software Composition Analysis [46.1389163921338]
We present a novel approach to detect vulnerable clones in the Maven repository. We retrieve over 53k potential vulnerable clones from Maven Central. We detect 727 confirmed vulnerable clones and synthesize a testable proof-of-vulnerability project for each of those.
arXiv Detail & Related papers (2023-06-08T20:14:46Z)
VulCurator: A Vulnerability-Fixing Commit Detector [8.32137934421055]
VulCurator is a tool that leverages deep learning on richer sources of information. VulCurator outperforms the state-of-the-art baselines by up to 16.1% in terms of F1-score.
arXiv Detail & Related papers (2022-09-07T16:11:31Z)
Black-box Dataset Ownership Verification via Backdoor Watermarking [67.69308278379957]
We formulate the protection of released datasets as verifying whether they are adopted for training a (suspicious) third-party model. We propose to embed external patterns via backdoor watermarking for the ownership verification to protect them. Specifically, we exploit poison-only backdoor attacks (e.g., BadNets) for dataset watermarking and design a hypothesis-test-guided method for dataset verification.
arXiv Detail & Related papers (2022-08-04T05:32:20Z)
Open-sourced Dataset Protection via Backdoor Watermarking [87.15630326131901]
We propose a backdoor-embedding-based dataset watermarking method to protect an open-sourced image-classification dataset. We use a hypothesis-test-guided method for dataset verification based on the posterior probability generated by the suspicious third-party model.
arXiv Detail & Related papers (2020-10-12T16:16:27Z)

This list is automatically generated from the titles and abstracts of the papers on this site.