RiskHarvester: A Risk-based Tool to Prioritize Secret Removal Efforts in Software Artifacts
- URL: http://arxiv.org/abs/2502.01020v1
- Date: Mon, 03 Feb 2025 03:32:12 GMT
- Title: RiskHarvester: A Risk-based Tool to Prioritize Secret Removal Efforts in Software Artifacts
- Authors: Setu Kumar Basak, Tanmay Pardeshi, Bradley Reaves, Laurie Williams,
- Abstract summary: Since 2020, GitGuardian has been detecting checked-in hard-coded secrets in GitHub repositories.
During 2020-2023, GitGuardian has observed a four-fold increase in hard-coded secrets, with 12.8 million exposed in 2023.
We present RiskHarvester, a risk-based tool to compute a security risk score based on the value of the asset and ease of attack on a database.
- Score: 5.432601851190413
- License:
- Abstract: Since 2020, GitGuardian has been detecting checked-in hard-coded secrets in GitHub repositories. During 2020-2023, GitGuardian has observed an upward annual trend and a four-fold increase in hard-coded secrets, with 12.8 million exposed in 2023. However, removing all the secrets from software artifacts is not feasible due to time constraints and technical challenges. Additionally, the security risks of the secrets are not equal, protecting assets ranging from obsolete databases to sensitive medical data. Thus, secret removal should be prioritized by security risk reduction, which existing secret detection tools do not support. The goal of this research is to aid software practitioners in prioritizing secrets removal efforts through our security risk-based tool. We present RiskHarvester, a risk-based tool to compute a security risk score based on the value of the asset and ease of attack on a database. We calculated the value of asset by identifying the sensitive data categories present in a database from the database keywords in the source code. We utilized data flow analysis, SQL, and ORM parsing to identify the database keywords. To calculate the ease of attack, we utilized passive network analysis to retrieve the database host information. To evaluate RiskHarvester, we curated RiskBench, a benchmark of 1,791 database secret-asset pairs with sensitive data categories and host information manually retrieved from 188 GitHub repositories. RiskHarvester demonstrates precision of (95%) and recall (90%) in detecting database keywords for the value of asset and precision of (96%) and recall (94%) in detecting valid hosts for ease of attack. Finally, we conducted a survey (52 respondents) to understand whether developers prioritize secret removal based on security risk score. We found that 86% of the developers prioritized the secrets for removal with descending security risk scores.
Related papers
- On Categorizing Open Source Software Security Vulnerability Reporting Mechanisms on GitHub [1.7174932174564534]
Open-source projects are essential to software development, but publicly disclosing vulnerabilities without fixes increases the risk of exploitation.
The Open Source Security Foundation (OpenSSF) addresses this issue by promoting robust security policies to enhance project security.
Current research reveals that many projects perform poorly on OpenSSF criteria, indicating a need for stronger security practices.
arXiv Detail & Related papers (2025-02-11T09:23:24Z) - Investigating Vulnerability Disclosures in Open-Source Software Using Bug Bounty Reports and Security Advisories [6.814841205623832]
We conduct an empirical study on 3,798 reviewed GitHub security advisories and 4,033 disclosed OSS bug bounty reports.
We are the first to determine the explicit process describing how OSS vulnerabilities propagate from security advisories and bug bounty reports.
arXiv Detail & Related papers (2025-01-29T16:36:41Z) - Secret Breach Prevention in Software Issue Reports [2.8747015994080285]
This paper presents a novel technique for secret breach detection in software issue reports.
We highlight the challenges posed by noise, such as log files, URLs, commit IDs, stack traces, and dummy passwords.
We propose an approach combining the strengths of state-of-the-artes with the contextual understanding of language models.
arXiv Detail & Related papers (2024-10-31T06:14:17Z) - "Glue pizza and eat rocks" -- Exploiting Vulnerabilities in Retrieval-Augmented Generative Models [74.05368440735468]
Retrieval-Augmented Generative (RAG) models enhance Large Language Models (LLMs)
In this paper, we demonstrate a security threat where adversaries can exploit the openness of these knowledge bases.
arXiv Detail & Related papers (2024-06-26T05:36:23Z) - Rethinking the Vulnerabilities of Face Recognition Systems:From a Practical Perspective [53.24281798458074]
Face Recognition Systems (FRS) have increasingly integrated into critical applications, including surveillance and user authentication.
Recent studies have revealed vulnerabilities in FRS to adversarial (e.g., adversarial patch attacks) and backdoor attacks (e.g., training data poisoning)
arXiv Detail & Related papers (2024-05-21T13:34:23Z) - AssetHarvester: A Static Analysis Tool for Detecting Secret-Asset Pairs in Software Artifacts [4.778835435164734]
We present AssetHarvester, a static analysis tool to detect secret-asset pairs in a repository.
We curated a benchmark of 1,791 secret-asset pairs of four database types extracted from 188 public repositories to evaluate the performance of AssetHarvester.
Our findings indicate that data flow analysis employed in AssetHarvester detects secret-asset pairs with 0% false positives and aids in improving recall of secret detection tools.
arXiv Detail & Related papers (2024-03-28T00:24:49Z) - Vulnerability Scanners for Ethereum Smart Contracts: A Large-Scale Study [44.25093111430751]
In 2023 alone, such vulnerabilities led to substantial financial losses exceeding a billion of US dollars.
Various tools have been developed to detect and mitigate vulnerabilities in smart contracts.
This study investigates the gap between the effectiveness of existing security scanners and the vulnerabilities that still persist in practice.
arXiv Detail & Related papers (2023-12-27T11:26:26Z) - A Comparative Study of Software Secrets Reporting by Secret Detection
Tools [5.9347272469695245]
According to GitGuardian's monitoring of public GitHub repositories, secrets continued accelerating in 2022 by 67% compared to 2021.
We present an evaluation of five open-source and four proprietary tools against a benchmark dataset.
The top three tools based on precision are: GitHub Secret Scanner (75%), Gitleaks (46%), and Commercial X (25%), and based on recall are: Gitleaks (88%), SpectralOps (67%) and TruffleHog (52%)
arXiv Detail & Related papers (2023-07-03T02:32:09Z) - Backdoor Attacks Against Incremental Learners: An Empirical Evaluation
Study [79.33449311057088]
This paper empirically reveals the high vulnerability of 11 typical incremental learners against poisoning-based backdoor attack on 3 learning scenarios.
The defense mechanism based on activation clustering is found to be effective in detecting our trigger pattern to mitigate potential security risks.
arXiv Detail & Related papers (2023-05-28T09:17:48Z) - Pre-trained Encoders in Self-Supervised Learning Improve Secure and
Privacy-preserving Supervised Learning [63.45532264721498]
Self-supervised learning is an emerging technique to pre-train encoders using unlabeled data.
We perform first systematic, principled measurement study to understand whether and when a pretrained encoder can address the limitations of secure or privacy-preserving supervised learning algorithms.
arXiv Detail & Related papers (2022-12-06T21:35:35Z) - Dos and Don'ts of Machine Learning in Computer Security [74.1816306998445]
Despite great potential, machine learning in security is prone to subtle pitfalls that undermine its performance.
We identify common pitfalls in the design, implementation, and evaluation of learning-based security systems.
We propose actionable recommendations to support researchers in avoiding or mitigating the pitfalls where possible.
arXiv Detail & Related papers (2020-10-19T13:09:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.