AssetHarvester: A Static Analysis Tool for Detecting Secret-Asset Pairs in Software Artifacts
- URL: http://arxiv.org/abs/2403.19072v2
- Date: Wed, 20 Nov 2024 06:06:15 GMT
- Title: AssetHarvester: A Static Analysis Tool for Detecting Secret-Asset Pairs in Software Artifacts
- Authors: Setu Kumar Basak, K. Virgil English, Ken Ogura, Vitesh Kambara, Bradley Reaves, Laurie Williams
- Abstract summary: We present AssetHarvester, a static analysis tool to detect secret-asset pairs in a repository.
We curated a benchmark of 1,791 secret-asset pairs of four database types extracted from 188 public repositories to evaluate the performance of AssetHarvester.
Our findings indicate that data flow analysis employed in AssetHarvester detects secret-asset pairs with 0% false positives and aids in improving recall of secret detection tools.
- Score: 4.778835435164734
- License:
- Abstract: GitGuardian monitored secrets exposure in public GitHub repositories and reported that developers leaked over 12 million secrets (database and other credentials) in 2023, a 113% surge from 2021. Despite the availability of secret detection tools, developers ignore the tools' reported warnings because of false positives (25%-99%). However, each secret protects assets of different values accessible through asset identifiers (a DNS name and a public or private IP address). The asset information for a secret can aid developers in filtering false positives and prioritizing secret removal from the source code. However, existing secret detection tools do not provide the asset information, making it difficult for developers to filter secrets by looking only at the secret value or to find the assets manually for each reported secret. The goal of our study is to aid software practitioners in prioritizing secret removal by providing the asset information protected by the secrets through our novel static analysis tool. We present AssetHarvester, a static analysis tool to detect secret-asset pairs in a repository. Since the location of the asset can be distant from where the secret is defined, we investigated secret-asset co-location patterns and found four patterns. To identify the secret-asset pairs of the four patterns, we utilized three approaches (pattern matching, data flow analysis, and fast-approximation heuristics). We curated a benchmark of 1,791 secret-asset pairs of four database types extracted from 188 public GitHub repositories to evaluate the performance of AssetHarvester. AssetHarvester demonstrates a precision of 97%, recall of 90%, and F1-score of 94% in detecting secret-asset pairs. Our findings indicate that the data flow analysis employed in AssetHarvester detects secret-asset pairs with 0% false positives and aids in improving the recall of secret detection tools.
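To make the pattern-matching approach described in the abstract concrete, the minimal sketch below shows how a secret and its co-located asset identifier might be extracted when both appear in a single database connection string. This is an illustration only, not AssetHarvester's implementation; the regular expression, function name, and example string are assumptions made for this sketch.

```python
import re

# Hypothetical illustration of the "pattern matching" approach for one
# co-location pattern: a secret (password) and its asset identifier (host)
# appearing together in one database connection URL. Not AssetHarvester's code.

# Matches URLs such as postgres://user:password@db.example.com:5432/mydb
CONN_URL = re.compile(
    r"(?P<scheme>postgresql|postgres|mysql|mongodb|redis)://"
    r"(?P<user>[^:/\s]+):(?P<secret>[^@/\s]+)@"
    r"(?P<host>[^:/\s]+)(?::(?P<port>\d+))?"
)

def find_secret_asset_pairs(source_text: str):
    """Return secret-asset pairs found in one file's text."""
    pairs = []
    for match in CONN_URL.finditer(source_text):
        pairs.append({
            "secret": match.group("secret"),   # the leaked credential
            "asset": match.group("host"),      # DNS name or IP the secret protects
            "asset_port": match.group("port"),
            "database_type": match.group("scheme"),
        })
    return pairs

if __name__ == "__main__":
    example = 'DATABASE_URL = "postgres://admin:s3cr3t@db.internal.example.com:5432/app"'
    print(find_secret_asset_pairs(example))
    # [{'secret': 's3cr3t', 'asset': 'db.internal.example.com', ...}]
```

The paper's other two approaches (data flow analysis and fast-approximation heuristics) handle the cases where the secret and the asset are defined far apart, which a single-line regular expression like the one above cannot capture.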
Related papers
- RiskHarvester: A Risk-based Tool to Prioritize Secret Removal Efforts in Software Artifacts [5.432601851190413]
Since 2020, GitGuardian has been detecting checked-in hard-coded secrets in GitHub repositories.
Between 2020 and 2023, GitGuardian observed a four-fold increase in hard-coded secrets, with 12.8 million exposed in 2023.
We present RiskHarvester, a risk-based tool that computes a security risk score from the value of the asset and the ease of attack on a database.
arXiv Detail & Related papers (2025-02-03T03:32:12Z)
- Automatically Detecting Checked-In Secrets in Android Apps: How Far Are We? [4.619114660081147]
Developers often overlook the proper storage of such secrets, opting to put them directly into their projects.
Such checked-in secrets can be easily extracted and exploited by malicious adversaries.
Unlike open-source projects, the lack of direct access to the source code and the presence of obfuscation complicates the checked-in secret detection for Android apps.
arXiv Detail & Related papers (2024-12-14T18:14:25Z)
- Secret Breach Prevention in Software Issue Reports [2.8747015994080285]
This paper presents a novel technique for secret breach detection in software issue reports.
We highlight the challenges posed by noise, such as log files, URLs, commit IDs, stack traces, and dummy passwords.
We propose an approach combining the strengths of state-of-the-art techniques with the contextual understanding of language models.
arXiv Detail & Related papers (2024-10-31T06:14:17Z)
- Bayesian Detector Combination for Object Detection with Crowdsourced Annotations [49.43709660948812]
Acquiring fine-grained object detection annotations in unconstrained images is time-consuming, expensive, and prone to noise.
We propose a novel Bayesian Detector Combination (BDC) framework to more effectively train object detectors with noisy crowdsourced annotations.
BDC is model-agnostic, requires no prior knowledge of the annotators' skill level, and seamlessly integrates with existing object detection models.
arXiv Detail & Related papers (2024-07-10T18:00:54Z)
- Boosting Static Resource Leak Detection via LLM-based Resource-Oriented Intention Inference [14.783216988363804]
Existing static detection techniques rely on mechanical matching of predefined resource acquisition/release APIs and null-checking conditions to find unreleased resources.
We propose InferROI, a novel approach that directly infers resource-oriented intentions (acquisition, release, and reachability validation) in code.
We evaluate the effectiveness of InferROI in both resource-oriented intention inference and resource leak detection.
arXiv Detail & Related papers (2023-11-08T04:19:28Z)
- A Comparative Study of Software Secrets Reporting by Secret Detection Tools [5.9347272469695245]
According to GitGuardian's monitoring of public GitHub repositories, exposed secrets continued to grow in 2022, up 67% compared to 2021.
We present an evaluation of five open-source and four proprietary tools against a benchmark dataset.
The top three tools by precision are GitHub Secret Scanner (75%), Gitleaks (46%), and Commercial X (25%); by recall, the top three are Gitleaks (88%), SpectralOps (67%), and TruffleHog (52%). An F1 computation from these rounded figures is sketched after this list.
arXiv Detail & Related papers (2023-07-03T02:32:09Z)
- SalienDet: A Saliency-based Feature Enhancement Algorithm for Object Detection for Autonomous Driving [160.57870373052577]
We propose a saliency-based OD algorithm (SalienDet) to detect unknown objects.
Our SalienDet utilizes a saliency-based algorithm to enhance image features for object proposal generation.
We design a dataset relabeling approach to differentiate the unknown objects from all objects in the training sample set to achieve Open-World Detection.
arXiv Detail & Related papers (2023-05-11T16:19:44Z)
- Hiding Images in Deep Probabilistic Models [58.23127414572098]
We describe a different computational framework to hide images in deep probabilistic models.
Specifically, we use a DNN to model the probability density of cover images, and hide a secret image in one particular location of the learned distribution.
We demonstrate the feasibility of our SinGAN approach in terms of extraction accuracy and model security.
arXiv Detail & Related papers (2022-10-05T13:33:25Z)
- Untargeted Backdoor Watermark: Towards Harmless and Stealthy Dataset Copyright Protection [69.59980270078067]
We explore the untargeted backdoor watermarking scheme, where the abnormal model behaviors are not deterministic.
We also discuss how to use the proposed untargeted backdoor watermark for dataset ownership verification.
arXiv Detail & Related papers (2022-09-27T12:56:56Z)
- Black-box Dataset Ownership Verification via Backdoor Watermarking [67.69308278379957]
We formulate the protection of released datasets as verifying whether they are adopted for training a (suspicious) third-party model.
We propose to embed external patterns via backdoor watermarking and use them for ownership verification to protect the datasets.
Specifically, we exploit poison-only backdoor attacks (e.g., BadNets) for dataset watermarking and design a hypothesis-test-guided method for dataset verification.
arXiv Detail & Related papers (2022-08-04T05:32:20Z)
- Open-sourced Dataset Protection via Backdoor Watermarking [87.15630326131901]
We propose a backdoor-embedding-based dataset watermarking method to protect an open-sourced image-classification dataset.
We use a hypothesis test guided method for dataset verification based on the posterior probability generated by the suspicious third-party model.
arXiv Detail & Related papers (2020-10-12T16:16:27Z)
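Several entries above, including AssetHarvester itself and the comparative study of secret detection tools, report precision, recall, and F1. The short sketch below spells out the standard definitions behind those numbers; the function names are illustrative, and the example value derived from the rounded Gitleaks figures is an approximation of this sketch, not a number reported by any of the papers.

```python
# Standard detection-metric definitions used by the papers listed above.
# Illustrative only: names and the example numbers are not taken from any tool.

def precision(tp: int, fp: int) -> float:
    """Fraction of reported findings that are correct."""
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    """Fraction of true secret(-asset) pairs that are reported."""
    return tp / (tp + fn)

def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

if __name__ == "__main__":
    # Using the rounded Gitleaks figures from the comparative-study entry
    # (precision 46%, recall 88%); the resulting F1 (~0.60) is only an
    # approximation derived from rounded inputs.
    print(round(f1(0.46, 0.88), 2))  # 0.6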