Related papers: ORCA: Unveiling Obscure Containers In The Wild

ORCA: Unveiling Obscure Containers In The Wild

URL: http://arxiv.org/abs/2509.09322v1
Date: Thu, 11 Sep 2025 10:12:56 GMT
Title: ORCA: Unveiling Obscure Containers In The Wild
Authors: Jacopo Bufalino, Agathe Blaise, Stefano Secci,
Abstract summary: Software Composition Analysis (SCA) is a critical process that helps identify packages and dependencies inside a container.<n>In this paper, we examine the limitations of both cloud-based and open-source SCA tools when faced with such obscure images.<n>We propose an obscuration-resilient methodology for container analysis and introduce ORCA, its open-source implementation.
Score: 2.412902381004722
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Modern software development increasingly depends on open-source libraries and third-party components, which are often encapsulated into containerized environments. While improving the development and deployment of applications, this approach introduces security risks, particularly when outdated or vulnerable components are inadvertently included in production environments. Software Composition Analysis (SCA) is a critical process that helps identify and manage packages and dependencies inside a container. However, unintentional modifications to the container filesystem can lead to incomplete container images, which compromise the reliability of SCA tools. In this paper, we examine the limitations of both cloud-based and open-source SCA tools when faced with such obscure images. An analysis of 600 popular containers revealed that obscure containers exist in well-known registries and trusted images and that many tools fail to analyze such containers. To mitigate these issues, we propose an obscuration-resilient methodology for container analysis and introduce ORCA (Obscuration-Resilient Container Analyzer), its open-source implementation. We reported our findings to all vendors using their appropriate channels. Our results demonstrate that ORCA effectively detects the content of obscure containers and achieves a median 40% improvement in file coverage compared to Docker Scout and Syft.

Related papers

A Large Scale Empirical Analysis on the Adherence Gap between Standards and Tools in SBOM [54.38424417079265]
A Software Bill of Materials (SBOM) is a machine-readable artifact that organizes software information.<n>Following standards, organizations have developed tools for generating and utilizing SBOMs.<n>This paper presents the first large-scale, two-stage empirical analysis of the adherence gap, using our automated evaluation framework, SAP.
arXiv Detail & Related papers (2026-01-09T08:26:05Z)
A Systematic Mapping Study on Risks and Vulnerabilities in Software Containers [1.3099514901173375]
Software containers are widely adopted for developing and deploying software applications.<n>Despite their popularity, major security concerns arise during container development and deployment.<n>This study offers critical insights into the current landscape of security issues within software container systems.
arXiv Detail & Related papers (2025-12-12T10:22:34Z)
Package Dashboard: A Cross-Ecosystem Framework for Dual-Perspective Analysis of Software Packages [10.345664674440139]
Package Dashboard is a cross-ecosystem framework that provides a unified platform for supply chain analysis.<n>By combining dependency resolution with repository analysis, it reduces cognitive load and improves traceability.
arXiv Detail & Related papers (2025-12-01T12:52:03Z)
Trace: Securing Smart Contract Repository Against Access Control Vulnerability [58.02691083789239]
GitHub hosts numerous smart contract repositories containing source code, documentation, and configuration files.<n>Third-party developers often reference, reuse, or fork code from these repositories during custom development.<n>Existing tools for detecting smart contract vulnerabilities are limited in their ability to handle complex repositories.
arXiv Detail & Related papers (2025-10-22T05:18:28Z)
Automated Vulnerability Validation and Verification: A Large Language Model Approach [7.482522010482827]
This paper introduces an end-to-end multi-step pipeline leveraging generative AI, specifically large language models (LLMs)<n>Our approach extracts information from CVE disclosures in the National Vulnerability Database.<n>It augments it with external public knowledge (e.g., threat advisories, code snippets) using Retrieval-Augmented Generation (RAG)<n>The pipeline iteratively refines generated artifacts, validates attack success with test cases, and supports complex multi-container setups.
arXiv Detail & Related papers (2025-09-28T19:16:12Z)
A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code [48.10068691540979]
A.S.E (AI Code Generation Security Evaluation) is a repository-level evaluation benchmark designed to closely mirror real-world AI programming tasks.<n>Our evaluation of leading large language models (LLMs) on A.S.E reveals several key findings.
arXiv Detail & Related papers (2025-08-25T15:11:11Z)
Malware Detection in Docker Containers: An Image is Worth a Thousand Logs [10.132362061193954]
We introduce a method to identify compromised containers through machine learning analysis of their file systems.<n>We cast the entire software containers into large RGB images via their tarball representations, and propose to use established Convolutional Neural Network architectures on a streaming, patch-based manner.<n>Our method detects more malware and achieves higher F1 and Recall scores than all individual and ensembles of VirusTotal engines, demonstrating its effectiveness and setting a new standard for identifying malware-compromised software containers.
arXiv Detail & Related papers (2025-04-04T07:38:16Z)
Vexed by VEX tools: Consistency evaluation of container vulnerability scanners [0.0]
This paper presents a study that analyzed state-of-the-art vulnerability scanning tools applied to containers.<n>We have focused the work on tools following the Vulnerability Exploitability eXchange (VEX) format.
arXiv Detail & Related papers (2025-03-18T16:22:43Z)
Bomfather: An eBPF-based Kernel-level Monitoring Framework for Accurate Identification of Unknown, Unused, and Dynamically Loaded Dependencies in Modern Software Supply Chains [0.0]
Inaccuracies in dependency-tracking methods undermine the security and integrity of modern software supply chains.<n>This paper introduces a kernel-level framework leveraging extended Berkeley Packet Filter (eBPF) to capture software build dependencies transparently in real time.
arXiv Detail & Related papers (2025-03-03T22:32:59Z)
A Machine Learning-Based Approach For Detecting Malicious PyPI Packages [4.311626046942916]
In modern software development, the use of external libraries and packages is increasingly prevalent.<n>This reliance on reusing code introduces serious risks for deployed software in the form of malicious packages.<n>We propose a data-driven approach that uses machine learning and static analysis to examine the package's metadata, code, files, and textual characteristics.
arXiv Detail & Related papers (2024-12-06T18:49:06Z)
The Impact of SBOM Generators on Vulnerability Assessment in Python: A Comparison and a Novel Approach [56.4040698609393]
Software Bill of Materials (SBOM) has been promoted as a tool to increase transparency and verifiability in software composition. Current SBOM generation tools often suffer from inaccuracies in identifying components and dependencies. We propose PIP-sbom, a novel pip-inspired solution that addresses their shortcomings.
arXiv Detail & Related papers (2024-09-10T10:12:37Z)
"Glue pizza and eat rocks" -- Exploiting Vulnerabilities in Retrieval-Augmented Generative Models [74.05368440735468]
Retrieval-Augmented Generative (RAG) models enhance Large Language Models (LLMs) In this paper, we demonstrate a security threat where adversaries can exploit the openness of these knowledge bases.
arXiv Detail & Related papers (2024-06-26T05:36:23Z)
Alibaba LingmaAgent: Improving Automated Issue Resolution via Comprehensive Repository Exploration [64.19431011897515]
This paper presents Alibaba LingmaAgent, a novel Automated Software Engineering method designed to comprehensively understand and utilize whole software repositories for issue resolution.<n>Our approach introduces a top-down method to condense critical repository information into a knowledge graph, reducing complexity, and employs a Monte Carlo tree search based strategy.<n>In production deployment and evaluation at Alibaba Cloud, LingmaAgent automatically resolved 16.9% of in-house issues faced by development engineers, and solved 43.3% of problems after manual intervention.
arXiv Detail & Related papers (2024-06-03T15:20:06Z)
LUCID: A Framework for Reducing False Positives and Inconsistencies Among Container Scanning Tools [0.0]
This paper provides a fully functional framework named LUCID that can reduce false positives and inconsistencies provided by multiple scanning tools. Our results show that our framework can reduce inconsistencies by 70%. We also create a Dynamic Classification component that can successfully classify and predict the different severity levels with an accuracy of 84%.
arXiv Detail & Related papers (2024-05-11T16:58:28Z)
Securing the Open RAN Infrastructure: Exploring Vulnerabilities in Kubernetes Deployments [60.51751612363882]
We investigate the security implications of and software-based Open Radio Access Network (RAN) systems. We highlight the presence of potential vulnerabilities and misconfigurations in the infrastructure supporting the Near Real-Time RAN Controller (RIC) cluster.
arXiv Detail & Related papers (2024-05-03T07:18:45Z)
On the Security Blind Spots of Software Composition Analysis [46.1389163921338]
We present a novel approach to detect vulnerable clones in the Maven repository. We retrieve over 53k potential vulnerable clones from Maven Central. We detect 727 confirmed vulnerable clones and synthesize a testable proof-of-vulnerability project for each of those.
arXiv Detail & Related papers (2023-06-08T20:14:46Z)

This list is automatically generated from the titles and abstracts of the papers in this site.