Unpacking Security Scanners for GitHub Actions Workflows
- URL: http://arxiv.org/abs/2601.14455v1
- Date: Tue, 20 Jan 2026 20:25:11 GMT
- Title: Unpacking Security Scanners for GitHub Actions Workflows
- Authors: Madjda Fares, Yogya Gamage, Benoit Baudry
- Abstract summary: GitHub Actions is a widely used platform that allows developers to automate the build and deployment of their projects. As the platform's popularity continues to grow, it has become a target of choice for recent software supply chain attacks. Several security scanners have emerged to help developers harden their GitHub Actions.
- Score: 2.046588369793562
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: GitHub Actions is a widely used platform that allows developers to automate the build and deployment of their projects through configurable workflows. As the platform's popularity continues to grow, it has become a target of choice for recent software supply chain attacks. These attacks exploit excessive permissions, ambiguous versions, or the absence of artifact integrity checks to compromise workflows. In response to these attacks, several security scanners have emerged to help developers harden their workflows. In this paper, we perform the first systematic comparison of 9 GitHub Actions workflow security scanners. We compare them in terms of scope (which security weaknesses they target), detection capabilities (how many weaknesses they detect), and usability (how long they take to scan a workflow). To compare scanners on a common ground, we first establish a taxonomy of 10 security weaknesses that can occur in GitHub Actions workflows. Then, we run the scanners against a curated set of 596 workflows. Our study reveals that the landscape of GitHub Actions workflow security scanners is diverse, with both broad-scope tools and very focused ones. More importantly, we show that scanners interpret security weaknesses differently, leading to significant differences in the type and number of reported weaknesses. Based on this empirical evidence, we make actionable recommendations for developers to harden their GitHub Actions workflows.
Related papers
- A Trajectory-Based Safety Audit of Clawdbot (OpenClaw) [77.1549110891026]
We present a trajectory-centric evaluation of Clawdbot across six risk dimensions. We log complete interaction trajectories (messages, actions, tool-call arguments/outputs) and assess safety using both an automated trajectory judge and human review.
arXiv Detail & Related papers (2026-02-16T00:33:02Z)
- SkillJect: Automating Stealthy Skill-Based Prompt Injection for Coding Agents with Trace-Driven Closed-Loop Refinement [120.52289344734415]
We propose an automated framework for stealthy prompt injection tailored to agent skills. The framework forms a closed loop with three agents: an Attack Agent that synthesizes injection skills under explicit stealth constraints, a Code Agent that executes tasks using the injected skills, and an Evaluate Agent that logs action traces. Our method consistently achieves high attack success rates under realistic settings.
arXiv Detail & Related papers (2026-02-15T16:09:48Z)
- Deep Dive into the Abuse of DL APIs To Create Malicious AI Models and How to Detect Them [3.8802542855314788]
Pre-trained AI models are often fetched from model hubs like Hugging Face or Hub. This introduces a security risk where attackers can inject malicious code into the models. We show how one can abuse hidden functionalities of APIs such as file read/write and network send/receive.
arXiv Detail & Related papers (2026-01-08T03:30:20Z)
- Granite: Granular Runtime Enforcement for GitHub Actions Permissions [2.278720757613755]
We present Granite, a proxy-based system that enforces fine-grained permissions for GitHub Actions at step-level granularity within a job. Our analysis reveals that 52.7% of the jobs can be protected by Granite against permission misuse attacks.
arXiv Detail & Related papers (2025-12-12T14:38:45Z)
- RedCodeAgent: Automatic Red-teaming Agent against Diverse Code Agents [70.24175620901538]
Code agents have gained widespread adoption due to their strong code generation capabilities and integration with code interpreters. Current static safety benchmarks and red-teaming tools are inadequate for identifying emerging real-world risky scenarios. We propose RedCodeAgent, the first automated red-teaming agent designed to systematically uncover vulnerabilities in diverse code agents.
arXiv Detail & Related papers (2025-10-02T22:59:06Z)
- Cuckoo Attack: Stealthy and Persistent Attacks Against AI-IDE [64.47951172662745]
Cuckoo Attack is a novel attack that achieves stealthy and persistent command execution by embedding malicious payloads into configuration files. We formalize our attack paradigm into two stages, including initial infection and persistence. We contribute seven actionable checkpoints for vendors to evaluate their product security.
arXiv Detail & Related papers (2025-09-19T04:10:52Z)
- GitHub's Copilot Code Review: Can AI Spot Security Flaws Before You Commit? [0.0]
This study evaluates the effectiveness of GitHub Copilot's recently introduced code review feature in detecting security vulnerabilities. Contrary to expectations, our results reveal that Copilot's code review frequently fails to detect critical vulnerabilities. Our results highlight the continued necessity of dedicated security tools and manual code audits to ensure robust software security.
arXiv Detail & Related papers (2025-09-17T02:56:21Z)
- OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents [60.78202583483591]
We introduce OS-Harm, a new benchmark for measuring safety of computer use agents. OS-Harm is built on top of the OSWorld environment and aims to test models across three categories of harm: deliberate user misuse, prompt injection attacks, and model misbehavior. We evaluate computer use agents based on a range of frontier models and provide insights into their safety.
arXiv Detail & Related papers (2025-06-17T17:59:31Z)
- "I wasn't sure if this is indeed a security risk": Data-driven Understanding of Security Issue Reporting in GitHub Repositories of Open Source npm Packages [8.360992461585308]
This work collected 10,907,467 issues reported across GitHub repositories of 45,466 diverse npm packages. We found that the tags associated with these issues indicate the existence of only 0.13% security-related issues. Our approach of manual analysis followed by developing high-accuracy machine learning models identifies 1,617,738 security-related issues which are not tagged as security-related.
arXiv Detail & Related papers (2025-06-09T13:11:35Z)
- Six Million (Suspected) Fake Stars in GitHub: A Growing Spiral of Popularity Contests, Spams, and Malware [52.84746696418136]
We present a systematic, global, and longitudinal measurement study of fake stars in GitHub. We build StarScout, a scalable tool able to detect anomalous starring behaviors across all GitHub metadata between 2019 and 2024. Analyzing the data collected using StarScout, we find that: (1) fake-star-related activities have rapidly surged in 2024; (2) the accounts and repositories in fake star campaigns have highly trivial activity patterns; and (3) the majority of fake stars are used to promote short-lived phishing malware repositories.
arXiv Detail & Related papers (2024-12-18T03:03:58Z)
- On the effectiveness of Large Language Models for GitHub Workflows [9.82254417875841]
Large Language Models (LLMs) have demonstrated their effectiveness in various software development tasks.
We perform the first comprehensive study to understand the effectiveness of LLMs on five workflow-related tasks with different levels of prompts.
Our evaluation of three state-of-the-art LLMs and their fine-tuned variants revealed various interesting findings on the current effectiveness and drawbacks of LLMs.
arXiv Detail & Related papers (2024-03-19T05:14:12Z)
- CodeLMSec Benchmark: Systematically Evaluating and Finding Security Vulnerabilities in Black-Box Code Language Models [58.27254444280376]
Large language models (LLMs) for automatic code generation have achieved breakthroughs in several programming tasks.
Training data for these models is usually collected from the Internet (e.g., from open-source repositories) and is likely to contain faults and security vulnerabilities.
This unsanitized training data can cause the language models to learn these vulnerabilities and propagate them during the code generation procedure.
arXiv Detail & Related papers (2023-02-08T11:54:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.