An Empirical Study on Workflows and Security Policies in Popular GitHub
Repositories
- URL: http://arxiv.org/abs/2305.16120v1
- Date: Thu, 25 May 2023 14:52:23 GMT
- Title: An Empirical Study on Workflows and Security Policies in Popular GitHub
Repositories
- Authors: Jessy Ayala and Joshua Garcia
- Abstract summary: In open-source projects, anyone can contribute, so it is important to have an active continuous integration and continuous delivery (CI/CD) pipeline.
Many of these projects are hosted on GitHub, where maintainers can create automated security policies.
We measure the usage of GitHub and security policies in thousands of popular repositories based on the number of stars.
- Score: 9.048328480295224
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In open-source projects, anyone can contribute, so it is important to have an
active continuous integration and continuous delivery (CI/CD) pipeline in
addition to a protocol for reporting security concerns, especially in projects
that are widely used and belong to the software supply chain. Many of these
projects are hosted on GitHub, where maintainers can create automated workflows
using GitHub Actions, introduced in 2019, for inspecting proposed changes to
source code and defining a security policy for reporting vulnerabilities. We
conduct an empirical study to measure the usage of GitHub workflows and
security policies in thousands of popular repositories based on the number of
stars. After querying the top one-hundred and top one-thousand repositories
from all 181 trending GitHub topics, and the top 4,900 overall repositories,
totaling just over 173 thousand projects, we find that 37% of projects have
workflows enabled and 7% have a security policy in place. Using the top 60
repositories from each of the 34 most popular programming languages on GitHub,
2,040 projects total, we find that 57% of projects have workflows enabled and
17% have a security policy in place. Furthermore, from those top repositories
that have support for GitHub CodeQL static analysis, which performs bug and
vulnerability checks, only 13.5% have it enabled; in fact, we find that only
1.7% of the top repositories using Kotlin have an active CodeQL scanning
workflow. These results highlight that open-source project maintainers should
prioritize configuring workflows, enabling automated static analysis whenever
possible, and defining a security policy to prevent vulnerabilities from being
introduced or remaining in source code.
Related papers
- An Empirical Study of the Evolution of GitHub Actions Workflows [41.25198736954168]
CI/CD practices play a significant role during collaborative software development.<n> GitHub Actions, the CI/CD tool integrated into GitHub, allows repository maintainers to automate development.<n>We conducted a mixed methods analysis of GitHub Actions workflow changes over time.
arXiv Detail & Related papers (2026-02-16T09:05:42Z) - CVE-Factory: Scaling Expert-Level Agentic Tasks for Code Security Vulnerability [50.57373283154859]
We present CVE-Factory, the first multiagent framework to achieve expert-level quality in automatically transforming vulnerability tasks.<n>It is also evaluated on the latest realistic vulnerabilities and achieves a 66.2% verified success.<n>We synthesize over 1,000 executable training environments, the first large-scale scaling of agentic tasks in code security.
arXiv Detail & Related papers (2026-02-03T02:27:16Z) - Unpacking Security Scanners for GitHub Actions Workflows [2.046588369793562]
GitHub Actions is a widely used platform that allows developers to automate the build and deployment of their projects.<n>As the platform's popularity continues to grow, it has become a target of choice for recent software supply chain attacks.<n>Several security scanners have emerged to help developers harden their GitHub Actions.
arXiv Detail & Related papers (2026-01-20T20:25:11Z) - Analyzing the Availability of E-Mail Addresses for PyPI Libraries [89.21869606965578]
81.6% of libraries include at least one valid e-mail address, with PyPI serving as the primary source.<n>We identify over 698,000 invalid entries, primarily due to missing fields.
arXiv Detail & Related papers (2026-01-20T14:54:58Z) - RepoGenesis: Benchmarking End-to-End Microservice Generation from Readme to Repository [52.98970048197381]
RepoGenesis is the first multilingual benchmark for repository-level end-to-end web microservice generation.<n>It consists of 106 repositories (60 Python, 46 Java) across 18 domains and 11 frameworks, with 1,258 API endpoints and 2,335 test cases verified.<n>Results reveal that despite high AC (up to 73.91%) and DSR (up to 100%), the best-performing system achieves only 23.67% Pass@1 on Python and 21.45% on Java.
arXiv Detail & Related papers (2026-01-20T13:19:20Z) - Granite: Granular Runtime Enforcement for GitHub Actions Permissions [2.278720757613755]
We present Granite, a proxy-based system that enforces fine-starred permissions for GitHub Actions at the step-level granularity within a job.<n>Our analysis reveals that 52.7% of the jobs can be protected by Granite against permission misuse attacks.
arXiv Detail & Related papers (2025-12-12T14:38:45Z) - Trace: Securing Smart Contract Repository Against Access Control Vulnerability [58.02691083789239]
GitHub hosts numerous smart contract repositories containing source code, documentation, and configuration files.<n>Third-party developers often reference, reuse, or fork code from these repositories during custom development.<n>Existing tools for detecting smart contract vulnerabilities are limited in their ability to handle complex repositories.
arXiv Detail & Related papers (2025-10-22T05:18:28Z) - SwingArena: Competitive Programming Arena for Long-context GitHub Issue Solving [90.32201622392137]
We present SwingArena, a competitive evaluation framework for Large Language Models (LLMs)<n>Unlike traditional static benchmarks, SwingArena models the collaborative process of software by pairing LLMs as iterations, who generate patches, and reviewers, who create test cases and verify the patches through continuous integration (CI) pipelines.
arXiv Detail & Related papers (2025-05-29T18:28:02Z) - Eradicating the Unseen: Detecting, Exploiting, and Remediating a Path Traversal Vulnerability across GitHub [1.2124551005857036]
Vulnerabilities in open-source software can cause cascading effects in the modern digital ecosystem.<n>We identified 1,756 vulnerable open-source projects, some of which are very influential.<n>We have responsibly disclosed the vulnerability to the maintainers, and 14% of the reported vulnerabilities have been remediated.
arXiv Detail & Related papers (2025-05-26T16:29:21Z) - EnvBench: A Benchmark for Automated Environment Setup [76.02998475135824]
Large Language Models have enabled researchers to focus on practical repository-level tasks in software engineering domain.
Existing studies on environment setup introduce innovative agentic strategies, but their evaluation is often based on small datasets.
To address this gap, we introduce a comprehensive environment setup benchmark EnvBench.
arXiv Detail & Related papers (2025-03-18T17:19:12Z) - An Empirical Study of Dotfiles Repositories Containing User-Specific Configuration Files [1.7556600627464058]
Hundreds of thousands choose to publicly host their repositories on GitHub.
We collected and analyzed publicly-hosted dotfiles repositories on GitHub.
We found that 25.8% of the top 500 most-starred GitHub users maintain some form of publicly accessible dotfiles repository.
arXiv Detail & Related papers (2025-01-30T18:32:46Z) - How to Understand Whole Software Repository? [64.19431011897515]
An excellent understanding of the whole repository will be the critical path to Automatic Software Engineering (ASE)
We develop a novel method named RepoUnderstander by guiding agents to comprehensively understand the whole repositories.
To better utilize the repository-level knowledge, we guide the agents to summarize, analyze, and plan.
arXiv Detail & Related papers (2024-06-03T15:20:06Z) - Analyzing the Accessibility of GitHub Repositories for PyPI and NPM Libraries [91.97201077607862]
Industrial applications heavily rely on open-source software (OSS) libraries, which provide various benefits.
To monitor the activities of such communities, a comprehensive list of repositories for the libraries of an ecosystem must be accessible.
In this study, we analyze the accessibility of GitHub repositories for PyPI and NPM libraries.
arXiv Detail & Related papers (2024-04-26T13:27:04Z) - RepoAgent: An LLM-Powered Open-Source Framework for Repository-level
Code Documentation Generation [79.83270415843857]
We introduce RepoAgent, a large language model powered open-source framework aimed at proactively generating, maintaining, and updating code documentation.
We have validated the effectiveness of our approach, showing that RepoAgent excels in generating high-quality repository-level documentation.
arXiv Detail & Related papers (2024-02-26T15:39:52Z) - GitAgent: Facilitating Autonomous Agent with GitHub by Tool Extension [81.44231422624055]
A growing area of research focuses on Large Language Models (LLMs) equipped with external tools capable of performing diverse tasks.
In this paper, we introduce GitAgent, an agent capable of achieving the autonomous tool extension from GitHub.
arXiv Detail & Related papers (2023-12-28T15:47:30Z) - Exploring Security Practices in Infrastructure as Code: An Empirical
Study [54.669404064111795]
Cloud computing has become popular thanks to the widespread use of Infrastructure as Code (IaC) tools.
scripting process does not automatically prevent practitioners from introducing misconfigurations, vulnerabilities, or privacy risks.
Ensuring security relies on practitioners understanding and the adoption of explicit policies, guidelines, or best practices.
arXiv Detail & Related papers (2023-08-07T23:43:32Z) - The GitHub Development Workflow Automation Ecosystems [47.818229204130596]
Large-scale software development has become a highly collaborative endeavour.
This chapter explores the ecosystems of development bots and GitHub Actions.
It provides an extensive survey of the state-of-the-art in this domain.
arXiv Detail & Related papers (2023-05-08T15:24:23Z) - Detecting Security Patches via Behavioral Data in Code Repositories [11.052678122289871]
We show a system to automatically identify security patches using only the developer behavior in the Git repository.
We showed we can reveal concealed security patches with an accuracy of 88.3% and F1 Score of 89.8%.
arXiv Detail & Related papers (2023-02-04T06:43:07Z) - Automatically Categorising GitHub Repositories by Application Domain [14.265666415804025]
GitHub is the largest host of open source software on the Internet.
It is becoming increasingly hard to navigate the plethora of repositories which span a wide range of domains.
Past work has shown that taking the application domain into account is crucial for tasks such as predicting the popularity of a repository.
arXiv Detail & Related papers (2022-07-30T16:27:16Z) - GitRank: A Framework to Rank GitHub Repositories [0.0]
Open-source repositories provide wealth of information and are increasingly being used to build artificial intelligence (AI) based systems.
In this hackathon, we utilize known code quality measures and GrimoireLab toolkit to implement a framework, named GitRank, to rank open-source repositories on three different criteria.
arXiv Detail & Related papers (2022-05-04T23:42:30Z) - LabelGit: A Dataset for Software Repositories Classification using
Attributed Dependency Graphs [11.523471275501857]
We create a new dataset of GitHub projects called LabelGit.
Our dataset uses direct information from the source code, like the dependency graph and source code neural representations from the identifiers.
We hope to aid the development of solutions that do not rely on proxies but use the entire source code to perform classification.
arXiv Detail & Related papers (2021-03-16T07:28:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.