An Empirical Study on Workflows and Security Policies in Popular GitHub
Repositories
- URL: http://arxiv.org/abs/2305.16120v1
- Date: Thu, 25 May 2023 14:52:23 GMT
- Title: An Empirical Study on Workflows and Security Policies in Popular GitHub
Repositories
- Authors: Jessy Ayala and Joshua Garcia
- Abstract summary: In open-source projects, anyone can contribute, so it is important to have an active continuous integration and continuous delivery (CI/CD) pipeline.
Many of these projects are hosted on GitHub, where maintainers can create automated workflows and define a security policy for reporting vulnerabilities.
We measure the usage of GitHub workflows and security policies in thousands of popular repositories, selected by number of stars.
- Score: 9.048328480295224
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In open-source projects, anyone can contribute, so it is important to have an
active continuous integration and continuous delivery (CI/CD) pipeline in
addition to a protocol for reporting security concerns, especially in projects
that are widely used and belong to the software supply chain. Many of these
projects are hosted on GitHub, where maintainers can create automated workflows
using GitHub Actions, introduced in 2019, for inspecting proposed changes to
source code and defining a security policy for reporting vulnerabilities. We
conduct an empirical study to measure the usage of GitHub workflows and
security policies in thousands of popular repositories based on the number of
stars. After querying the top one-hundred and top one-thousand repositories
from all 181 trending GitHub topics, and the top 4,900 overall repositories,
totaling just over 173 thousand projects, we find that 37% of projects have
workflows enabled and 7% have a security policy in place. Using the top 60
repositories from each of the 34 most popular programming languages on GitHub,
2,040 projects total, we find that 57% of projects have workflows enabled and
17% have a security policy in place. Furthermore, from those top repositories
that have support for GitHub CodeQL static analysis, which performs bug and
vulnerability checks, only 13.5% have it enabled; in fact, we find that only
1.7% of the top repositories using Kotlin have an active CodeQL scanning
workflow. These results highlight that open-source project maintainers should
prioritize configuring workflows, enabling automated static analysis whenever
possible, and defining a security policy to prevent vulnerabilities from being
introduced or remaining in source code.
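The three measurements named in the abstract (workflows enabled, security policy present, CodeQL scanning active) can be approximated from a repository's file listing. The following is a minimal sketch under stated assumptions, not the authors' tooling: the helper names, the sample repositories, and the substring heuristic for CodeQL are all hypothetical, and the policy locations checked (repository root, `.github/`, `docs/`) are the ones GitHub documents for `SECURITY.md`.

```python
from pathlib import PurePosixPath

# Assumed locations for a security policy file (root, .github/, docs/).
SECURITY_POLICY_PATHS = {"SECURITY.md", ".github/SECURITY.md", "docs/SECURITY.md"}
WORKFLOW_DIR = PurePosixPath(".github/workflows")


def has_workflow(paths):
    """True if any path is a YAML file directly under .github/workflows/."""
    for p in paths:
        path = PurePosixPath(p)
        if path.parent == WORKFLOW_DIR and path.suffix in {".yml", ".yaml"}:
            return True
    return False


def has_security_policy(paths):
    """True if a SECURITY.md sits in one of the assumed locations."""
    return any(p in SECURITY_POLICY_PATHS for p in paths)


def uses_codeql(workflow_texts):
    """Heuristic (an assumption): a workflow mentions github/codeql-action."""
    return any("github/codeql-action" in text for text in workflow_texts)


def adoption_rate(repos, predicate):
    """Percentage of repos for which predicate(repo) is true."""
    if not repos:
        return 0.0
    return 100.0 * sum(1 for r in repos if predicate(r)) / len(repos)


# Hypothetical sample data; real input would come from cloned repositories
# or from an API listing of each repository's files.
repos = [
    {"name": "example/a",
     "paths": ["README.md", ".github/workflows/ci.yml", "SECURITY.md"],
     "workflow_texts": ["uses: github/codeql-action/init@v3"]},
    {"name": "example/b",
     "paths": ["README.md", ".github/workflows/test.yaml"],
     "workflow_texts": ["run: make test"]},
    {"name": "example/c",
     "paths": ["README.md", "docs/SECURITY.md"],
     "workflow_texts": []},
    {"name": "example/d",
     "paths": ["README.md"],
     "workflow_texts": []},
]

workflow_pct = adoption_rate(repos, lambda r: has_workflow(r["paths"]))
policy_pct = adoption_rate(repos, lambda r: has_security_policy(r["paths"]))
codeql_pct = adoption_rate(repos, lambda r: uses_codeql(r["workflow_texts"]))

print(f"workflows: {workflow_pct:.1f}%")
print(f"security policy: {policy_pct:.1f}%")
print(f"CodeQL: {codeql_pct:.1f}%")
```

A file-presence check of this kind only shows that a workflow or policy file exists; it does not show that the workflow actually runs, so it may overcount compared with the "enabled" and "active" criteria used in the study.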
Related papers
- EnvBench: A Benchmark for Automated Environment Setup [76.02998475135824]
Large Language Models have enabled researchers to focus on practical repository-level tasks in software engineering domain.
Existing studies on environment setup introduce innovative agentic strategies, but their evaluation is often based on small datasets.
To address this gap, we introduce a comprehensive environment setup benchmark EnvBench.
arXiv Detail & Related papers (2025-03-18T17:19:12Z)
- An Empirical Study of Dotfiles Repositories Containing User-Specific Configuration Files [1.7556600627464058]
Hundreds of thousands choose to publicly host their repositories on GitHub.
We collected and analyzed publicly-hosted dotfiles repositories on GitHub.
We found that 25.8% of the top 500 most-starred GitHub users maintain some form of publicly accessible dotfiles repository.
arXiv Detail & Related papers (2025-01-30T18:32:46Z)
- How to Understand Whole Software Repository? [64.19431011897515]
An excellent understanding of the whole repository will be the critical path to Automatic Software Engineering (ASE)
We develop a novel method named RepoUnderstander by guiding agents to comprehensively understand the whole repositories.
To better utilize the repository-level knowledge, we guide the agents to summarize, analyze, and plan.
arXiv Detail & Related papers (2024-06-03T15:20:06Z)
- Analyzing the Accessibility of GitHub Repositories for PyPI and NPM Libraries [91.97201077607862]
Industrial applications heavily rely on open-source software (OSS) libraries, which provide various benefits.
To monitor the activities of such communities, a comprehensive list of repositories for the libraries of an ecosystem must be accessible.
In this study, we analyze the accessibility of GitHub repositories for PyPI and NPM libraries.
arXiv Detail & Related papers (2024-04-26T13:27:04Z)
- RepoAgent: An LLM-Powered Open-Source Framework for Repository-level Code Documentation Generation [79.83270415843857]
We introduce RepoAgent, a large language model powered open-source framework aimed at proactively generating, maintaining, and updating code documentation.
We have validated the effectiveness of our approach, showing that RepoAgent excels in generating high-quality repository-level documentation.
arXiv Detail & Related papers (2024-02-26T15:39:52Z)
- GitAgent: Facilitating Autonomous Agent with GitHub by Tool Extension [81.44231422624055]
A growing area of research focuses on Large Language Models (LLMs) equipped with external tools capable of performing diverse tasks.
In this paper, we introduce GitAgent, an agent capable of achieving the autonomous tool extension from GitHub.
arXiv Detail & Related papers (2023-12-28T15:47:30Z)
- Exploring Security Practices in Infrastructure as Code: An Empirical Study [54.669404064111795]
Cloud computing has become popular thanks to the widespread use of Infrastructure as Code (IaC) tools.
scripting process does not automatically prevent practitioners from introducing misconfigurations, vulnerabilities, or privacy risks.
Ensuring security relies on practitioners understanding and the adoption of explicit policies, guidelines, or best practices.
arXiv Detail & Related papers (2023-08-07T23:43:32Z)
- The GitHub Development Workflow Automation Ecosystems [47.818229204130596]
Large-scale software development has become a highly collaborative endeavour.
This chapter explores the ecosystems of development bots and GitHub Actions.
It provides an extensive survey of the state-of-the-art in this domain.
arXiv Detail & Related papers (2023-05-08T15:24:23Z)
- Detecting Security Patches via Behavioral Data in Code Repositories [11.052678122289871]
We show a system to automatically identify security patches using only the developer behavior in the Git repository.
We showed we can reveal concealed security patches with an accuracy of 88.3% and F1 Score of 89.8%.
arXiv Detail & Related papers (2023-02-04T06:43:07Z)
- Automatically Categorising GitHub Repositories by Application Domain [14.265666415804025]
GitHub is the largest host of open source software on the Internet.
It is becoming increasingly hard to navigate the plethora of repositories which span a wide range of domains.
Past work has shown that taking the application domain into account is crucial for tasks such as predicting the popularity of a repository.
arXiv Detail & Related papers (2022-07-30T16:27:16Z)
- GitRank: A Framework to Rank GitHub Repositories [0.0]
Open-source repositories provide wealth of information and are increasingly being used to build artificial intelligence (AI) based systems.
In this hackathon, we utilize known code quality measures and GrimoireLab toolkit to implement a framework, named GitRank, to rank open-source repositories on three different criteria.
arXiv Detail & Related papers (2022-05-04T23:42:30Z)
- LabelGit: A Dataset for Software Repositories Classification using Attributed Dependency Graphs [11.523471275501857]
We create a new dataset of GitHub projects called LabelGit.
Our dataset uses direct information from the source code, like the dependency graph and source code neural representations from the identifiers.
We hope to aid the development of solutions that do not rely on proxies but use the entire source code to perform classification.
arXiv Detail & Related papers (2021-03-16T07:28:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.