Related papers: On the Prevalence and Usage of Commit Signing on GitHub: A Longitudinal and Cross-Domain Study

URL: http://arxiv.org/abs/2504.19215v1
Date: Sun, 27 Apr 2025 12:39:50 GMT
Title: On the Prevalence and Usage of Commit Signing on GitHub: A Longitudinal and Cross-Domain Study
Authors: Anupam Sharma, Sreyashi Karmakar, Gayatri Priyadarsini Kancherla, Abhishek Bichhawat
Abstract summary: We study the presence of verified commits in GitHub repositories over five years. Only 10% of all the commits in these 60 repositories are verified. We propose ways to identify commit ownership based on GitHub's Events API.
Score: 1.834753484317836
License: http://creativecommons.org/licenses/by/4.0/
Abstract: GitHub is one of the most widely used public code development platforms. However, the code hosted publicly on the platform is vulnerable to commit spoofing, which allows an adversary to introduce malicious code or commits into a repository by forging the commit metadata to indicate that the code was added by a legitimate user. The only defense that GitHub employs is commit signing, which indicates whether a commit is from a valid source based on the keys registered by users. In this work, we perform an empirical analysis of how prevalent commit signing is in commonly used GitHub repositories. To this end, we build a framework that allows us to extract the metadata of all prior commits of a GitHub repository and identify which commits in the repository are verified. Using our framework, we analyzed 60 open-source repositories belonging to four different domains -- web development, databases, machine learning and security -- and studied the presence of verified commits in each repository over five years. Our analysis shows that only ~10% of all the commits in these 60 repositories are verified. Developers committing code to security-related repositories are much more vigilant when it comes to signing commits. We also analyzed different Git clients for the ease of commit signing and found that GitKraken provides the most convenient way of signing commits, whereas GitHub Web provides the most accessible way of verifying commits. During our analysis, we also identified an unexpected behavior in how GitHub handles unverified emails in user accounts, which prevents the legitimate owner from using the email address. We believe that the low number of verified commits may be due to lack of awareness and to difficulty in setup and key management. Finally, we propose ways to identify commit ownership based on GitHub's Events API, addressing the issue of commit spoofing.
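
To make the measurement described in the abstract concrete, the following is a minimal sketch of how a share of verified commits could be computed from commit metadata. It is not the authors' framework. It assumes commit records shaped like the items returned by GitHub's REST API commit listing, where each item carries a `commit.verification.verified` flag; the function name `verified_fraction` and the sample records are hypothetical and invented purely for illustration.

```python
from typing import Any, Iterable, Mapping


def verified_fraction(commits: Iterable[Mapping[str, Any]]) -> float:
    """Return the share of commits that are flagged as verified.

    Each record is assumed to look like an item from GitHub's commit
    listing, i.e. to carry commit -> verification -> verified (bool).
    Records without that flag are counted as unverified.
    """
    total = 0
    verified = 0
    for item in commits:
        total += 1
        verification = item.get("commit", {}).get("verification", {})
        if verification.get("verified") is True:
            verified += 1
    # An empty history has no verified commits; avoid dividing by zero.
    return verified / total if total else 0.0


# Hypothetical sample data (not taken from the paper or from any real repository).
sample_commits = [
    {"sha": "a1", "commit": {"verification": {"verified": True, "reason": "valid"}}},
    {"sha": "b2", "commit": {"verification": {"verified": False, "reason": "unsigned"}}},
    {"sha": "c3", "commit": {"verification": {"verified": False, "reason": "unsigned"}}},
    {"sha": "d4", "commit": {}},  # record with no verification info
]

share = verified_fraction(sample_commits)
print(f"{share:.0%} of commits are verified")
```

In a real study the records would come from paginated API responses for each repository; the sketch above only covers the counting step and treats missing verification data conservatively as unverified.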

Related papers

MutaGReP: Execution-Free Repository-Grounded Plan Search for Code-Use [92.28400093066212]
MutaGReP is an approach to search for plans that decompose a user request into natural language steps grounded in a large code repository. Our plans use less than 5% of the 128K context window for GPT-4o but rival the coding performance of GPT-4o with a context window filled with the repo.
arXiv Detail & Related papers (2025-02-21T18:58:17Z)
An Empirical Study of Dotfiles Repositories Containing User-Specific Configuration Files [1.7556600627464058]
Hundreds of thousands choose to publicly host their repositories on GitHub. We collected and analyzed publicly-hosted dotfiles repositories on GitHub. We found that 25.8% of the top 500 most-starred GitHub users maintain some form of publicly accessible dotfiles repository.
arXiv Detail & Related papers (2025-01-30T18:32:46Z)
4.5 Million (Suspected) Fake Stars in GitHub: A Growing Spiral of Popularity Contests, Scams, and Malware [58.60545935390151]
We present a global, longitudinal measurement study of fake stars in GitHub. We build StarScout, a scalable tool able to detect anomalous starring behaviors. Our study has implications for platform moderators, open-source practitioners, and supply chain security researchers.
arXiv Detail & Related papers (2024-12-18T03:03:58Z)
Visual Analysis of GitHub Issues to Gain Insights [2.9051263101214566]
This paper presents a prototype web application that generates visualizations to offer insights into issue timelines. It focuses on the lifecycle of issues and depicts vital information to enhance users' understanding of development patterns.
arXiv Detail & Related papers (2024-07-30T15:17:57Z)
Alibaba LingmaAgent: Improving Automated Issue Resolution via Comprehensive Repository Exploration [64.19431011897515]
This paper presents Alibaba LingmaAgent, a novel Automated Software Engineering method designed to comprehensively understand and utilize whole software repositories for issue resolution. Our approach introduces a top-down method to condense critical repository information into a knowledge graph, reducing complexity, and employs a Monte Carlo tree search based strategy. In production deployment and evaluation at Alibaba Cloud, LingmaAgent automatically resolved 16.9% of in-house issues faced by development engineers, and solved 43.3% of problems after manual intervention.
arXiv Detail & Related papers (2024-06-03T15:20:06Z)
MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution [47.850418420195304]
Large Language Models (LLMs) have shown promise in code generation but face difficulties in resolving GitHub issues. We propose a novel Multi-Agent framework for GitHub Issue reSolution, MAGIS, consisting of four agents customized for software evolution.
arXiv Detail & Related papers (2024-03-26T17:57:57Z)
Unveiling A Hidden Risk: Exposing Educational but Malicious Repositories in GitHub [0.0]
We use ChatGPT to understand and annotate the content published in software repositories. We carry out a systematic study on a collection of 35.2K GitHub repositories claimed to be created for educational purposes only.
arXiv Detail & Related papers (2024-03-07T11:36:09Z)
GitAgent: Facilitating Autonomous Agent with GitHub by Tool Extension [81.44231422624055]
A growing area of research focuses on Large Language Models (LLMs) equipped with external tools capable of performing diverse tasks. In this paper, we introduce GitAgent, an agent capable of achieving the autonomous tool extension from GitHub.
arXiv Detail & Related papers (2023-12-28T15:47:30Z)
Wait, wasn't that code here before? Detecting Outdated Software Documentation [9.45052138795667]
We present a GitHub Actions tool that automatically scans for outdated code element references. More than a quarter of the 1000 most popular projects on GitHub contained at least one outdated reference.
arXiv Detail & Related papers (2023-07-10T00:52:29Z)
On the Security Blind Spots of Software Composition Analysis [46.1389163921338]
We present a novel approach to detect vulnerable clones in the Maven repository. We retrieve over 53k potential vulnerable clones from Maven Central. We detect 727 confirmed vulnerable clones and synthesize a testable proof-of-vulnerability project for each of those.
arXiv Detail & Related papers (2023-06-08T20:14:46Z)
GitHub Actions: The Impact on the Pull Request Process [7.047566396769727]
This study investigates how projects use GitHub Actions, what the developers discuss about them, and how project activity indicators change after their adoption. Our results indicate that 1,489 out of 5,000 most popular repositories (almost 30% of our sample) adopt GitHub Actions. Our findings also suggest that the adoption of GitHub Actions leads to more rejections of pull requests (PRs), more communication in accepted PRs and less communication in rejected PRs.
arXiv Detail & Related papers (2022-06-28T16:24:17Z)
A ground-truth dataset and classification model for detecting bots in GitHub issue and PR comments [70.1864008701113]
Bots are used in GitHub repositories to automate repetitive activities that are part of the distributed software development process. This paper proposes a ground-truth dataset, based on a manual analysis with high interrater agreement, of pull request and issue comments in 5,000 distinct GitHub accounts. We propose an automated classification model to detect bots, taking as main features the number of empty and non-empty comments of each account, the number of comment patterns, and the inequality between comments within comment patterns.
arXiv Detail & Related papers (2020-10-07T09:30:52Z)

This list is automatically generated from the titles and abstracts of the papers on this site.