The penumbra of open source: projects outside of centralized platforms
are longer maintained, more academic and more collaborative
- URL: http://arxiv.org/abs/2106.15611v3
- Date: Sun, 22 May 2022 17:48:55 GMT
- Title: The penumbra of open source: projects outside of centralized platforms
are longer maintained, more academic and more collaborative
- Authors: Milo Z. Trujillo, Laurent Hébert-Dufresne and James Bagrow
- Abstract summary: We develop a novel, extensive sample of public open source project repositories outside of centralized platforms.
Our sample projects tend to have more collaborators, are maintained for longer periods, and tend to be more focused on academic and scientific problems.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: GitHub has become the central online platform for much of open source,
hosting most open source code repositories. With this popularity, the public
digital traces of GitHub are now a valuable means to study teamwork and
collaboration. In many ways, however, GitHub is a convenience sample, and may
not be representative of open source development off the platform. Here we
develop a novel, extensive sample of public open source project repositories
outside of centralized platforms. We characterize these projects along a
number of dimensions, and compare them to a time-matched sample of corresponding
GitHub projects. Our sample projects tend to have more collaborators, are
maintained for longer periods, and tend to be more focused on academic and
scientific problems.
Related papers
- Visual Analysis of GitHub Issues to Gain Insights [2.9051263101214566]
This paper presents a prototype web application that generates visualizations to offer insights into issue timelines.
It focuses on the lifecycle of issues and depicts vital information to enhance users' understanding of development patterns.
arXiv Detail & Related papers (2024-07-30T15:17:57Z) - Long Code Arena: a Set of Benchmarks for Long-Context Code Models [75.70507534322336]
Long Code Arena is a suite of six benchmarks for code processing tasks that require project-wide context.
These tasks cover different aspects of code processing: library-based code generation, CI builds repair, project-level code completion, commit message generation, bug localization, and module summarization.
For each task, we provide a manually verified dataset for testing, an evaluation suite, and open-source baseline solutions.
arXiv Detail & Related papers (2024-06-17T14:58:29Z) - How to Understand Whole Software Repository? [64.19431011897515]
An excellent understanding of the whole repository will be the critical path to Automatic Software Engineering (ASE).
We develop a novel method named RepoUnderstander by guiding agents to comprehensively understand the whole repositories.
To better utilize the repository-level knowledge, we guide the agents to summarize, analyze, and plan.
arXiv Detail & Related papers (2024-06-03T15:20:06Z) - Analyzing the Accessibility of GitHub Repositories for PyPI and NPM Libraries [91.97201077607862]
Industrial applications heavily rely on open-source software (OSS) libraries, which provide various benefits.
To monitor the activities of such communities, a comprehensive list of repositories for the libraries of an ecosystem must be accessible.
In this study, we analyze the accessibility of GitHub repositories for PyPI and NPM libraries.
arXiv Detail & Related papers (2024-04-26T13:27:04Z) - Open Source Prover in the Attic [46.774583641694804]
The well-known JGEX program became open source a few years ago, but seemingly, further development of the program can only be done without the original authors.
In our project, we are looking at whether it is possible to continue such a large project as a newcomer without the involvement of the original authors.
arXiv Detail & Related papers (2024-01-22T12:50:29Z) - How do Software Engineering Researchers Use GitHub? An Empirical Study of Artifacts & Impact [0.2209921757303168]
We ask whether and how authors engage in social coding related to their research.
We examine ten thousand papers in top SE research venues, hand-annotate their GitHub links, and study 309 paper-related repositories.
We find a wide distribution in popularity and impact, some strongly correlated with publication venue.
arXiv Detail & Related papers (2023-10-02T18:56:33Z) - The GitHub Development Workflow Automation Ecosystems [47.818229204130596]
Large-scale software development has become a highly collaborative endeavour.
This chapter explores the ecosystems of development bots and GitHub Actions.
It provides an extensive survey of the state-of-the-art in this domain.
arXiv Detail & Related papers (2023-05-08T15:24:23Z) - GitHub Actions: The Impact on the Pull Request Process [7.047566396769727]
This study investigates how projects use GitHub Actions, what the developers discuss about them, and how project activity indicators change after their adoption.
Our results indicate that 1,489 of the 5,000 most popular repositories (almost 30% of our sample) adopt GitHub Actions.
Our findings also suggest that the adoption of GitHub Actions leads to more rejections of pull requests (PRs), more communication in accepted PRs and less communication in rejected PRs.
arXiv Detail & Related papers (2022-06-28T16:24:17Z) - GitRank: A Framework to Rank GitHub Repositories [0.0]
Open-source repositories provide a wealth of information and are increasingly being used to build artificial intelligence (AI) based systems.
In this hackathon, we utilize known code quality measures and the GrimoireLab toolkit to implement a framework, named GitRank, to rank open-source repositories on three different criteria.
arXiv Detail & Related papers (2022-05-04T23:42:30Z) - Repro: An Open-Source Library for Improving the Reproducibility and
Usability of Publicly Available Research Code [74.28810048824519]
Repro is an open-source library which aims at improving the usability of research code.
It provides a lightweight Python API for running software released by researchers within Docker containers.
arXiv Detail & Related papers (2022-04-29T01:54:54Z) - LabelGit: A Dataset for Software Repositories Classification using
Attributed Dependency Graphs [11.523471275501857]
We create a new dataset of GitHub projects called LabelGit.
Our dataset uses direct information from the source code, like the dependency graph and source code neural representations from the identifiers.
We hope to aid the development of solutions that do not rely on proxies but use the entire source code to perform classification.
arXiv Detail & Related papers (2021-03-16T07:28:58Z)