How do Software Engineering Researchers Use GitHub? An Empirical Study of Artifacts & Impact
- URL: http://arxiv.org/abs/2310.01566v2
- Date: Fri, 5 Apr 2024 08:32:50 GMT
- Title: How do Software Engineering Researchers Use GitHub? An Empirical Study of Artifacts & Impact
- Authors: Kamel Alrashedy, Ahmed Binjahlan,
- Abstract summary: We ask whether and how authors engage in social coding related to their research.
We start with ten thousand papers in top SE research venues, hand-annotate their GitHub links, and study 309 paper-related repositories in detail.
We find a wide distribution in popularity and impact, some strongly correlated with publication venue.
- Score: 0.2209921757303168
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Millions of developers share their code on open-source platforms like GitHub, which offer social coding opportunities such as distributed collaboration and popularity-based ranking. Software engineering researchers have joined in as well, hosting their research artifacts (tools, replication packages & datasets) in repositories, an action often marked as part of the publication's contribution. Yet a decade after the first such paper-with-GitHub-link, little is known about the fate of such repositories in practice. Do research repositories ever gain the interest of the developer community, or other researchers? If so, how often and why (not)? Does effort invested on GitHub pay off with research impact? In short: we ask whether and how authors engage in social coding related to their research. We conduct a broad empirical investigation of repositories from published work, starting with ten thousand papers in top SE research venues, hand-annotating their 3449 GitHub (and Zenodo) links, and studying 309 paper-related repositories in detail. We find a wide distribution in popularity and impact, some strongly correlated with publication venue. These were often heavily informed by the authors' investment in terms of timely responsiveness and upkeep, which was often remarkably subpar by GitHub's standards, if not absent altogether. Yet we also offer hope: popular repositories often go hand-in-hand with well-cited papers and achieve broad impact. Our findings suggest the need to rethink the research incentives and reward structure around research products requiring such sustained contributions.
Related papers
- How to Understand Whole Software Repository? [64.19431011897515]
An excellent understanding of the whole repository will be the critical path to Automatic Software Engineering (ASE).
We develop a novel method named RepoUnderstander by guiding agents to comprehensively understand whole repositories.
To better utilize the repository-level knowledge, we guide the agents to summarize, analyze, and plan.
arXiv Detail & Related papers (2024-06-03T15:20:06Z) - Analyzing the Accessibility of GitHub Repositories for PyPI and NPM Libraries [91.97201077607862]
Industrial applications heavily rely on open-source software (OSS) libraries, which provide various benefits.
To monitor the activities of such communities, a comprehensive list of repositories for the libraries of an ecosystem must be accessible.
In this study, we analyze the accessibility of GitHub repositories for PyPI and NPM libraries.
arXiv Detail & Related papers (2024-04-26T13:27:04Z) - Research Artifacts in Software Engineering Publications: Status and Trends [13.765908874440278]
We collect 1,487 artifacts from 2,196 papers published in top-tier SE conferences from 2017 to 2022.
Based on our analysis, we reveal a rise in publications providing artifacts.
The usage of Zenodo for sharing artifacts has significantly increased.
arXiv Detail & Related papers (2024-04-10T09:25:18Z) - A Tale of Two Communities: Exploring Academic References on Stack Overflow [1.2914230269240388]
We find that Stack Overflow communities with different domains of interest engage with academic literature at varying frequencies and speeds.
The contradicting patterns suggest that some disciplines may have diverged in their interests and development trajectories from the corresponding practitioner community.
arXiv Detail & Related papers (2024-03-14T20:33:55Z) - Software Engineering for OpenHarmony: A Research Roadmap [50.56072657598223]
Existing research efforts mainly focus on popular mobile platforms, namely Android and iOS.
OpenHarmony, a newly open-sourced mobile platform, has rarely been considered.
We present to the mobile software engineering community a research roadmap for encouraging our fellow researchers to contribute promising approaches to OpenHarmony.
arXiv Detail & Related papers (2023-11-02T15:27:09Z) - Synthcity: facilitating innovative use cases of synthetic data in
different data modalities [86.52703093858631]
Synthcity is an open-source software package for innovative use cases of synthetic data in ML fairness, privacy and augmentation.
Synthcity provides the practitioners with a single access point to cutting edge research and tools in synthetic data.
arXiv Detail & Related papers (2023-01-18T14:49:54Z) - Deep learning for table detection and structure recognition: A survey [49.09628624903334]
The goal of this survey is to provide a profound comprehension of the major developments in the field of Table Detection.
We provide an analysis of both classic and new applications in the field.
The datasets and source code of the existing models are organized to provide the reader with a compass on this vast literature.
arXiv Detail & Related papers (2022-11-15T19:42:27Z) - Automatically Categorising GitHub Repositories by Application Domain [14.265666415804025]
GitHub is the largest host of open source software on the Internet.
It is becoming increasingly hard to navigate the plethora of repositories which span a wide range of domains.
Past work has shown that taking the application domain into account is crucial for tasks such as predicting the popularity of a repository.
arXiv Detail & Related papers (2022-07-30T16:27:16Z) - GitHub Actions: The Impact on the Pull Request Process [7.047566396769727]
This study investigates how projects use GitHub Actions, what the developers discuss about them, and how project activity indicators change after their adoption.
Our results indicate that 1,489 out of the 5,000 most popular repositories (almost 30% of our sample) adopt GitHub Actions.
Our findings also suggest that the adoption of GitHub Actions leads to more rejections of pull requests (PRs), more communication in accepted PRs and less communication in rejected PRs.
arXiv Detail & Related papers (2022-06-28T16:24:17Z) - Repro: An Open-Source Library for Improving the Reproducibility and Usability of Publicly Available Research Code [74.28810048824519]
Repro is an open-source library which aims at improving the usability of research code.
It provides a lightweight Python API for running software released by researchers within Docker containers.
arXiv Detail & Related papers (2022-04-29T01:54:54Z) - The penumbra of open source: projects outside of centralized platforms are longer maintained, more academic and more collaborative [0.0]
We develop a novel, extensive sample of public open source project repositories outside of centralized platforms.
Our sample projects tend to have more collaborators, are maintained for longer periods, and tend to be more focused on academic and scientific problems.
arXiv Detail & Related papers (2021-06-29T17:54:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.