Related papers: Which contributions count? Analysis of attribution in open source

URL: http://arxiv.org/abs/2103.11007v1
Date: Fri, 19 Mar 2021 20:14:40 GMT
Title: Which contributions count? Analysis of attribution in open source
Authors: Jean-Gabriel Young, Amanda Casari, Katie McLaughlin, Milo Z. Trujillo, Laurent Hébert-Dufresne, James P. Bagrow
Abstract summary: We characterize contributor acknowledgment models in open source by analyzing thousands of projects. We find that community-generated systems of contribution acknowledgment make work like idea generation or bug finding more visible.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Open source software projects usually acknowledge contributions with text files, websites, and other idiosyncratic methods. These data sources are hard to mine, which is why contributorship is most frequently measured through changes to repositories, such as commits, pushes, or patches. Recently, some open source projects have taken to recording contributor actions with standardized systems; this opens up a unique opportunity to understand how community-generated notions of contributorship map onto codebases as the measure of contribution. Here, we characterize contributor acknowledgment models in open source by analyzing thousands of projects that use a model called All Contributors to acknowledge diverse contributions like outreach, finance, infrastructure, and community management. We analyze the life cycle of projects through this model's lens and contrast its representation of contributorship with the picture given by other methods of acknowledgment, including GitHub's top committers indicator and contributions derived from actions taken on the platform. We find that community-generated systems of contribution acknowledgment make work like idea generation or bug finding more visible, which generates a more extensive picture of collaboration. Further, we find that models requiring explicit attribution lead to more clearly defined boundaries around what is and what is not a contribution.

Related papers

Why Authors and Maintainers Link (or Don't Link) Their PyPI Libraries to Code Repositories and Donation Platforms [83.16077040470975]
Metadata of libraries on the Python Package Index (PyPI) plays a critical role in supporting the transparency, trust, and sustainability of open-source libraries. This paper presents a large-scale empirical study combining two targeted surveys sent to 50,000 PyPI authors and maintainers. We analyze more than 1,400 responses using large language model (LLM)-based topic modeling to uncover key motivations and barriers related to linking repositories and donation platforms.
arXiv Detail & Related papers (2026-01-21T16:13:57Z)
Seeing Further on the Shoulders of Giants: Knowledge Inheritance for Vision Foundation Models [54.517276878748305]
Vision foundation models (VFMs) are predominantly developed using data-centric methods. Many open-source vision models have been pretrained on domain-specific data. We present a new model-driven approach for training VFMs through joint knowledge transfer and preservation.
arXiv Detail & Related papers (2025-08-20T13:30:23Z)
VISA: Retrieval Augmented Generation with Visual Source Attribution [100.78278689901593]
Existing approaches in RAG primarily link generated content to document-level references. We propose Retrieval-Augmented Generation with Visual Source Attribution (VISA), a novel approach that combines answer generation with visual source attribution. To evaluate its effectiveness, we curated two datasets: Wiki-VISA, based on crawled Wikipedia webpage screenshots, and Paper-VISA, derived from PubLayNet and tailored to the medical domain.
arXiv Detail & Related papers (2024-12-19T02:17:35Z)
Knowledge Islands: Visualizing Developers Knowledge Concentration [0.0]
Knowledge Islands is a tool that visualizes the concentration of knowledge in a software repository using a state-of-the-art knowledge model. It enables practitioners to analyze GitHub projects, determine where knowledge is concentrated, and implement measures to maintain project health.
arXiv Detail & Related papers (2024-08-16T13:32:49Z)
How to Understand Whole Software Repository? [64.19431011897515]
An excellent understanding of the whole repository will be the critical path to Automatic Software Engineering (ASE). We develop a novel method named RepoUnderstander by guiding agents to comprehensively understand whole repositories. To better utilize the repository-level knowledge, we guide the agents to summarize, analyze, and plan.
arXiv Detail & Related papers (2024-06-03T15:20:06Z)
Unveiling Diversity: Empowering OSS Project Leaders with Community Diversity and Turnover Dashboards [51.67585198094836]
CommunityTapestry is a dynamic real-time community dashboard. It presents key diversity and turnover signals that we identified from the literature. It helped project leaders identify areas of improvement and gave them actionable information.
arXiv Detail & Related papers (2023-12-13T22:12:57Z)
Towards a Structural Equation Model of Open Source Blockchain Software Health [0.0]
This work uses exploratory factor analysis to identify latent constructs that are representative of general public interest or popularity in software. We find that interest is a combination of stars, forks, and text mentions in the GitHub repository, while a second factor for robustness is composed of a criticality score. A structural model of software health is proposed such that general interest positively influences developer engagement, which, in turn, positively predicts software robustness.
arXiv Detail & Related papers (2023-10-31T08:47:41Z)
Collaborative, Code-Proximal Dynamic Software Visualization within Code Editors [55.57032418885258]
This paper introduces the design and proof-of-concept implementation for a software visualization approach that can be embedded into code editors. Our contribution differs from related work in that we use dynamic analysis of a software system's runtime behavior. Our visualization approach enhances common remote pair programming tools and is collaboratively usable by employing shared code cities.
arXiv Detail & Related papers (2023-08-30T06:35:40Z)
Code Recommendation for Open Source Software Developers [32.181023933552694]
CODER is a novel graph-based code recommendation framework for open source software developers. Our framework achieves superior performance under various experimental settings, including intra-project, cross-project, and cold-start recommendation.
arXiv Detail & Related papers (2022-10-15T16:40:36Z)
Attracting and Retaining OSS Contributors with a Maintainer Dashboard [19.885747206499712]
We design a maintainer dashboard that provides recommendations on how to attract and retain open source contributors. We conduct a project-specific evaluation with maintainers to better understand use cases in which this tool will be most helpful. We distill our findings to share what the future of recommendations in open source looks like and how to make these recommendations most meaningful over time.
arXiv Detail & Related papers (2022-02-15T21:39:37Z)
LAGOON: An Analysis Tool for Open Source Communities [7.3861897382622015]
LAGOON is an open source platform for understanding the ecosystems of Open Source Software (OSS) communities. LAGOON ingests artifacts from several common sources, including source code repositories, issue trackers, mailing lists and scraping content from websites. A user interface is provided for visualization and exploration of an OSS project's complete sociotechnical graph.
arXiv Detail & Related papers (2022-01-26T18:52:11Z)
Dimensions of Commonsense Knowledge [60.49243784752026]
We survey a wide range of popular commonsense sources with a special focus on their relations. We consolidate these relations into 13 knowledge dimensions, each abstracting over more specific relations found in sources.
arXiv Detail & Related papers (2021-01-12T17:52:39Z)
KILT: a Benchmark for Knowledge Intensive Language Tasks [102.33046195554886]
We present a benchmark for knowledge-intensive language tasks (KILT). All tasks in KILT are grounded in the same snapshot of Wikipedia. We find that a shared dense vector index coupled with a seq2seq model is a strong baseline.
arXiv Detail & Related papers (2020-09-04T15:32:19Z)
A Transformer-based Approach for Source Code Summarization [86.08359401867577]
We learn code representation for summarization by modeling the pairwise relationship between code tokens. We show that although the approach is simple, it outperforms the state-of-the-art techniques by a significant margin.
arXiv Detail & Related papers (2020-05-01T23:29:36Z)

This list is automatically generated from the titles and abstracts of the papers on this site.