Organizational Artifacts of Code Development
- URL: http://arxiv.org/abs/2105.14637v1
- Date: Sun, 30 May 2021 22:04:09 GMT
- Title: Organizational Artifacts of Code Development
- Authors: Parisa Kaghazgaran, Nichola Lubold, Fred Morstatter
- Abstract summary: We study the social effects of country by measuring differences in software repositories associated with different countries.
We propose a novel approach that models repositories by their sequence of development activities, framed as a sequence embedding task.
We conduct a case study on repos from well-known corporations and find that country can describe the differences in development better than the company affiliation itself.
- Score: 10.863006516392831
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Software is the outcome of active and effective communication between members
of an organization. This has been noted with Conway's law, which states that
"organizations design systems that mirror their own communication structure."
However, software developers are often members of multiple organizational
groups (e.g., corporate, regional), and it is unclear how association with
groups beyond one's company influences the development process. In this paper,
we study the social effects of country by measuring differences in software
repositories associated with different countries. Using a novel dataset
obtained from GitHub, we identify key properties that differentiate software
repositories based upon the country of the developers. We propose a novel
approach that models repositories by their sequence of development
activities as a sequence embedding task; coupled with repo profile features,
it achieves 79.2% accuracy in identifying the country of a repository. Finally,
we conduct a case study on repos from well-known corporations and find that
country can describe the differences in development better than the company
affiliation itself. These results have larger implications for software
development and indicate the importance of accounting for the multiple groups
developers belong to when considering the formation and structure of
teams.
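The abstract pairs a sequence embedding of a repository's development activities with repo profile features to classify its country. The listing does not say which embedding model or classifier the authors use, so the sketch below is only an illustration of that concatenate-then-classify shape: it swaps in normalized event-bigram frequencies as a stand-in for a learned sequence embedding and a nearest-centroid classifier. The event vocabulary, the single profile feature, and the country labels are all hypothetical.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical activity vocabulary (GitHub-style event names used as plain labels).
EVENT_TYPES = ["PushEvent", "PullRequestEvent", "IssuesEvent", "IssueCommentEvent"]
BIGRAMS = list(product(EVENT_TYPES, repeat=2))


def sequence_embedding(events):
    """Normalized event-bigram frequencies over a repo's ordered activity log.

    A simple stand-in for a learned sequence embedding, not the paper's model.
    Bigrams involving events outside EVENT_TYPES still count toward the total
    but get no dimension of their own.
    """
    counts = Counter(zip(events, events[1:]))
    total = sum(counts.values()) or 1  # avoid division by zero for short logs
    return [counts[bg] / total for bg in BIGRAMS]


def repo_vector(events, profile):
    """Concatenate the activity embedding with numeric profile features (assumed scaled to [0, 1])."""
    return sequence_embedding(events) + list(profile)


def fit_centroids(vectors, labels):
    """Mean vector per country label (nearest-centroid classifier)."""
    sums, counts = {}, Counter(labels)
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
    return {label: [x / counts[label] for x in acc] for label, acc in sums.items()}


def predict(model, vec):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(model, key=lambda label: math.dist(model[label], vec))


# Toy training data: (activity sequence, [hypothetical profile feature], country label).
train = [
    (["PushEvent", "PushEvent", "PushEvent", "PullRequestEvent"], [0.1], "country_A"),
    (["PushEvent", "PushEvent", "PushEvent"], [0.2], "country_A"),
    (["IssuesEvent", "IssueCommentEvent", "IssuesEvent", "IssueCommentEvent"], [0.8], "country_B"),
    (["IssuesEvent", "IssueCommentEvent", "PullRequestEvent"], [0.9], "country_B"),
]
model = fit_centroids([repo_vector(e, p) for e, p, _ in train], [c for _, _, c in train])
query = repo_vector(["PushEvent", "PushEvent", "PullRequestEvent"], [0.15])
print(predict(model, query))  # country_A
```

A learned encoder could replace `sequence_embedding` without changing the rest of the pipeline; the point of the sketch is only that activity order and profile features end up in one vector that a classifier consumes.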
Related papers
- Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion? [60.84912551069379]
We present the Code-Development Benchmark (Codev-Bench), a fine-grained, real-world, repository-level, and developer-centric evaluation framework.
Codev-Agent is an agent-based system that automates repository crawling, constructs execution environments, extracts dynamic calling chains from existing unit tests, and generates new test samples to avoid data leakage.
(2024-10-02T09:11:10Z)
- Code Ownership: The Principles, Differences, and Their Associations with Software Quality [6.123324869194196]
We investigate the differences in the commonly used ownership approximations in terms of the set of developers, the approximated code ownership values, and the expertise level.
We find that commit-based and line-based ownership approximations produce different sets of developers, different code ownership values, and different sets of major developers.
(2024-08-23T03:01:59Z)
- Multi-Agent Software Development through Cross-Team Collaboration [30.88149502999973]
We introduce Cross-Team Collaboration (CTC), a scalable multi-team framework for software development.
CTC enables orchestrated teams to jointly propose various decisions and communicate their insights.
Results show a notable increase in quality compared to state-of-the-art baselines.
(2024-06-13T10:18:36Z)
- How to Understand Whole Software Repository? [64.19431011897515]
An excellent understanding of the whole repository will be the critical path to Automatic Software Engineering (ASE).
We develop a novel method named RepoUnderstander that guides agents to comprehensively understand whole repositories.
To better utilize the repository-level knowledge, we guide the agents to summarize, analyze, and plan.
(2024-06-03T15:20:06Z)
- Governing the Commons: Code Ownership and Code-Clones in Large-Scale Software Development [6.249768559720122]
In software development organizations employing weak or collective ownership, different teams are allowed and expected to autonomously perform changes in various components.
Our objective is to understand how and why different teams introduce technical debt in the form of code clones as they change different components.
(2024-05-24T18:23:51Z)
- MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration [102.41118020705876]
Large Language Models (LLMs) have marked a significant advancement in the field of natural language processing.
As their applications extend into multi-agent environments, a need has arisen for a comprehensive evaluation framework.
This work introduces a novel benchmarking framework specifically tailored to assess LLMs within multi-agent settings.
(2023-11-14T21:46:27Z)
- The GitHub Development Workflow Automation Ecosystems [47.818229204130596]
Large-scale software development has become a highly collaborative endeavour.
This chapter explores the ecosystems of development bots and GitHub Actions.
It provides an extensive survey of the state-of-the-art in this domain.
(2023-05-08T15:24:23Z)
- Detecting and Optimising Team Interactions in Software Development [58.720142291102135]
This paper presents a data-driven approach to detect the functional interaction structure for software development teams.
Our approach considers differences in the activity levels of team members and uses a block-constrained configuration model.
We show how our approach enables teams to compare their functional interaction structure against synthetically created benchmark scenarios.
(2023-02-28T14:53:29Z)
- Big Data = Big Insights? Operationalising Brooks' Law in a Massive GitHub Data Set [1.1470070927586014]
We study challenges that can explain the disagreement between recent studies of developer productivity in massive repository data.
We provide, to the best of our knowledge, the largest, curated corpus of GitHub projects tailored to investigate the influence of team size and collaboration patterns on individual and collective productivity.
(2022-01-12T17:25:30Z)
- S3M: Siamese Stack (Trace) Similarity Measure [55.58269472099399]
We present S3M -- the first approach to computing stack trace similarity based on deep learning.
It is based on a biLSTM encoder and a fully-connected classifier to compute similarity.
Our experiments demonstrate the superiority of our approach over the state-of-the-art on both open-sourced data and a private JetBrains dataset.
(2021-03-18T21:10:41Z)
- ConE: A Concurrent Edit Detection Tool for Large Scale Software Development [16.11297015618479]
ConE proactively detects concurrent edits to help mitigate the problems caused by them.
We present the results of ConE's deployment through early intervention techniques such as pull request notifications.
(2021-01-16T22:55:44Z)
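The Code Ownership entry in this list contrasts commit-based and line-based ownership approximations. As a minimal sketch (not that paper's implementation), both can be computed as per-developer shares of a file's history: commit-based over the author of each commit, line-based over the last author of each current line, as `git blame` reports it. The developer names, the toy history, and the 5% "major developer" cutoff are assumptions for illustration.

```python
from collections import Counter


def ownership_shares(authors):
    """Fraction of items (commits or lines) attributed to each developer."""
    counts = Counter(authors)
    total = sum(counts.values())
    return {dev: n / total for dev, n in counts.items()} if total else {}


def major_developers(shares, threshold=0.05):
    """Developers whose share meets the cutoff (5% is an assumed default)."""
    return {dev for dev, share in shares.items() if share >= threshold}


# Hypothetical history of one file.
commit_authors = ["alice"] * 3 + ["bob"] * 1 + ["carol"] * 21  # author of each commit
line_authors = ["alice"] * 180 + ["bob"] * 12 + ["carol"] * 8  # last author of each current line

by_commit = ownership_shares(commit_authors)  # alice 0.12, bob 0.04, carol 0.84
by_line = ownership_shares(line_authors)      # alice 0.90, bob 0.06, carol 0.04

print(sorted(major_developers(by_commit)))  # ['alice', 'carol']
print(sorted(major_developers(by_line)))    # ['alice', 'bob']
```

In this toy history carol makes many small commits while most surviving lines trace back to alice, so the two approximations yield different sets of major developers, which is the kind of disagreement the entry describes.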