The Geography of Open Source Software: Evidence from GitHub
- URL: http://arxiv.org/abs/2107.03200v2
- Date: Tue, 12 Oct 2021 08:25:28 GMT
- Title: The Geography of Open Source Software: Evidence from GitHub
- Authors: Johannes Wachs, Mariusz Nitecki, William Schueller, Axel Polleres
- Abstract summary: Open Source Software (OSS) plays an important role in the digital economy.
But software development seems to cluster geographically in places such as Silicon Valley, London, or Berlin.
This presents a significant blindspot for policymakers, who tend to promote OSS at the national level.
- Score: 1.5102260054654921
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Open Source Software (OSS) plays an important role in the digital economy.
Yet although software production is amenable to remote collaboration and its
outputs are easily shared across distances, software development seems to
cluster geographically in places such as Silicon Valley, London, or Berlin. And
while recent work indicates that OSS activity creates positive externalities
which accrue locally through knowledge spillovers and information effects,
up-to-date data on the geographic distribution of active open source developers
is limited. This presents a significant blindspot for policymakers, who tend to
promote OSS at the national level as a cost-saving tool for public sector
institutions. We address this gap by geolocating more than half a million
active contributors to GitHub in early 2021 at various spatial scales. Compared
to results from 2010, we find a significant increase in the share of developers
based in Asia, Latin America and Eastern Europe, suggesting a more even spread
of OSS developers globally. Within countries, however, we find significant
concentration in regions, exceeding the concentration of workers in high-tech
fields. Social and economic development indicators predict at most half of
regional variation in OSS activity in the EU, suggesting that clusters of OSS
have idiosyncratic roots. We argue that policymakers seeking to foster OSS
should focus locally rather than nationally, using the tools of cluster policy
to support networks of OSS developers.
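As an illustration of what "concentration in regions" can mean in practice, the sketch below computes a Gini coefficient over per-region head counts. This is only one common concentration measure and is not necessarily the one used by the authors. The region counts are made-up placeholder values, not data from the paper, and the names `gini`, `oss_developers`, and `high_tech_workers` are hypothetical.

```python
def gini(counts):
    """Gini coefficient of non-negative counts.

    0 means the total is spread evenly across regions; values
    approaching 1 mean it is concentrated in very few regions.
    """
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Formula for ascending-sorted values x_1..x_n:
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(i * x for i, x in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Placeholder counts for four regions of one hypothetical country.
# These numbers are invented for illustration only.
oss_developers = [50, 100, 150, 700]
high_tech_workers = [200, 250, 250, 300]

oss_gini = gini(oss_developers)
high_tech_gini = gini(high_tech_workers)

print(f"OSS developers Gini:    {oss_gini:.3f}")
print(f"High-tech workers Gini: {high_tech_gini:.3f}")
```

With these placeholder inputs the developer counts come out as more concentrated than the worker counts, which mirrors the kind of comparison the abstract describes without reproducing any of its results.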
Related papers
- Characterising Open Source Co-opetition in Company-hosted Open Source Software Projects: The Cases of PyTorch, TensorFlow, and Transformers [5.2337753974570616]
Companies, including market rivals, have long collaborated on the development of open source software (OSS).
This results in a tangle of co-operation and competition known as "open source co-opetition".
arXiv Detail & Related papers (2024-10-23T19:35:41Z) - The Impact of Generative AI on Collaborative Open-Source Software Development: Evidence from GitHub Copilot [4.8256226973915455]
We investigate the effect of GitHub Copilot, a generative AI pair programmer, on software development in the open-source community.
We find that Copilot significantly enhances project-level productivity by 6.5%.
We conclude that AI pair programmers bring benefits to developers to automate and augment their code, but human developers' knowledge of software projects can enhance the benefits.
arXiv Detail & Related papers (2024-10-02T23:26:10Z) - The AI Community Building the Future? A Quantitative Analysis of Development Activity on Hugging Face Hub [2.595302141947391]
We analyse development activity on the Hugging Face (HF) Hub, a popular platform for building, sharing, and demonstrating models.
Activity is imbalanced between repositories; for example, over 70% of models have 0 downloads, while 1% account for 99% of downloads.
We find that the community has a core-periphery structure, with a core of prolific developers and a majority of isolate developers.
arXiv Detail & Related papers (2024-05-20T11:10:49Z) - The Software Genome Project: Venture to the Genomic Pathways of Open Source Software and Its Applications [8.55939767653389]
Software Genome Project is geared towards the secure monitoring and exploitation of open-source software.
Software Genome Project builds a complete set of software genome maps to help developers and managers gain a deeper understanding of software complexity and diversity.
arXiv Detail & Related papers (2023-11-16T13:18:24Z) - GeoLLM: Extracting Geospatial Knowledge from Large Language Models [49.20315582673223]
We present GeoLLM, a novel method that can effectively extract geospatial knowledge from large language models.
We demonstrate the utility of our approach across multiple tasks of central interest to the international community, including the measurement of population density and economic livelihoods.
Our experiments reveal that LLMs are remarkably sample-efficient, rich in geospatial information, and robust across the globe.
arXiv Detail & Related papers (2023-10-10T00:03:23Z) - Geo-Encoder: A Chunk-Argument Bi-Encoder Framework for Chinese Geographic Re-Ranking [61.60169764507917]
The Chinese geographic re-ranking task aims to find the most relevant addresses among retrieved candidates.
We propose an innovative framework, namely Geo-Encoder, to more effectively integrate Chinese geographical semantics into re-ranking pipelines.
arXiv Detail & Related papers (2023-09-04T13:44:50Z) - SoTaNa: The Open-Source Software Development Assistant [81.86136560157266]
SoTaNa is an open-source software development assistant.
It generates high-quality instruction-based data for the domain of software engineering.
It employs a parameter-efficient fine-tuning approach to enhance the open-source foundation model, LLaMA.
arXiv Detail & Related papers (2023-08-25T14:56:21Z) - The GitHub Development Workflow Automation Ecosystems [47.818229204130596]
Large-scale software development has become a highly collaborative endeavour.
This chapter explores the ecosystems of development bots and GitHub Actions.
It provides an extensive survey of the state-of-the-art in this domain.
arXiv Detail & Related papers (2023-05-08T15:24:23Z) - GeoNet: Benchmarking Unsupervised Adaptation across Geographies [71.23141626803287]
We study the problem of geographic robustness and make three main contributions.
First, we introduce a large-scale dataset GeoNet for geographic adaptation.
Second, we hypothesize that the major source of domain shifts arises from significant variations in scene context.
Third, we conduct an extensive evaluation of several state-of-the-art unsupervised domain adaptation algorithms and architectures.
arXiv Detail & Related papers (2023-03-27T17:59:34Z) - Outsourcing Training without Uploading Data via Efficient Collaborative Open-Source Sampling [49.87637449243698]
Traditional outsourcing requires uploading device data to the cloud server.
We propose to leverage widely available open-source data, which is a massive dataset collected from public and heterogeneous sources.
We develop a novel strategy called Efficient Collaborative Open-source Sampling (ECOS) to construct a proximal proxy dataset from open-source data for cloud training.
arXiv Detail & Related papers (2022-10-23T00:12:18Z) - Including Everyone, Everywhere: Understanding Opportunities and Challenges of Geographic Gender-Inclusion in OSS [15.757897147034873]
This study presents a multi-region geographical analysis of gender inclusion on GitHub.
Gender diversity is low across all parts of the world, with no substantial difference across regions.
There has been a statistically significant improvement in diversity worldwide since 2014, with certain regions such as Africa improving at a faster pace.
arXiv Detail & Related papers (2020-10-02T07:40:43Z)