A Cartography of Open Collaboration in Open Source AI: Mapping Practices, Motivations, and Governance in 14 Open Large Language Model Projects
- URL: http://arxiv.org/abs/2509.25397v1
- Date: Mon, 29 Sep 2025 18:55:18 GMT
- Title: A Cartography of Open Collaboration in Open Source AI: Mapping Practices, Motivations, and Governance in 14 Open Large Language Model Projects
- Authors: Johan Linåker, Cailean Osborne, Jennifer Ding, Ben Burtenshaw,
- Abstract summary: The proliferation of open large language models (LLMs) is fostering a vibrant ecosystem of research and innovation in artificial intelligence (AI). The methods of collaboration used to develop open LLMs both before and after their public release have not yet been comprehensively studied. We draw on semi-structured interviews with the developers of 14 open LLMs from grassroots projects, research institutes, startups, and Big Tech companies in North America, Europe, Africa, and Asia.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The proliferation of open large language models (LLMs) is fostering a vibrant ecosystem of research and innovation in artificial intelligence (AI). However, the methods of collaboration used to develop open LLMs both before and after their public release have not yet been comprehensively studied, limiting our understanding of how open LLM projects are initiated, organized, and governed as well as what opportunities there are to foster this ecosystem even further. We address this gap through an exploratory analysis of open collaboration throughout the development and reuse lifecycle of open LLMs, drawing on semi-structured interviews with the developers of 14 open LLMs from grassroots projects, research institutes, startups, and Big Tech companies in North America, Europe, Africa, and Asia. We make three key contributions to research and practice. First, collaboration in open LLM projects extends far beyond the LLMs themselves, encompassing datasets, benchmarks, open source frameworks, leaderboards, knowledge sharing and discussion forums, and compute partnerships, among others. Second, open LLM developers have a variety of social, economic, and technological motivations, from democratizing AI access and promoting open science to building regional ecosystems and expanding language representation. Third, the sampled open LLM projects exhibit five distinct organizational models, ranging from single company projects to non-profit-sponsored grassroots projects, which vary in their centralization of control and community engagement strategies used throughout the open LLM lifecycle. We conclude with practical recommendations for stakeholders seeking to support the global community building a more open future for AI.
Related papers
- Open-Source Multimodal Moxin Models with Moxin-VLM and Moxin-VLA [53.68989489261506]
Moxin 7B is introduced as a fully open-source Large Language Model (LLM). We develop three variants based on Moxin, including Moxin-VLM, Moxin-VLA, and Moxin-Chinese. Experiments show that our models achieve superior performance in various evaluations.
arXiv Detail & Related papers (2025-12-22T02:36:42Z) - Open-Source LLMs Collaboration Beats Closed-Source LLMs: A Scalable Multi-Agent System [51.04535721779685]
This paper aims to demonstrate the potential and strengths of open-source collectives. We propose SMACS, a scalable multi-agent collaboration system (MACS) framework with high performance. Experiments on eight mainstream benchmarks validate the effectiveness of our SMACS.
arXiv Detail & Related papers (2025-07-14T16:17:11Z) - LLMs' Reshaping of People, Processes, Products, and Society in Software Development: A Comprehensive Exploration with Early Adopters [3.4069804433026314]
Large language models (LLMs) like OpenAI ChatGPT, Google Gemini, and GitHub Copilot are rapidly gaining traction in the software industry. Our study provides a nuanced understanding of how LLMs are shaping the landscape of software development.
arXiv Detail & Related papers (2025-03-06T22:27:05Z) - 7B Fully Open Source Moxin-LLM/VLM -- From Pretraining to GRPO-based Reinforcement Learning Enhancement [41.463611054440435]
Moxin 7B is a fully open-source Large Language Model (LLM) developed adhering to principles of open science, open source, open data, and open access. We release the pre-training code and configurations, training and fine-tuning datasets, and intermediate and final checkpoints. Experiments show that our models achieve superior performance in various evaluations such as zero-shot evaluation, few-shot evaluation, and CoT evaluation.
arXiv Detail & Related papers (2024-12-08T02:01:46Z) - An Empirical Study on Challenges for LLM Application Developers [28.69628251749012]
We crawl and analyze 29,057 relevant questions from a popular OpenAI developer forum. After manually analyzing 2,364 sampled questions, we construct a taxonomy of challenges faced by LLM developers.
arXiv Detail & Related papers (2024-08-06T05:46:28Z) - LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs [90.04787972295222]
This paper introduces LLM-jp, a cross-organizational project for the research and development of Japanese large language models (LLMs). As of this writing, more than 1,500 participants from academia and industry are working together for this purpose. For the latest activities, visit https://llm-jp.nii.ac.jp/en/.
arXiv Detail & Related papers (2024-07-04T14:33:03Z) - Free to play: UN Trade and Development's experience with developing its own open-source Retrieval Augmented Generation Large Language Model application [0.0]
UNCTAD has explored and developed its own open-source Retrieval Augmented Generation (RAG) LLM application.
RAG makes Large Language Models aware of and more useful for the organization's domain and work.
Three libraries were developed to produce the app: nlp_pipeline for document processing and statistical analysis, local_rag_llm for running a local RAG LLM, and streamlit_rag for the user interface. All three are publicly available on PyPI and GitHub with Dockerfiles.
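The RAG pattern this abstract describes can be sketched in a few lines: retrieve the documents most relevant to a query, then prepend them to the prompt so the LLM answers from the organization's own material. The similarity metric, helper names, and prompt template below are illustrative stand-ins, not the actual nlp_pipeline or local_rag_llm APIs; real systems use dense vector embeddings rather than this toy bag-of-words comparison.

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; production RAG uses dense vector models."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Assemble the augmented prompt: retrieved context first, then the question."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Trade statistics for developing economies rose in 2023.",
    "The cafeteria menu changes weekly.",
    "UNCTAD publishes reports on digital economy policy.",
]
prompt = build_prompt("What does UNCTAD publish on the digital economy?", docs)
print(prompt)
```

The prompt would then be sent to a local or hosted LLM; grounding the answer in retrieved context is what makes the model "aware of" the organization's domain without retraining it.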
arXiv Detail & Related papers (2024-06-18T14:23:54Z) - MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series [86.31735321970481]
We open-source MAP-Neo, a bilingual language model with 7B parameters trained from scratch on 4.5T high-quality tokens.
Our MAP-Neo is the first fully open-sourced bilingual LLM with performance comparable to existing state-of-the-art LLMs.
arXiv Detail & Related papers (2024-05-29T17:57:16Z) - LLM360: Towards Fully Transparent Open-Source LLMs [89.05970416013403]
The goal of LLM360 is to support open and collaborative AI research by making the end-to-end training process transparent and reproducible by everyone.
As a first step of LLM360, we release two 7B parameter LLMs pre-trained from scratch, Amber and CrystalCoder, including their training code, data, intermediate checkpoints, and analyses.
arXiv Detail & Related papers (2023-12-11T17:39:00Z) - ChatGPT's One-year Anniversary: Are Open-Source Large Language Models Catching up? [71.12709925152784]
ChatGPT has brought a seismic shift in the entire landscape of AI.
It showed that a model could answer human questions and follow instructions on a broad panel of tasks.
While closed-source LLMs generally outperform their open-source counterparts, the progress on the latter has been rapid.
This has crucial implications not only on research but also on business.
arXiv Detail & Related papers (2023-11-28T17:44:51Z) - H2O Open Ecosystem for State-of-the-art Large Language Models [10.04351591653126]
Large Language Models (LLMs) represent a revolution in AI.
They also pose many significant risks, such as the presence of biased, private, copyrighted or harmful text.
We introduce a complete open-source ecosystem for developing and testing LLMs.
arXiv Detail & Related papers (2023-10-17T09:40:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.