Economies of Open Intelligence: Tracing Power & Participation in the Model Ecosystem
- URL: http://arxiv.org/abs/2512.03073v1
- Date: Thu, 27 Nov 2025 12:50:25 GMT
- Title: Economies of Open Intelligence: Tracing Power & Participation in the Model Ecosystem
- Authors: Shayne Longpre, Christopher Akiki, Campbell Lund, Atharva Kulkarni, Emily Chen, Irene Solaiman, Avijit Ghosh, Yacine Jernite, Lucie-Aimée Kaffee,
- Abstract summary: Hugging Face Model Hub has been the primary global platform for sharing open-weight AI models. Our analysis spans 851,000 models, over 200 aggregated attributes per model, and 2.2B downloads.
- Score: 21.595922367237815
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Since 2019, the Hugging Face Model Hub has been the primary global platform for sharing open-weight AI models. By releasing a dataset of the complete history of weekly model downloads (June 2020-August 2025) alongside model metadata, we provide the most rigorous examination to date of concentration dynamics and evolving characteristics in the open model economy. Our analysis spans 851,000 models, over 200 aggregated attributes per model, and 2.2B downloads. We document a fundamental rebalancing of economic power: US open-weight industry dominance by Google, Meta, and OpenAI has declined sharply in favor of unaffiliated developers, community organizations, and, as of 2025, Chinese industry, with DeepSeek and Qwen models potentially heralding a new consolidation of market power. We identify statistically significant shifts in model properties: a 17X increase in average model size, rapid growth in multimodal generation (3.4X), quantization (5X), and mixture-of-experts architectures (7X), alongside concerning declines in data transparency, with open-weight models surpassing truly open-source models for the first time in 2025. We expose a newly emerged layer of developer intermediaries focused on quantizing and adapting base models for both efficiency and artistic expression. To enable continued research and oversight, we release the complete dataset with an interactive dashboard for real-time monitoring of concentration dynamics and evolving properties in the open model economy.
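One concrete way to make the abstract's concentration claims operational is a market-concentration index over the weekly download series. Below is a minimal sketch of a per-week Herfindahl-Hirschman Index; the long-format table and its column names ("week", "organization", "downloads") are illustrative assumptions, not the released dataset's actual schema, and HHI is one standard measure the paper may or may not use.

```python
import pandas as pd

def weekly_hhi(df: pd.DataFrame) -> pd.Series:
    """HHI per week: sum of squared download shares, in (1/n, 1].

    Values near 1 mean downloads concentrated in a few organizations;
    values near 1/n mean an even split across n organizations.
    """
    org_weekly = df.groupby(["week", "organization"])["downloads"].sum()
    shares = org_weekly / org_weekly.groupby(level="week").transform("sum")
    return (shares ** 2).groupby(level="week").sum()

# Toy data: week 1 has one dominant org, week 2 an even three-way split.
toy = pd.DataFrame({
    "week": ["2025-W01"] * 3 + ["2025-W02"] * 3,
    "organization": ["a", "b", "c"] * 2,
    "downloads": [900, 50, 50, 300, 350, 350],
})
print(weekly_hhi(toy))  # 2025-W01: 0.815 (concentrated), 2025-W02: 0.335
```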
Related papers
- Affordances Enable Partial World Modeling with LLMs [68.52975612311575]
We show that agents achieving task-agnostic, language-conditioned intents possess predictive partial-world models informed by affordances. In the multi-task setting, we introduce distribution-robust affordances and show that partial models can be extracted to significantly improve search efficiency.
arXiv Detail & Related papers (2026-02-11T00:25:25Z)
- Multi-Location Software Model Completion [6.674306827529775]
We propose a novel global embedding-based next-focus predictor, NextFocus. NextFocus is capable of multi-location model completion for the first time. It achieves an average Precision@k score of 0.98 for $k \leq 10$, significantly outperforming the three baseline approaches.
arXiv Detail & Related papers (2026-01-20T12:19:34Z)
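For reference, the Precision@k metric reported in the NextFocus summary above measures the fraction of the top-k predicted edit locations that are actually relevant. A minimal sketch with toy inputs (not NextFocus outputs):

```python
def precision_at_k(predicted: list, relevant: set, k: int) -> float:
    """Precision@k = |top-k predictions that are relevant| / k."""
    return sum(1 for p in predicted[:k] if p in relevant) / k

# Toy example: 4 of the top 5 predicted locations are correct.
preds = ["loc3", "loc7", "loc1", "loc9", "loc2"]
truth = {"loc3", "loc7", "loc1", "loc2"}
print(precision_at_k(preds, truth, k=5))  # 0.8
```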
- Revisiting Multi-Agent World Modeling from a Diffusion-Inspired Perspective [54.77404771454794]
We develop a flexible and robust world model for Multi-Agent Reinforcement Learning (MARL) using diffusion models. Our method, Diffusion-Inspired Multi-Agent world model (DIMA), achieves state-of-the-art performance across multiple multi-agent control benchmarks.
arXiv Detail & Related papers (2025-05-27T09:11:38Z)
- Forecasting Open-Weight AI Model Growth on HuggingFace [46.348283638884425]
Building on parallels with citation dynamics in scientific literature, we propose a framework to quantify how an open-weight model's influence evolves. We adapt the model introduced by Wang et al. for scientific citations, using three key parameters (immediacy, longevity, and relative fitness) to track the cumulative number of fine-tuned models of an open-weight model. Our findings reveal that this citation-style approach can effectively capture the diverse trajectories of open-weight model adoption, with most models fitting well and outliers indicating unique patterns or abrupt jumps in usage.
arXiv Detail & Related papers (2025-02-21T22:52:19Z)
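The summary above names the three parameters of the Wang et al. citation model but not its functional form. A minimal sketch of that form, applied here to cumulative fine-tune counts: $c(t) = m\,[\exp(\lambda\,\Phi((\ln t - \mu)/\sigma)) - 1]$, where $\Phi$ is the standard normal CDF, $\lambda$ is relative fitness, $\mu$ immediacy, and $\sigma$ longevity. The constant m and the parameter values below are illustrative assumptions, not fitted values from the paper.

```python
import numpy as np
from scipy.stats import norm

def cumulative_adoptions(t_weeks, lam, mu, sigma, m=30.0):
    """Expected cumulative fine-tune count t_weeks after release."""
    t = np.asarray(t_weeks, dtype=float)
    return m * (np.exp(lam * norm.cdf((np.log(t) - mu) / sigma)) - 1.0)

# Higher relative fitness -> more derivative models at every age.
weeks = np.array([1, 4, 16, 52, 104])
print(cumulative_adoptions(weeks, lam=2.0, mu=2.5, sigma=1.0))
print(cumulative_adoptions(weeks, lam=3.0, mu=2.5, sigma=1.0))
```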
- A Collaborative Ensemble Framework for CTR Prediction [73.59868761656317]
We propose a novel framework, Collaborative Ensemble Training Network (CETNet), to leverage multiple distinct models.
Unlike naive model scaling, our approach emphasizes diversity and collaboration through collaborative learning.
We validate our framework on three public datasets and a large-scale industrial dataset from Meta.
arXiv Detail & Related papers (2024-11-20T20:38:56Z)
- Exploring Model Kinship for Merging Large Language Models [73.98345036483299]
We study model evolution through iterative merging, drawing an analogy to biological evolution. We show that model kinship is closely linked to the performance improvements achieved by merging. We propose a new model merging strategy: Top-k Greedy Merging with Model Kinship.
arXiv Detail & Related papers (2024-10-16T14:29:29Z)
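A hedged sketch of what a "Top-k Greedy Merging" loop could look like. The kinship score used here (cosine similarity of flattened weight vectors) is an illustrative stand-in, not the paper's kinship definition, and uniform weight averaging is just one simple merging operator; all models are assumed to share an architecture.

```python
import torch

def kinship(a: dict, b: dict) -> float:
    """Illustrative kinship: cosine similarity of flattened weights."""
    va = torch.cat([p.flatten() for p in a.values()])
    vb = torch.cat([p.flatten() for p in b.values()])
    return torch.nn.functional.cosine_similarity(va, vb, dim=0).item()

def average_merge(a: dict, b: dict) -> dict:
    """Uniform parameter averaging of two same-architecture state dicts."""
    return {k: (a[k] + b[k]) / 2 for k in a}

def greedy_merge(base: dict, candidates: list, top_k: int = 3) -> dict:
    """Greedily fold in the top-k candidates most akin to the base."""
    ranked = sorted(candidates, key=lambda c: kinship(base, c), reverse=True)
    merged = base
    for cand in ranked[:top_k]:
        merged = average_merge(merged, cand)
    return merged
```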
- EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models [36.576853882830896]
We introduce EvolveDirector to train a text-to-image generation model comparable to advanced models using publicly available resources.
This framework interacts with advanced models through their public APIs to obtain text-image data pairs to train a base model.
We leverage pre-trained large vision-language models (VLMs) to guide the evolution of the base model.
arXiv Detail & Related papers (2024-10-09T17:52:28Z) - Data-Juicer Sandbox: A Feedback-Driven Suite for Multimodal Data-Model Co-development [67.55944651679864]
We present a new sandbox suite tailored for integrated data-model co-development. This sandbox provides a feedback-driven experimental platform, enabling cost-effective and guided refinement of both data and models.
arXiv Detail & Related papers (2024-07-16T14:40:07Z)
- On the Stability of Iterative Retraining of Generative Models on their own Data [56.153542044045224]
We study the impact of training generative models on mixed datasets.
We first prove the stability of iterative training under the condition that the initial generative models approximate the data distribution well enough.
We empirically validate our theory on both synthetic and natural images by iteratively training normalizing flows and state-of-the-art diffusion models.
arXiv Detail & Related papers (2023-09-30T16:41:04Z)
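A minimal sketch of the iterative-retraining loop this entry studies, with a trivial Gaussian fit standing in for a real generative model. `real_frac` controls the real-to-synthetic mix in each round; the paper's stability condition concerns how much real data must remain in that mix.

```python
import numpy as np

rng = np.random.default_rng(0)

def train(data: np.ndarray) -> tuple:
    """Placeholder 'generative model': fit a 1-D Gaussian."""
    return float(data.mean()), float(data.std())

def sample(model: tuple, n: int) -> np.ndarray:
    mu, sigma = model
    return rng.normal(mu, sigma, size=n)

real = rng.normal(0.0, 1.0, size=10_000)
model = train(real)
for round_ in range(5):
    synthetic = sample(model, n=len(real))
    real_frac = 0.5  # share of real data kept in each retraining mix
    n_real = int(real_frac * len(real))
    mixed = np.concatenate([real[:n_real], synthetic[: len(real) - n_real]])
    model = train(mixed)
    print(round_, model)  # the fitted (mean, std) stays near (0, 1)
```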