Forecasting Open-Weight AI Model Growth on HuggingFace
- URL: http://arxiv.org/abs/2502.15987v2
- Date: Sat, 15 Mar 2025 21:08:05 GMT
- Title: Forecasting Open-Weight AI Model Growth on HuggingFace
- Authors: Kushal Raj Bhandari, Pin-Yu Chen, Jianxi Gao
- Abstract summary: Building on parallels with citation dynamics in scientific literature, we propose a framework to quantify how an open-weight model's influence evolves. We adapt the model introduced by Wang et al. for scientific citations, using three key parameters (immediacy, longevity, and relative fitness) to track the cumulative number of fine-tuned models of an open-weight model. Our findings reveal that this citation-style approach can effectively capture the diverse trajectories of open-weight model adoption, with most models fitting well and outliers indicating unique patterns or abrupt jumps in usage.
- Score: 46.348283638884425
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As the open-weight AI landscape continues to proliferate (with model development, significant investment, and user interest), it becomes increasingly important to predict which models will ultimately drive innovation and shape AI ecosystems. Building on parallels with citation dynamics in scientific literature, we propose a framework to quantify how an open-weight model's influence evolves. Specifically, we adapt the model introduced by Wang et al. for scientific citations, using three key parameters (immediacy, longevity, and relative fitness) to track the cumulative number of fine-tuned models of an open-weight model. Our findings reveal that this citation-style approach can effectively capture the diverse trajectories of open-weight model adoption, with most models fitting well and outliers indicating unique patterns or abrupt jumps in usage.
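As a minimal sketch of the fitting procedure, assuming the standard Wang et al. citation form c(t) = m(exp(λΦ((ln t − μ)/σ)) − 1), where Φ is the standard normal CDF, λ is relative fitness, μ is immediacy, σ is longevity, and m is a fixed scaling constant; the value m = 30, the monthly time grid, and the synthetic observations below are illustrative assumptions, not numbers taken from the paper.

```python
# Hedged sketch: fit the citation-style growth curve to the cumulative
# number of fine-tuned models derived from one open-weight model.
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

M = 30.0  # fixed scaling constant (assumption, not from the paper)

def cumulative_finetunes(t, lam, mu, sigma):
    """Predicted cumulative fine-tune count t months after release."""
    return M * (np.exp(lam * norm.cdf((np.log(t) - mu) / sigma)) - 1.0)

# Illustrative data: months since release vs. observed cumulative fine-tunes.
months = np.arange(1, 25, dtype=float)
rng = np.random.default_rng(0)
observed = cumulative_finetunes(months, lam=2.0, mu=1.0, sigma=0.8)
observed = observed + rng.normal(scale=2.0, size=months.size)

# Fit relative fitness, immediacy, and longevity for this model.
(lam_hat, mu_hat, sigma_hat), _ = curve_fit(
    cumulative_finetunes, months, observed, p0=(1.0, 1.0, 1.0), maxfev=10000
)
print(f"fitness={lam_hat:.2f}  immediacy={mu_hat:.2f}  longevity={sigma_hat:.2f}")
```

In this parameterization, the fitted fitness governs the eventual scale of adoption, immediacy shifts when uptake accelerates, and longevity controls how drawn out the adoption curve is.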
Related papers
- Deep-SITAR: A SITAR-Based Deep Learning Framework for Growth Curve Modeling via Autoencoders [1.274952786182905]
We introduce a supervised deep learning framework based on an autoencoder architecture that integrates a deep neural network with a B-spline model to estimate the SITAR model. Deep-SITAR offers a powerful approach to predicting growth trajectories, combining the flexibility and efficiency of deep learning with the interpretability of traditional mixed-effects models.
arXiv Detail & Related papers (2025-05-14T15:55:16Z)
- Guiding Time-Varying Generative Models with Natural Gradients on Exponential Family Manifold [5.000311680307273]
We show that the evolution of time-varying generative models can be projected onto an exponential family manifold. We then train the generative model by moving its projection on the manifold according to the natural gradient descent scheme. We propose particle versions of the algorithm, which feature closed-form update rules for any parametric model within the exponential family.
arXiv Detail & Related papers (2025-02-11T15:39:47Z)
- A Collaborative Ensemble Framework for CTR Prediction [73.59868761656317]
We propose a novel framework, Collaborative Ensemble Training Network (CETNet), to leverage multiple distinct models.
Unlike naive model scaling, our approach emphasizes diversity and collaboration through collaborative learning.
We validate our framework on three public datasets and a large-scale industrial dataset from Meta.
arXiv Detail & Related papers (2024-11-20T20:38:56Z)
- Rethinking Weight-Averaged Model-merging [15.2881959315021]
Weight-averaged model-merging has emerged as a powerful approach in deep learning, capable of enhancing model performance without fine-tuning or retraining.
We investigate this technique from three novel perspectives to provide deeper insights into how and why weight-averaged model-merging works.
Our findings shed light on the "black box" of weight-averaged model-merging, offering valuable insights and practical recommendations.
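For readers unfamiliar with the technique named above, the following is a minimal sketch of plain weight averaging over models that share an identical architecture (same state_dict keys and tensor shapes); it is a generic illustration, not code from the paper, and the function name average_state_dicts is ours.

```python
# Sketch: merge several PyTorch models by averaging their weights.
import torch

def average_state_dicts(state_dicts, weights=None):
    """Return the (optionally weighted) average of several state dicts."""
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    merged = {}
    for key in state_dicts[0]:
        merged[key] = sum(w * sd[key].float() for w, sd in zip(weights, state_dicts))
    return merged

# Usage (hypothetical models m1, m2 with the same architecture):
#   merged = average_state_dicts([m1.state_dict(), m2.state_dict()])
#   m1.load_state_dict(merged)
```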
arXiv Detail & Related papers (2024-11-14T08:02:14Z)
- Exploring Model Kinship for Merging Large Language Models [52.01652098827454]
We introduce model kinship, the degree of similarity or relatedness between Large Language Models.
We find that there is a certain relationship between model kinship and the performance gains after model merging.
We propose a new model merging strategy: Top-k Greedy Merging with Model Kinship, which can yield better performance on benchmark datasets.
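As a rough illustration of the idea, the sketch below approximates "kinship" as the cosine similarity between two models' parameter deltas from a shared base model and then greedily merges the top-k most related candidates; the exact metric and merging procedure in the paper may differ, and all helper names here are illustrative.

```python
# Hedged sketch: delta-based kinship scoring and greedy top-k merging.
import torch
import torch.nn.functional as F

def flatten_delta(model_sd, base_sd):
    """Concatenate all parameter differences (model - base) into one vector."""
    return torch.cat([(model_sd[k].float() - base_sd[k].float()).flatten()
                      for k in base_sd])

def kinship(sd_a, sd_b, base_sd):
    """Cosine similarity between two models' deltas from the base model."""
    return F.cosine_similarity(flatten_delta(sd_a, base_sd),
                               flatten_delta(sd_b, base_sd), dim=0).item()

def top_k_greedy_merge(candidate_sds, base_sd, k=2):
    """Average the deltas of the k candidates most related to the rest."""
    scores = [sum(kinship(sd, other, base_sd)
                  for other in candidate_sds if other is not sd)
              for sd in candidate_sds]
    ranked = sorted(zip(scores, candidate_sds), key=lambda p: -p[0])
    chosen = [sd for _, sd in ranked[:k]]
    return {key: base_sd[key].float()
            + sum(sd[key].float() - base_sd[key].float() for sd in chosen) / k
            for key in base_sd}
```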
arXiv Detail & Related papers (2024-10-16T14:29:29Z)
- Fine-tuning large language models for domain adaptation: Exploration of training strategies, scaling, model merging and synergistic capabilities [4.389938747401259]
This work explores the effects of fine-tuning strategies on Large Language Models (LLMs) in domains such as materials science and engineering.
We find that the merging of multiple fine-tuned models can lead to the emergence of capabilities that surpass the individual contributions of the parent models.
arXiv Detail & Related papers (2024-09-05T11:49:53Z)
- Bridging Model-Based Optimization and Generative Modeling via Conservative Fine-Tuning of Diffusion Models [54.132297393662654]
We introduce a hybrid method that fine-tunes cutting-edge diffusion models by optimizing reward models through RL.
We demonstrate the capability of our approach to outperform the best designs in offline data, leveraging the extrapolation capabilities of reward models.
arXiv Detail & Related papers (2024-05-30T03:57:29Z)
- EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) achieves outstanding performance compared to existing merging methods.
EMR-Merging is tuning-free, thus requiring no data availability or any additional training while showing impressive performance.
arXiv Detail & Related papers (2024-05-23T05:25:45Z)
- MGE: A Training-Free and Efficient Model Generation and Enhancement Scheme [10.48591131837771]
This paper proposes a Training-Free and Efficient Model Generation and Enhancement Scheme (MGE).
It considers two aspects during the model generation process: the distribution of model parameters and model performance.
Experimental results show that the generated models are comparable to models obtained through normal training, and even superior in some cases.
arXiv Detail & Related papers (2024-02-27T13:12:00Z)
- Your Autoregressive Generative Model Can be Better If You Treat It as an Energy-Based One [83.5162421521224]
We propose a unique method termed E-ARM for training autoregressive generative models.
E-ARM takes advantage of a well-designed energy-based learning objective.
We show that E-ARM can be trained efficiently and is capable of alleviating the exposure bias problem.
arXiv Detail & Related papers (2022-06-26T10:58:41Z)
- DynamicEmbedding: Extending TensorFlow for Colossal-Scale Applications [0.0]
One of the limitations of deep learning models with sparse features today stems from the predefined nature of their input.
We show that the resulting models are able to perform better and efficiently run at a much larger scale.
arXiv Detail & Related papers (2020-04-17T17:43:51Z)