Deep Learning Model Reuse in the HuggingFace Community: Challenges,
Benefit and Trends
- URL: http://arxiv.org/abs/2401.13177v1
- Date: Wed, 24 Jan 2024 01:50:29 GMT
- Authors: Mina Taraghi, Gianolli Dorcelus, Armstrong Foundjem, Florian Tambon,
Foutse Khomh
- Abstract summary: The ubiquity of large-scale Pre-Trained Models (PTMs) is on the rise, sparking interest in model hubs and dedicated platforms for hosting PTMs.
We present a taxonomy of the challenges and benefits associated with PTM reuse within this community.
Our findings highlight prevalent challenges such as limited guidance for beginner users, struggles with model output comprehensibility in training or inference, and a lack of model understanding.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The ubiquity of large-scale Pre-Trained Models (PTMs) is on the rise,
sparking interest in model hubs and dedicated platforms for hosting PTMs.
Despite this trend, a comprehensive exploration of the challenges that users
encounter and how the community leverages PTMs remains lacking. To address this
gap, we conducted an extensive mixed-methods empirical study by focusing on
discussion forums and the model hub of HuggingFace, the largest public model
hub. Based on our qualitative analysis, we present a taxonomy of the challenges
and benefits associated with PTM reuse within this community. We then conduct a
quantitative study to track model-type trends and model documentation evolution
over time. Our findings highlight prevalent challenges such as limited guidance
for beginner users, struggles with model output comprehensibility in training
or inference, and a lack of model understanding. We also identified interesting
trends among models where some models maintain high upload rates despite a
decline in topics related to them. Additionally, we found that despite the
introduction of model documentation tools, the amount of documentation has not
increased over time, leading to difficulties in model comprehension and selection among users.
Our study sheds light on new challenges in reusing PTMs that were not reported
before, and we provide recommendations for various stakeholders involved in PTM
reuse.
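
For readers who want to probe the same artifacts, below is a minimal sketch using the official huggingface_hub Python client to list models and check whether each ships a model card; the task filter and limit are illustrative choices, not values from the paper.

```python
# Minimal sketch: list Hub models and check for model-card documentation.
# Requires `pip install huggingface_hub`; the filter/limit values below are
# illustrative assumptions, not settings taken from the study.
from huggingface_hub import HfApi, ModelCard

api = HfApi()
for model in api.list_models(filter="text-classification", limit=5):
    try:
        card = ModelCard.load(model.id)       # fetches the repo's README.md
        documented = bool(card.text.strip())  # non-empty card body?
    except Exception:
        documented = False                    # no model card published
    print(f"{model.id}: documented={documented}")
```

Empty or missing model cards correspond to the documentation gap the abstract reports.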
Related papers
- Self-Supervised Learning for Time Series: A Review & Critique of FITS
The recently proposed FITS model claims competitive performance with significantly reduced parameter counts.
By training a one-layer neural network in the complex frequency domain, we are able to replicate these results.
Our experiments reveal that FITS especially excels at capturing periodic and seasonal patterns, but struggles with trending, non-periodic, or random-resembling behavior.
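
A minimal sketch of what such a model can look like, assuming (as the summary suggests) a single complex-valued linear layer applied in the rFFT domain; the architecture details and names here are our reconstruction, not the authors' code.

```python
import torch
import torch.nn as nn

class FITSLike(nn.Module):
    """One complex linear layer that interpolates a series in frequency space."""
    def __init__(self, seq_len: int, horizon: int):
        super().__init__()
        in_bins = seq_len // 2 + 1               # rFFT bins of the input window
        out_bins = (seq_len + horizon) // 2 + 1  # bins of input + forecast
        self.freq_linear = nn.Linear(in_bins, out_bins, dtype=torch.cfloat)
        self.seq_len, self.horizon = seq_len, horizon

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq_len)
        spec = self.freq_linear(torch.fft.rfft(x, dim=-1))  # frequency interpolation
        full = torch.fft.irfft(spec, n=self.seq_len + self.horizon, dim=-1)
        return full[:, -self.horizon:]           # keep only the forecast window

model = FITSLike(seq_len=96, horizon=24)
print(model(torch.randn(8, 96)).shape)           # torch.Size([8, 24])
```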
arXiv Detail & Related papers (2024-10-23T23:03:09Z)
- Investigating the Impact of Model Complexity in Large Language Models
Large Language Models (LLMs) based on the pre-training and fine-tuning paradigm have become pivotal in solving natural language processing tasks.
In this paper, we focus on autoregressive LLMs and propose to employ Hidden Markov Models (HMMs) to model them.
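
As a concrete reference point for the HMM view, here is a minimal sketch of the scaled forward algorithm, which scores a token sequence under a toy categorical HMM (our illustration, not the paper's code):

```python
import numpy as np

def hmm_log_likelihood(tokens, pi, A, B):
    """Scaled forward algorithm. tokens: observed symbol ids; pi: (S,) initial
    distribution; A: (S, S) transition matrix; B: (S, V) emission matrix."""
    alpha = pi * B[:, tokens[0]]          # forward variable at t = 0
    log_prob = np.log(alpha.sum())
    alpha /= alpha.sum()
    for t in tokens[1:]:
        alpha = (alpha @ A) * B[:, t]     # propagate hidden state, then emit
        scale = alpha.sum()               # rescale to avoid numerical underflow
        log_prob += np.log(scale)
        alpha /= scale
    return log_prob                       # log P(tokens | pi, A, B)

# Toy example: 2 hidden states, vocabulary of 3 symbols.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.2, 0.8]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
print(hmm_log_likelihood([0, 1, 2, 2], pi, A, B))
```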
arXiv Detail & Related papers (2024-10-01T13:53:44Z)
- SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction.
SMILE allows for the upscaling of source models into an MoE model without extra data or further training.
We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
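
A toy reconstruction of the idea as we read the summary, assuming each fine-tuned model contributes a low-rank expert extracted from its weight delta by truncated SVD, with zero-shot routing by the input's energy in each expert's subspace; this is our sketch, not the authors' implementation.

```python
import torch

def low_rank_expert(w_base, w_ft, rank):
    """Compress one layer's fine-tuning delta into rank-r factors via SVD."""
    U, S, Vh = torch.linalg.svd(w_ft - w_base, full_matrices=False)
    return U[:, :rank] * S[:rank], Vh[:rank, :]           # (out, r), (r, in)

def smile_forward(x, w_base, experts):
    """x: (batch, in_dim). Gate each input by its energy in expert subspaces."""
    y = x @ w_base.T                                      # shared base-model path
    energy = torch.stack([(x @ Vh.T).pow(2).sum(-1) for _, Vh in experts], -1)
    gate = torch.softmax(energy, dim=-1)                  # zero-shot routing weights
    for k, (US, Vh) in enumerate(experts):
        y = y + gate[:, k:k + 1] * ((x @ Vh.T) @ US.T)    # add low-rank expert k
    return y

w_base = torch.randn(64, 32)
experts = [low_rank_expert(w_base, w_base + 0.01 * torch.randn(64, 32), rank=4)
           for _ in range(3)]
print(smile_forward(torch.randn(8, 32), w_base, experts).shape)  # (8, 64)
```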
arXiv Detail & Related papers (2024-08-19T17:32:15Z)
- Continual Learning with Pre-Trained Models: A Survey
Continual Learning aims to overcome catastrophic forgetting of former knowledge when learning new tasks.
This paper presents a comprehensive survey of the latest advancements in PTM-based CL.
arXiv Detail & Related papers (2024-01-29T18:27:52Z)
- Comprehensive Exploration of Synthetic Data Generation: A Survey
This work surveys 417 Synthetic Data Generation models over the last decade.
The findings reveal increased model performance and complexity, with neural network-based approaches prevailing.
Computer vision dominates, with GANs as primary generative models, while diffusion models, transformers, and RNNs compete.
arXiv Detail & Related papers (2024-01-04T20:23:51Z)
- ZhiJian: A Unifying and Rapidly Deployable Toolbox for Pre-trained Model Reuse
This paper introduces ZhiJian, a comprehensive and user-friendly toolbox for model reuse, utilizing the PyTorch backend.
ZhiJian presents a novel paradigm that unifies diverse perspectives on model reuse, encompassing target architecture construction with PTM, tuning target model with PTM, and PTM-based inference.
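
We do not reproduce ZhiJian's own API here; as a generic sketch of the "tuning the target model with a PTM" reuse style it unifies, the following uses plain PyTorch and torchvision (a common setup, assumed for illustration):

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights

backbone = resnet18(weights=ResNet18_Weights.DEFAULT)  # reuse a pre-trained model
for p in backbone.parameters():
    p.requires_grad = False                            # freeze PTM weights
backbone.fc = nn.Linear(backbone.fc.in_features, 10)   # fresh head for 10 classes

optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
x = torch.randn(4, 3, 224, 224)                        # dummy batch
y = torch.randint(0, 10, (4,))
loss = nn.functional.cross_entropy(backbone(x), y)     # tune only the new head
loss.backward()
optimizer.step()
```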
arXiv Detail & Related papers (2023-08-17T19:12:13Z)
- Scaling Vision-Language Models with Sparse Mixture of Experts
We show that mixture-of-experts (MoE) techniques can achieve state-of-the-art performance on a range of benchmarks over dense models of equivalent computational cost.
Our research offers valuable insights into stabilizing the training of MoE models, understanding the impact of MoE on model interpretability, and balancing the trade-offs between compute and performance when scaling vision-language models.
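
For concreteness, here is a minimal top-k sparse MoE layer in PyTorch; it is a simplified illustration of the general technique, not the paper's model.

```python
import torch
import torch.nn as nn

class SparseMoE(nn.Module):
    """Each token is routed to its top-k experts; the rest stay inactive."""
    def __init__(self, dim, n_experts=8, k=2):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        ])
        self.router = nn.Linear(dim, n_experts)           # learned gating network
        self.k = k

    def forward(self, x):                                 # x: (tokens, dim)
        weights, idx = self.router(x).topk(self.k, dim=-1)
        weights = torch.softmax(weights, dim=-1)          # normalize top-k gates
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                  # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

moe = SparseMoE(dim=64)
print(moe(torch.randn(16, 64)).shape)                     # torch.Size([16, 64])
```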
arXiv Detail & Related papers (2023-03-13T16:00:31Z)
- Are Neural Topic Models Broken?
We study the relationship between automated and human evaluation of topic models.
We find that neural topic models fare worse in both respects compared to an established classical method.
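
For context, the automated side of such evaluations is usually a word co-occurrence coherence score such as NPMI; below is a minimal sketch of document-level NPMI (our illustration of the metric family, not the paper's code):

```python
import numpy as np
from itertools import combinations

def npmi_coherence(topic_words, documents, eps=1e-12):
    """topic_words: top words of one topic; documents: list of token sets."""
    n_docs = len(documents)
    def p(*words):                        # document-level (co-)occurrence prob.
        return sum(all(w in d for w in words) for d in documents) / n_docs
    scores = []
    for w1, w2 in combinations(topic_words, 2):
        joint = p(w1, w2)
        pmi = np.log((joint + eps) / (p(w1) * p(w2) + eps))
        scores.append(pmi / -np.log(joint + eps))   # normalize PMI into [-1, 1]
    return float(np.mean(scores))

docs = [set(s.split()) for s in
        ["cat dog pet", "dog pet food", "stock market price"]]
print(npmi_coherence(["cat", "dog", "pet"], docs))
```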
arXiv Detail & Related papers (2022-10-28T14:38:50Z)
- Leveraging Multiple Relations for Fashion Trend Forecasting Based on Social Media
We propose an improved model named Relation Enhanced Attention Recurrent (REAR) network.
Compared to KERN, the REAR model leverages not only the relations among fashion elements but also those among user groups.
To further improve the performance of long-range trend forecasting, the REAR method devises a sliding temporal attention mechanism.
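
A minimal sketch of a sliding temporal attention mechanism as we understand the summary, assuming each time step attends only over a trailing window of recent steps; the window size and shapes are illustrative, not REAR's.

```python
import torch
import torch.nn as nn

def sliding_attention(h, window, attn):
    """h: (batch, time, dim). Mask keys outside a trailing window per query."""
    T = h.shape[1]
    pos = torch.arange(T)
    dist = pos.unsqueeze(0) - pos.unsqueeze(1)     # dist[i, j] = j - i
    mask = (dist > 0) | (dist < -(window - 1))     # block future and stale steps
    out, _ = attn(h, h, h, attn_mask=mask)         # True entries are disallowed
    return out

attn = nn.MultiheadAttention(embed_dim=32, num_heads=4, batch_first=True)
h = torch.randn(8, 20, 32)
print(sliding_attention(h, window=5, attn=attn).shape)  # torch.Size([8, 20, 32])
```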
arXiv Detail & Related papers (2021-05-07T14:52:03Z)
- DoubleEnsemble: A New Ensemble Method Based on Sample Reweighting and Feature Selection for Financial Data Analysis
We propose DoubleEnsemble, an ensemble framework leveraging learning trajectory based sample reweighting and shuffling based feature selection.
Our model is applicable to a wide range of base models, capable of extracting complex patterns, while mitigating the overfitting and instability issues for financial market prediction.
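
As one concrete reading of the shuffling-based feature selection idea, here is a minimal permutation-importance sketch: a feature is kept only if destroying it hurts validation performance. This is our illustration, not the DoubleEnsemble code.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

def shuffle_importance(model, X_val, y_val, metric, n_repeats=5, seed=0):
    rng = np.random.default_rng(seed)
    base = metric(y_val, model.predict(X_val))       # baseline validation score
    importances = np.zeros(X_val.shape[1])
    for j in range(X_val.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X_val.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # destroy feature j only
            drops.append(base - metric(y_val, model.predict(X_perm)))
        importances[j] = np.mean(drops)              # mean score drop per feature
    return importances                               # near-zero -> candidate to drop

X = np.random.randn(200, 5)
y = 2 * X[:, 0] + 0.1 * np.random.randn(200)         # only feature 0 is informative
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
print(shuffle_importance(model, X, y, r2_score).round(3))
```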
arXiv Detail & Related papers (2020-10-03T02:57:10Z)