Challenges of Using Pre-trained Models: the Practitioners' Perspective
- URL: http://arxiv.org/abs/2404.14710v2
- Date: Wed, 1 May 2024 04:53:55 GMT
- Title: Challenges of Using Pre-trained Models: the Practitioners' Perspective
- Authors: Xin Tan, Taichuan Li, Ruohe Chen, Fang Liu, Li Zhang
- Abstract summary: We analyze the popularity and difficulty trends of PTM-related questions on Stack Overflow.
We find that PTM-related questions are becoming increasingly popular over time.
This observation emphasizes the significant difficulty and complexity associated with the practical application of PTMs.
- Score: 16.042355796766124
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The challenges associated with using pre-trained models (PTMs) have not been specifically investigated, which hampers their effective utilization. To address this knowledge gap, we collected and analyzed a dataset of 5,896 PTM-related questions on Stack Overflow. We first analyze the popularity and difficulty trends of PTM-related questions. We find that PTM-related questions are becoming increasingly popular over time. However, it is noteworthy that PTM-related questions not only have a lower response rate but also exhibit a longer response time compared to many well-researched topics in software engineering. This observation emphasizes the significant difficulty and complexity associated with the practical application of PTMs. To delve into the specific challenges, we manually annotate 430 PTM-related questions, categorizing them into a hierarchical taxonomy of 42 codes (i.e., leaf nodes) and three categories. This taxonomy encompasses many prominent PTM challenges such as fine-tuning, output understanding, and prompt customization, which reflects the gaps between current techniques and practical needs. We discuss the implications of our study for PTM practitioners, vendors, and educators, and suggest possible directions and solutions for future research.
Related papers
- Optimizing Language Model's Reasoning Abilities with Weak Supervision [48.60598455782159]
We present PuzzleBen, a weakly supervised benchmark that comprises 25,147 complex questions, answers, and human-generated rationales.
A unique aspect of our dataset is the inclusion of 10,000 unannotated questions, enabling us to explore using less supervised data to boost LLMs' inference capabilities.
arXiv Detail & Related papers (2024-05-07T07:39:15Z) - Qsnail: A Questionnaire Dataset for Sequential Question Generation [76.616068047362]
We present the first dataset specifically constructed for the questionnaire generation task, which comprises 13,168 human-written questionnaires.
We conduct experiments on Qsnail, and the results reveal that retrieval models and traditional generative models do not fully align with the given research topic and intents.
Despite enhancements through the chain-of-thought prompt and finetuning, questionnaires generated by language models still fall short of human-written questionnaires.
arXiv Detail & Related papers (2024-02-22T04:14:10Z) - TAT-LLM: A Specialized Language Model for Discrete Reasoning over Tabular and Textual Data [77.66158066013924]
We consider harnessing the amazing power of large language models (LLMs) to solve our task.
We develop a TAT-LLM language model by fine-tuning LLaMA 2 with the training data generated automatically from existing expert-annotated datasets.
arXiv Detail & Related papers (2024-01-24T04:28:50Z) - Deep Learning Model Reuse in the HuggingFace Community: Challenges, Benefit and Trends [12.645960268553686]
Large-scale Pre-Trained Models (PTMs) are becoming increasingly ubiquitous, sparking interest in model hubs and dedicated platforms for hosting PTMs.
We present a taxonomy of the challenges and benefits associated with PTM reuse within this community.
Our findings highlight prevalent challenges such as limited guidance for beginner users, struggles with model output comprehensibility in training or inference, and a lack of model understanding.
arXiv Detail & Related papers (2024-01-24T01:50:29Z) - A Comprehensive Evaluation of Parameter-Efficient Fine-Tuning on Software Engineering Tasks [29.88525311985907]
Pre-trained models (PTMs) have achieved great success in various Software Engineering (SE) downstream tasks.
A widely used solution is parameter-efficient fine-tuning (PEFT), which freezes the PTM's weights while introducing a small number of extra trainable parameters (see the sketch after this list).
This paper aims to evaluate the effectiveness of five PEFT methods on eight PTMs and four SE downstream tasks.
arXiv Detail & Related papers (2023-12-25T05:25:39Z) - The Shifted and The Overlooked: A Task-oriented Investigation of User-GPT Interactions [114.67699010359637]
We analyze a large-scale collection of real user queries to GPT.
We find that tasks such as "design" and "planning" are prevalent in user interactions but are largely neglected by or differ from traditional NLP benchmarks.
arXiv Detail & Related papers (2023-10-19T02:12:17Z) - Naming Practices of Pre-Trained Models in Hugging Face [4.956536094440504]
Engineers increasingly adopt Pre-Trained Models (PTMs) as components in computer systems.
Researchers publish PTMs, which engineers adapt for quality or performance prior to deployment.
Prior research has reported that model names are not always well chosen - and are sometimes erroneous.
In this paper, we frame and conduct the first empirical investigation of PTM naming practices in the Hugging Face PTM registry.
arXiv Detail & Related papers (2023-10-02T21:13:32Z) - A Survey on Time-Series Pre-Trained Models [34.98332094625603]
Time-Series Mining (TSM) shows great potential in practical applications.
Deep learning models that rely on massive labeled data have been utilized for TSM successfully.
Recently, Pre-Trained Models have gradually attracted attention in the time series domain.
arXiv Detail & Related papers (2023-05-18T05:27:46Z) - Pre-Trained Models: Past, Present and Future [126.21572378910746]
Large-scale pre-trained models (PTMs) have recently achieved great success and become a milestone in the field of artificial intelligence (AI).
By storing knowledge in huge numbers of parameters and fine-tuning on specific tasks, PTMs allow the rich knowledge implicitly encoded in those parameters to benefit a variety of downstream tasks.
It is now the consensus of the AI community to adopt PTMs as the backbone for downstream tasks rather than learning models from scratch.
arXiv Detail & Related papers (2021-06-14T02:40:32Z) - Pre-trained Models for Natural Language Processing: A Survey [75.95500552357429]
The emergence of pre-trained models (PTMs) has brought natural language processing (NLP) to a new era.
This survey is intended to be a hands-on guide for understanding, using, and developing PTMs for various NLP tasks.
arXiv Detail & Related papers (2020-03-18T15:22:51Z)
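The PEFT idea referenced above (in "A Comprehensive Evaluation of Parameter-Efficient Fine-Tuning on Software Engineering Tasks") can be illustrated with a minimal sketch: the pre-trained weights stay frozen and only a small set of extra parameters is trained. The LoRA-style adapter, layer sizes, and hyperparameters below are illustrative assumptions, not details taken from that paper.

```python
# Minimal PEFT sketch (illustrative, not the setup used in the cited paper):
# freeze a pre-trained layer and train only a small low-rank adapter.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Wrap a frozen pre-trained linear layer with a trainable low-rank update."""

    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():   # freeze the pre-trained weights
            p.requires_grad = False
        # Low-rank update W + B A: only A and B are trained.
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + x @ self.lora_a.T @ self.lora_b.T


# Toy stand-in for one sub-module of a PTM; real PEFT wraps layers of a
# loaded pre-trained model instead of a freshly initialized one.
pretrained_layer = nn.Linear(768, 768)
peft_layer = LoRALinear(pretrained_layer, rank=8)

# Only the adapter parameters receive gradient updates.
trainable = [p for p in peft_layer.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4)
```

In practice such adapters are applied to many layers of a loaded PTM; the sketch wraps a single linear layer only to show where the frozen and trainable parameters live.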
This list is automatically generated from the titles and abstracts of the papers on this site.