What's documented in AI? Systematic Analysis of 32K AI Model Cards
- URL: http://arxiv.org/abs/2402.05160v1
- Date: Wed, 7 Feb 2024 18:04:32 GMT
- Title: What's documented in AI? Systematic Analysis of 32K AI Model Cards
- Authors: Weixin Liang, Nazneen Rajani, Xinyu Yang, Ezinwanne Ozoani, Eric Wu,
Yiqun Chen, Daniel Scott Smith, James Zou
- Abstract summary: We conduct a comprehensive analysis of 32,111 AI model documentations on Hugging Face.
Most of the AI models with substantial downloads provide model cards, though the cards have uneven informativeness.
We find that sections addressing environmental impact, limitations, and evaluation exhibit the lowest filled-out rates, while the training section is the most consistently filled-out.
- Score: 40.170354637778345
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid proliferation of AI models has underscored the importance of
thorough documentation, as it enables users to understand, trust, and
effectively utilize these models in various applications. Although developers
are encouraged to produce model cards, it's not clear how much information or
what information these cards contain. In this study, we conduct a comprehensive
analysis of 32,111 AI model documentations on Hugging Face, a leading platform
for distributing and deploying AI models. Our investigation sheds light on the
prevailing model card documentation practices. Most of the AI models with
substantial downloads provide model cards, though the cards have uneven
informativeness. We find that sections addressing environmental impact,
limitations, and evaluation exhibit the lowest filled-out rates, while the
training section is the most consistently filled-out. We analyze the content of
each section to characterize practitioners' priorities. Interestingly, there
are substantial discussions of data, sometimes with equal or even greater
emphasis than the model itself. To evaluate the impact of model cards, we
conducted an intervention study by adding detailed model cards to 42 popular
models which had no or sparse model cards previously. We find that adding model
cards is moderately correlated with an increase in weekly download rates. Our
study opens up a new perspective for analyzing community norms and practices
for model documentation through large-scale data science and linguistic
analysis.
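A minimal sketch of the kind of large-scale model-card audit described above, not the authors' actual pipeline: it uses the public `huggingface_hub` library, and the section keywords and simple substring check are simplifying assumptions for illustration (exact attribute names may vary across library versions).

```python
# Rough sketch: check which key model-card sections are mentioned for the
# most-downloaded models on the Hugging Face Hub. Not the paper's pipeline.
from huggingface_hub import HfApi, ModelCard

# Sections highlighted by the study: training is usually filled out, while
# environmental impact, limitations, and evaluation often are not.
SECTIONS = ["Training", "Evaluation", "Limitations", "Environmental Impact"]

api = HfApi()
# Focus on heavily downloaded models, as the study does.
for info in api.list_models(sort="downloads", direction=-1, limit=20):
    try:
        card = ModelCard.load(info.id)  # raises if the repo has no README.md
    except Exception:
        print(f"{info.id}: no model card")
        continue
    body = (card.text or "").lower()
    present = [s for s in SECTIONS if s.lower() in body]
    print(f"{info.id}: {len(present)}/{len(SECTIONS)} key sections mentioned {present}")
```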
Related papers
- Guiding Attention in End-to-End Driving Models [49.762868784033785]
Vision-based end-to-end driving models trained by imitation learning can lead to affordable solutions for autonomous driving.
We study how to guide the attention of these models to improve their driving quality by adding a loss term during training.
In contrast to previous work, our method does not require these salient semantic maps to be available during testing time.
arXiv Detail & Related papers (2024-04-30T23:18:51Z)
- Data-efficient Large Vision Models through Sequential Autoregression [58.26179273091461]
We develop an efficient, autoregression-based vision model on a limited dataset.
We demonstrate how this model achieves proficiency in a spectrum of visual tasks spanning both high-level and low-level semantic understanding.
Our empirical evaluations underscore the model's agility in adapting to various tasks, heralding a significant reduction in the parameter footprint.
arXiv Detail & Related papers (2024-02-07T13:41:53Z)
- The State of Documentation Practices of Third-party Machine Learning Models and Datasets [8.494940891363813]
We assess the state of the practice of documenting model cards and dataset cards in one of the largest model stores in use today.
Our findings show that only 21,902 models (39.62%) and 1,925 datasets (28.48%) have documentation.
arXiv Detail & Related papers (2023-12-22T20:45:52Z)
- Unlocking Model Insights: A Dataset for Automated Model Card Generation [4.167070553534516]
We introduce a dataset of 500 question-answer pairs for 25 ML models.
We employ annotators to extract the answers from the original papers.
Our experiments with ChatGPT-3.5, LLaMa, and Galactica showcase a significant gap in the understanding of research papers by these LMs.
arXiv Detail & Related papers (2023-09-22T04:46:11Z)
- Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering [26.34649731975005]
Retriever-augmented instruction-following models are attractive alternatives to fine-tuned approaches for question answering (QA).
While the model responses tend to be natural and fluent, the additional verbosity makes traditional QA evaluation metrics unreliable for accurately quantifying model performance.
We use both automatic and human evaluation to evaluate these models along two dimensions: 1) how well they satisfy the user's information need (correctness) and 2) whether they produce a response based on the provided knowledge (faithfulness).
arXiv Detail & Related papers (2023-07-31T17:41:00Z)
- Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space (a toy weight-averaging sketch appears after this list).
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
- An Empirical Study of Deep Learning Models for Vulnerability Detection [4.243592852049963]
We surveyed and reproduced 9 state-of-the-art deep learning models on 2 widely used vulnerability detection datasets.
We investigated model capabilities, training data, and model interpretation.
Our findings can help better understand model results, provide guidance on preparing training data, and improve the robustness of the models.
arXiv Detail & Related papers (2022-12-15T19:49:34Z)
- Synthetic Model Combination: An Instance-wise Approach to Unsupervised Ensemble Learning [92.89846887298852]
Consider making a prediction over new test data without any opportunity to learn from a training set of labelled data.
Instead, we are given access to a set of expert models and their predictions, along with some limited information about the datasets used to train them.
arXiv Detail & Related papers (2022-10-11T10:20:31Z)
- Aspirations and Practice of Model Documentation: Moving the Needle with Nudging and Traceability [8.875661788022637]
We propose a set of design guidelines that aim to support the documentation practice for machine learning models.
A prototype tool named DocML follows those guidelines to support model development in computational notebooks.
arXiv Detail & Related papers (2022-04-13T14:39:18Z)
- Evaluation Toolkit For Robustness Testing Of Automatic Essay Scoring Systems [64.4896118325552]
We evaluate the current state-of-the-art AES models using a model adversarial evaluation scheme and associated metrics.
We find that AES models are highly overstable. Even heavy modifications (as much as 25%) with content unrelated to the topic of the questions do not decrease the score produced by the models.
arXiv Detail & Related papers (2020-07-14T03:49:43Z)
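Referring back to the "Dataless Knowledge Fusion" entry above: as a rough illustration only, the sketch below averages the weights of two hypothetical fine-tuned checkpoints with PyTorch and `transformers`. Uniform averaging is merely a baseline for merging models in parameter space, not the fusion algorithm proposed in that paper, and the checkpoint names are placeholders.

```python
# Toy illustration of "merging models in their parameter space": uniform
# weight averaging of fine-tuned checkpoints, done without any training data.
# This is only a baseline, not the paper's proposed fusion method.
import torch
from transformers import AutoModelForSequenceClassification

# Hypothetical checkpoints, assumed to share one architecture and label space.
CHECKPOINTS = ["org/finetuned-task-a", "org/finetuned-task-b"]

models = [AutoModelForSequenceClassification.from_pretrained(c) for c in CHECKPOINTS]
states = [m.state_dict() for m in models]

merged_state = {}
for key, ref in states[0].items():
    if ref.is_floating_point():
        # Average each floating-point parameter/buffer across the models.
        merged_state[key] = torch.stack([s[key] for s in states]).mean(dim=0)
    else:
        # Integer buffers (e.g., position ids) are identical; copy them as-is.
        merged_state[key] = ref

merged = AutoModelForSequenceClassification.from_pretrained(CHECKPOINTS[0])
merged.load_state_dict(merged_state)  # a single model fused from both checkpoints
```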
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.