The State of Documentation Practices of Third-party Machine Learning
Models and Datasets
- URL: http://arxiv.org/abs/2312.15058v1
- Date: Fri, 22 Dec 2023 20:45:52 GMT
- Title: The State of Documentation Practices of Third-party Machine Learning
Models and Datasets
- Authors: Ernesto Lang Oreamuno, Rohan Faiyaz Khan, Abdul Ali Bangash, Catherine
Stinson, Bram Adams
- Abstract summary: We assess the state of the practice of documenting model cards and dataset cards in one of the largest model stores in use today.
Our findings show that only 21,902 models (39.62%) and 1,925 datasets (28.48%) have documentation.
- Score: 8.494940891363813
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Model stores offer third-party ML models and datasets for easy project
integration, minimizing coding efforts. One might hope to find detailed
specifications of these models and datasets in the documentation, leveraging
documentation standards such as model and dataset cards. In this study, we use
statistical analysis and hybrid card sorting to assess the state of the
practice of documenting model cards and dataset cards in one of the largest
model stores in use today, Hugging Face (HF). Our findings show that only
21,902 models (39.62%) and 1,925 datasets (28.48%) have documentation.
Furthermore, we observe inconsistency in ethics and transparency-related
documentation for ML models and datasets.
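For readers who want to probe documentation coverage themselves, the snippet below is a minimal sketch using the official huggingface_hub Python client; the sample size and counting logic are illustrative and not the authors' measurement pipeline, which analyzes the full model store.

```python
# Minimal sketch: estimate what fraction of a small sample of Hugging Face
# models ship a model card (README.md). Illustrative only; not the paper's
# measurement pipeline.
from huggingface_hub import HfApi, ModelCard

api = HfApi()
sample = list(api.list_models(limit=100))  # small sample for illustration

documented = 0
for model in sample:
    try:
        ModelCard.load(model.id)  # raises if the repo has no README.md
        documented += 1
    except Exception:
        pass  # no card, gated repo, or network error

print(f"{documented}/{len(sample)} sampled models have a model card")
```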
Related papers
- LiveXiv -- A Multi-Modal Live Benchmark Based on Arxiv Papers Content [62.816876067499415]
We propose LiveXiv: a scalable evolving live benchmark based on scientific ArXiv papers.
LiveXiv accesses domain-specific manuscripts at any given timestamp and automatically generates visual question-answer pairs.
We benchmark multiple open and proprietary Large Multi-modal Models (LMMs) on the first version of our benchmark, showing its challenging nature and exposing the models' true abilities.
arXiv Detail & Related papers (2024-10-14T17:51:23Z)
- Diffusion Models as Data Mining Tools [87.77999285241219]
This paper demonstrates how to use generative models trained for image synthesis as tools for visual data mining.
We show that after finetuning conditional diffusion models to synthesize images from a specific dataset, we can use these models to define a typicality measure.
This measure assesses how typical visual elements are for different data labels, such as geographic location, time stamps, semantic labels, or even the presence of a disease.
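A rough sketch of the conditional-vs-unconditional denoising comparison behind such a typicality measure; eps_model is a hypothetical noise-prediction network, and the paper's exact estimator and noise schedule may differ.

```python
# Hedged sketch of a label-conditioned "typicality" score: how much does
# conditioning on label c reduce the diffusion model's denoising error on
# image x0? `eps_model(x_t, t, cond)` is a hypothetical noise predictor.
import torch

def add_noise(x0, eps, t, T=1000):
    # Standard DDPM forward step with a simple linear beta schedule.
    betas = torch.linspace(1e-4, 0.02, T)
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)[t]
    return alpha_bar.sqrt() * x0 + (1.0 - alpha_bar).sqrt() * eps

def typicality(eps_model, x0, c, n_samples=8, T=1000):
    score = 0.0
    for _ in range(n_samples):
        t = torch.randint(0, T, (1,))
        eps = torch.randn_like(x0)
        x_t = add_noise(x0, eps, t, T)
        err_uncond = (eps - eps_model(x_t, t, None)).pow(2).mean()
        err_cond = (eps - eps_model(x_t, t, c)).pow(2).mean()
        score += (err_uncond - err_cond).item()
    return score / n_samples  # higher => x0 more typical of label c
```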
arXiv Detail & Related papers (2024-07-20T17:14:31Z)
- RanLayNet: A Dataset for Document Layout Detection used for Domain Adaptation and Generalization [36.973388673687815]
RanLayNet is a synthetic document dataset enriched with automatically assigned labels.
We show that a deep layout identification model trained on our dataset exhibits enhanced performance compared to a model trained solely on actual documents.
arXiv Detail & Related papers (2024-04-15T07:50:15Z)
- What's documented in AI? Systematic Analysis of 32K AI Model Cards [40.170354637778345]
We conduct a comprehensive analysis of 32,111 AI model documentation pages on Hugging Face.
Most of the AI models with substantial downloads provide model cards, though the cards have uneven informativeness.
We find that sections addressing environmental impact, limitations, and evaluation exhibit the lowest filled-out rates, while the training section is the most consistently filled-out.
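As a hedged illustration of how such "filled-out rates" might be computed, the sketch below splits a model card's markdown into second-level sections and flags (nearly) empty ones; the heading names and character threshold are assumptions, not the paper's protocol.

```python
# Hedged sketch: split a model card's markdown into "## " sections and flag
# sections with almost no content. The 20-character threshold and the
# example headings are illustrative assumptions.
import re

def section_fill_status(card_markdown: str, min_chars: int = 20) -> dict:
    # re.split yields [preamble, title1, body1, title2, body2, ...]
    sections = re.split(r"^## +(.+)$", card_markdown, flags=re.MULTILINE)
    status = {}
    for title, body in zip(sections[1::2], sections[2::2]):
        status[title.strip()] = len(body.strip()) >= min_chars
    return status

card = ("## Training\nTrained on 1.6M images for 90 epochs with SGD.\n"
        "## Environmental Impact\n## Limitations\n")
print(section_fill_status(card))
# {'Training': True, 'Environmental Impact': False, 'Limitations': False}
```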
arXiv Detail & Related papers (2024-02-07T18:04:32Z)
- GSAP-NER: A Novel Task, Corpus, and Baseline for Scholarly Entity Extraction Focused on Machine Learning Models and Datasets [3.9169112083667073]
In academic writing, references to machine learning models and datasets are fundamental components.
Existing ground truth datasets do not treat fine-grained types like ML model and model architecture as separate entity types.
We release a corpus of 100 manually annotated full-text scientific publications and a first baseline model for 10 entity types centered around ML models and datasets.
arXiv Detail & Related papers (2023-11-16T12:43:02Z)
- Unlocking Model Insights: A Dataset for Automated Model Card Generation [4.167070553534516]
We introduce a dataset of 500 question-answer pairs for 25 ML models.
We employ annotators to extract the answers from the original paper.
Our experiments with ChatGPT-3.5, LLaMa, and Galactica showcase a significant gap in the understanding of research papers by these LMs.
arXiv Detail & Related papers (2023-09-22T04:46:11Z) - Metadata Representations for Queryable ML Model Zoos [73.24799582702326]
Machine learning (ML) practitioners and organizations are building model zoos of pre-trained models, containing metadata describing properties of the models.
The metadata is currently not standardised; its expressivity is limited; and there is no way to store and query it.
In this paper, we advocate for standardized ML model metadata representation and management, proposing a toolkit to help practitioners manage and query that metadata.
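As a hedged sketch of what standardized, queryable model metadata could look like (the schema, fields, and query style here are assumptions, not the paper's toolkit):

```python
# Hedged sketch: a typed metadata schema plus a simple in-memory query.
# Fields are illustrative assumptions; the paper proposes its own
# representation and management toolkit.
from dataclasses import dataclass

@dataclass
class ModelMeta:
    name: str
    task: str
    params_millions: float
    license: str
    accuracy: float  # on the model's reference benchmark

zoo = [
    ModelMeta("resnet50", "image-classification", 25.6, "apache-2.0", 0.76),
    ModelMeta("bert-base", "fill-mask", 110.0, "apache-2.0", 0.84),
    ModelMeta("tiny-clf", "image-classification", 1.2, "mit", 0.65),
]

# Query: image classifiers under 30M parameters, best accuracy first.
hits = sorted(
    (m for m in zoo
     if m.task == "image-classification" and m.params_millions < 30),
    key=lambda m: m.accuracy,
    reverse=True,
)
for m in hits:
    print(m.name, m.accuracy)
```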
arXiv Detail & Related papers (2022-07-19T15:04:14Z)
- Aspirations and Practice of Model Documentation: Moving the Needle with Nudging and Traceability [8.875661788022637]
We propose a set of design guidelines that aim to support the documentation practice for machine learning models.
A prototype tool named DocML follows those guidelines to support model development in computational notebooks.
arXiv Detail & Related papers (2022-04-13T14:39:18Z)
- Comparing Test Sets with Item Response Theory [53.755064720563]
We evaluate 29 datasets using predictions from 18 pretrained Transformer models on individual test examples.
We find that Quoref, HellaSwag, and MC-TACO are best suited for distinguishing among state-of-the-art models.
We also observe that the span selection task format, used for QA datasets like QAMR or SQuAD2.0, is effective at differentiating strong from weak models.
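For reference, a standard two-parameter logistic (2PL) IRT model of the kind used in such analyses; the paper evaluates specific variants, so treat this as illustrative rather than the paper's exact formulation:

```latex
% Standard 2PL IRT model (illustrative): probability that model i answers
% test item j correctly, given model ability \theta_i, item difficulty b_j,
% and item discrimination a_j.
P(y_{ij} = 1 \mid \theta_i, a_j, b_j) = \frac{1}{1 + e^{-a_j(\theta_i - b_j)}}
```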
arXiv Detail & Related papers (2021-06-01T22:33:53Z)
- When Can Models Learn From Explanations? A Formal Framework for Understanding the Roles of Explanation Data [84.87772675171412]
We study the circumstances under which explanations of individual data points can improve modeling performance.
We make use of three existing datasets with explanations: e-SNLI, TACRED, SemEval.
arXiv Detail & Related papers (2021-02-03T18:57:08Z)
- Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics [118.75207687144817]
We introduce Data Maps, a model-based tool to characterize and diagnose datasets.
We leverage a largely ignored source of information: the behavior of the model on individual instances during training.
Our results indicate that a shift in focus from quantity to quality of data could lead to robust models and improved out-of-distribution generalization.
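A minimal sketch of the two training-dynamics statistics at the heart of Data Maps, computed from the model's per-epoch probability of each example's gold label; the input array here is hypothetical.

```python
# Minimal sketch of Data Map coordinates: for each training example, track
# the model's probability of the gold label across epochs, then take the
# mean (confidence) and standard deviation (variability). Per the paper,
# high-confidence/low-variability points are "easy-to-learn" and
# high-variability points are "ambiguous".
import numpy as np

# gold_probs[e, i] = p(gold label of example i) after epoch e
# (hypothetical values collected during training).
gold_probs = np.array([
    [0.4, 0.90, 0.20],
    [0.6, 0.95, 0.10],
    [0.8, 0.97, 0.15],
])

confidence = gold_probs.mean(axis=0)   # per-example mean over epochs
variability = gold_probs.std(axis=0)   # per-example std over epochs
for i, (c, v) in enumerate(zip(confidence, variability)):
    print(f"example {i}: confidence={c:.2f}, variability={v:.2f}")
```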
arXiv Detail & Related papers (2020-09-22T20:19:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.