Metadata Representations for Queryable ML Model Zoos
- URL: http://arxiv.org/abs/2207.09315v1
- Date: Tue, 19 Jul 2022 15:04:14 GMT
- Title: Metadata Representations for Queryable ML Model Zoos
- Authors: Ziyu Li, Rihan Hai, Alessandro Bozzon and Asterios Katsifodimos
- Abstract summary: Machine learning (ML) practitioners and organizations are building model zoos of pre-trained models, containing metadata describing properties of the models.
The metadata is currently not standardised; its expressivity is limited; and there is no interoperable way to store and query it.
In this paper, we advocate for standardized ML model metadata representation and management, proposing a toolkit to help practitioners manage and query that metadata.
- Score: 73.24799582702326
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning (ML) practitioners and organizations are building model zoos
of pre-trained models, containing metadata describing properties of the ML
models and datasets that are useful for reporting, auditing, reproducibility,
and interpretability purposes. The metadata is currently not standardised; its
expressivity is limited; and there is no interoperable way to store and query
it. Consequently, model search, reuse, comparison, and composition are
hindered. In this paper, we advocate for standardized ML model metadata
representation and management, proposing a toolkit to help
practitioners manage and query that metadata.
Related papers
- Matchmaker: Self-Improving Large Language Model Programs for Schema Matching [60.23571456538149]
We propose a compositional language model program for schema matching, comprised of candidate generation, refinement and confidence scoring.
Matchmaker self-improves in a zero-shot manner without the need for labeled demonstrations.
Empirically, we demonstrate on real-world medical schema matching benchmarks that Matchmaker outperforms previous ML-based approaches.
arXiv Detail & Related papers (2024-10-31T16:34:03Z)
- AttributionScanner: A Visual Analytics System for Model Validation with Metadata-Free Slice Finding [29.07617945233152]
Data slice finding is an emerging technique for validating machine learning (ML) models by identifying and analyzing subgroups in a dataset that exhibit poor performance.
This approach faces significant challenges, including the laborious and costly requirement for additional metadata.
We introduce AttributionScanner, an innovative human-in-the-loop Visual Analytics (VA) system, designed for metadata-free data slice finding.
Our system identifies interpretable data slices that involve common model behaviors and visualizes these patterns through an Attribution Mosaic design.
arXiv Detail & Related papers (2024-01-12T09:17:32Z)
- GSAP-NER: A Novel Task, Corpus, and Baseline for Scholarly Entity Extraction Focused on Machine Learning Models and Datasets [3.9169112083667073]
In academic writing, references to machine learning models and datasets are fundamental components.
Existing ground truth datasets do not treat fine-grained types like ML model and model architecture as separate entity types.
We release a corpus of 100 manually annotated full-text scientific publications and a first baseline model for 10 entity types centered around ML models and datasets.
arXiv Detail & Related papers (2023-11-16T12:43:02Z)
- Utilising a Large Language Model to Annotate Subject Metadata: A Case Study in an Australian National Research Data Catalogue [18.325675189960833]
In support of open and reproducible research, there has been a rapidly increasing number of datasets made available for research.
As the availability of datasets increases, it becomes more important to have quality metadata for discovering and reusing them.
This paper proposes to leverage large language models (LLMs) for cost-effective annotation of subject metadata through the LLM-based in-context learning.
arXiv Detail & Related papers (2023-10-17T14:52:33Z)
- Adapting Large Language Models for Content Moderation: Pitfalls in Data Engineering and Supervised Fine-tuning [79.53130089003986]
Large Language Models (LLMs) have become a feasible solution for handling tasks in various domains.
In this paper, we show how to fine-tune an LLM that can be privately deployed for content moderation.
arXiv Detail & Related papers (2023-10-05T09:09:44Z)
- Interpretable Medical Diagnostics with Structured Data Extraction by Large Language Models [59.89454513692417]
Tabular data is often hidden in text, particularly in medical diagnostic reports.
We propose a novel, simple, and effective methodology for extracting structured tabular data from textual medical reports, called TEMED-LLM.
We demonstrate that our approach significantly outperforms state-of-the-art text classification models in medical diagnostics.
arXiv Detail & Related papers (2023-06-08T09:12:28Z)
- DAC-MR: Data Augmentation Consistency Based Meta-Regularization for Meta-Learning [55.733193075728096]
We propose a meta-knowledge informed meta-learning (MKIML) framework to improve meta-learning.
We preliminarily integrate meta-knowledge into the meta-objective by using an appropriate meta-regularization (MR) objective.
The proposed DAC-MR is expected to learn well-performing meta-models from training tasks with noisy, sparse or unavailable meta-data.
arXiv Detail & Related papers (2023-05-13T11:01:47Z)
- Improving Meta-learning for Low-resource Text Classification and Generation via Memory Imitation [87.98063273826702]
We propose a memory imitation meta-learning (MemIML) method that enhances the model's reliance on support sets for task adaptation.
A theoretical analysis is provided to prove the effectiveness of our method.
arXiv Detail & Related papers (2022-03-22T12:41:55Z)
- Automated Metadata Harmonization Using Entity Resolution & Contextual Embedding [0.0]
We demonstrate automation of this step with the help of Cognitive Database's Db2Vec embedding approach.
Apart from matching schemas, we demonstrate that it can also infer the correct ontological structure of the target data model.
arXiv Detail & Related papers (2020-10-17T02:14:15Z)