Metadata Representations for Queryable ML Model Zoos
- URL: http://arxiv.org/abs/2207.09315v1
- Date: Tue, 19 Jul 2022 15:04:14 GMT
- Title: Metadata Representations for Queryable ML Model Zoos
- Authors: Ziyu Li, Rihan Hai, Alessandro Bozzon and Asterios Katsifodimos
- Abstract summary: Machine learning (ML) practitioners and organizations are building model zoos of pre-trained models, containing metadata describing properties of the models.
The metadata is currently not standardized; its expressivity is limited; and there is no way to store and query it.
In this paper, we advocate for standardized ML model metadata representation and management, proposing a toolkit to support practitioners in managing and querying that metadata.
- Score: 73.24799582702326
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning (ML) practitioners and organizations are building model zoos
of pre-trained models, containing metadata describing properties of the ML
models and datasets that are useful for reporting, auditing, reproducibility,
and interpretability purposes. The metadata is currently not standardized; its
expressivity is limited; and there is no interoperable way to store and query
it. Consequently, model search, reuse, comparison, and composition are
hindered. In this paper, we advocate for standardized ML model metadata
representation and management, proposing a toolkit to support
practitioners in managing and querying that metadata.
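To make the idea concrete, here is a minimal sketch of a queryable model-zoo metadata store. The schema, model names, and field values are hypothetical illustrations, not the paper's actual design; SQLite stands in for whatever storage backend the proposed toolkit uses.

```python
import sqlite3

# Hypothetical model-zoo metadata schema; fields are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE model_metadata (
        name TEXT, task TEXT, dataset TEXT,
        accuracy REAL, license TEXT
    )
""")
conn.executemany(
    "INSERT INTO model_metadata VALUES (?, ?, ?, ?, ?)",
    [
        ("resnet50-a", "image-classification", "imagenet", 0.76, "mit"),
        ("bert-base-x", "text-classification", "sst2", 0.92, "apache-2.0"),
        ("vit-b16-y", "image-classification", "imagenet", 0.81, "apache-2.0"),
    ],
)

# Model search: find image classifiers above an accuracy threshold,
# ranked for reuse and comparison.
rows = conn.execute(
    """SELECT name, accuracy FROM model_metadata
       WHERE task = 'image-classification' AND accuracy > 0.75
       ORDER BY accuracy DESC"""
).fetchall()
print(rows)  # -> [('vit-b16-y', 0.81), ('resnet50-a', 0.76)]
```

With metadata in a standard, structured form, search queries like the one above replace manual inspection of model cards.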
Related papers
- Augmented Knowledge Graph Querying leveraging LLMs [2.5311562666866494]
We introduce SparqLLM, a framework that enhances the querying of Knowledge Graphs (KGs).
SparqLLM executes the Extract, Transform, and Load (ETL) pipeline to construct KGs from raw data.
It also features a natural language interface powered by Large Language Models (LLMs) to enable automatic SPARQL query generation.
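The natural-language-to-SPARQL step can be illustrated with a toy sketch. Note this is not SparqLLM's actual implementation, which prompts an LLM; here a fixed query template stands in for the LLM call, and the ontology terms are invented for illustration.

```python
# Illustrative stand-in for LLM-based SPARQL generation: a fixed
# template is filled from the user's question. A real system would
# prompt an LLM with the question and the KG schema instead.
SPARQL_TEMPLATE = """SELECT ?model WHERE {{
  ?model a :MLModel ;
         :trainedOn :{dataset} .
}}"""

def question_to_sparql(dataset: str) -> str:
    # Toy mapping: the extracted dataset name fills the template slot.
    return SPARQL_TEMPLATE.format(dataset=dataset)

query = question_to_sparql("ImageNet")
print(query)
```

The value of the LLM in SparqLLM is precisely that it removes the need for hand-written templates like this one.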
arXiv Detail & Related papers (2025-02-03T12:18:39Z)
- Harmonizing Metadata of Language Resources for Enhanced Querying and Accessibility [0.0]
This paper addresses the harmonization of metadata from diverse repositories of language resources (LRs).
Our methodology supports text-based search, faceted browsing, and advanced SPARQL queries through Linghub, a newly developed portal.
The study highlights significant metadata issues and advocates for adherence to open vocabularies and standards to enhance metadata harmonization.
arXiv Detail & Related papers (2025-01-09T22:48:43Z)
- Towards Agentic Schema Refinement [3.7173623393215287]
We propose a semantic layer in-between the database and the user as a set of small and easy-to-interpret database views.
Our approach paves the way for LLM-powered exploration of unwieldy databases.
arXiv Detail & Related papers (2024-11-25T19:57:16Z)
- Matchmaker: Self-Improving Large Language Model Programs for Schema Matching [60.23571456538149]
We propose a compositional language model program for schema matching, comprised of candidate generation, refinement and confidence scoring.
Matchmaker self-improves in a zero-shot manner without the need for labeled demonstrations.
Empirically, we demonstrate on real-world medical schema matching benchmarks that Matchmaker outperforms previous ML-based approaches.
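The candidate-generation, refinement, and confidence-scoring composition can be sketched as a pipeline of small functions. This is a hypothetical illustration, not Matchmaker's implementation: simple token-overlap similarity stands in for the LLM calls the paper describes, and all column names are invented.

```python
# Compositional schema-matching sketch in the spirit of Matchmaker:
# candidate generation -> refinement -> confidence scoring.

def similarity(a: str, b: str) -> float:
    # Jaccard overlap of underscore-separated name tokens
    # (a crude stand-in for an LLM's judgment).
    ta, tb = set(a.split("_")), set(b.split("_"))
    return len(ta & tb) / len(ta | tb)

def generate_candidates(source_col, target_cols, k=3):
    # Propose the top-k target columns by name similarity.
    scored = [(t, similarity(source_col, t)) for t in target_cols]
    return sorted(scored, key=lambda x: x[1], reverse=True)[:k]

def refine(candidates, min_score=0.3):
    # Filter out implausible candidates.
    return [(t, s) for t, s in candidates if s >= min_score]

def score_confidence(candidates):
    # Return the best surviving match with its confidence.
    return max(candidates, key=lambda x: x[1]) if candidates else (None, 0.0)

targets = ["patient_id", "birth_date", "diagnosis_code"]
best, conf = score_confidence(refine(generate_candidates("date_of_birth", targets)))
print(best, round(conf, 2))  # -> birth_date 0.67
```

Composing the stages this way lets each one be improved (or self-improved, as in Matchmaker) independently.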
arXiv Detail & Related papers (2024-10-31T16:34:03Z)
- GSAP-NER: A Novel Task, Corpus, and Baseline for Scholarly Entity Extraction Focused on Machine Learning Models and Datasets [3.9169112083667073]
In academic writing, references to machine learning models and datasets are fundamental components.
Existing ground truth datasets do not treat fine-grained types like ML model and model architecture as separate entity types.
We release a corpus of 100 manually annotated full-text scientific publications and a first baseline model for 10 entity types centered around ML models and datasets.
arXiv Detail & Related papers (2023-11-16T12:43:02Z)
- Adapting Large Language Models for Content Moderation: Pitfalls in Data Engineering and Supervised Fine-tuning [79.53130089003986]
Large Language Models (LLMs) have become a feasible solution for handling tasks in various domains.
In this paper, we describe how to fine-tune an LLM that can be privately deployed for content moderation.
arXiv Detail & Related papers (2023-10-05T09:09:44Z)
- Interpretable Medical Diagnostics with Structured Data Extraction by Large Language Models [59.89454513692417]
Tabular data is often hidden in text, particularly in medical diagnostic reports.
We propose a novel, simple, and effective methodology for extracting structured tabular data from textual medical reports, called TEMED-LLM.
We demonstrate that our approach significantly outperforms state-of-the-art text classification models in medical diagnostics.
arXiv Detail & Related papers (2023-06-08T09:12:28Z)
- DAC-MR: Data Augmentation Consistency Based Meta-Regularization for Meta-Learning [55.733193075728096]
We propose a meta-knowledge informed meta-learning (MKIML) framework to improve meta-learning.
We preliminarily integrate meta-knowledge into meta-objective via using an appropriate meta-regularization (MR) objective.
The proposed DAC-MR is expected to learn well-performing meta-models from training tasks with noisy, sparse, or unavailable meta-data.
arXiv Detail & Related papers (2023-05-13T11:01:47Z)
- Improving Meta-learning for Low-resource Text Classification and Generation via Memory Imitation [87.98063273826702]
We propose a memory imitation meta-learning (MemIML) method that enhances the model's reliance on support sets for task adaptation.
A theoretical analysis is provided to prove the effectiveness of our method.
arXiv Detail & Related papers (2022-03-22T12:41:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.