ML-Asset Management: Curation, Discovery, and Utilization
- URL: http://arxiv.org/abs/2509.23577v1
- Date: Sun, 28 Sep 2025 02:14:33 GMT
- Title: ML-Asset Management: Curation, Discovery, and Utilization
- Authors: Mengying Wang, Moming Duan, Yicong Huang, Chen Li, Bingsheng He, Yinghui Wu,
- Abstract summary: Machine learning (ML) assets, such as models, datasets, and metadata, are central to modern ML workflows. Despite their explosive growth in practice, these assets are often underutilized due to fragmented documentation, siloed storage, inconsistent licensing, and lack of unified discovery mechanisms. This tutorial offers a comprehensive overview of ML-asset management activities across its lifecycle, including curation, discovery, and utilization.
- Score: 35.118192476112235
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Machine learning (ML) assets, such as models, datasets, and metadata, are central to modern ML workflows. Despite their explosive growth in practice, these assets are often underutilized due to fragmented documentation, siloed storage, inconsistent licensing, and lack of unified discovery mechanisms, making ML-asset management an urgent challenge. This tutorial offers a comprehensive overview of ML-asset management activities across its lifecycle, including curation, discovery, and utilization. We provide a categorization of ML assets and major management issues, survey state-of-the-art techniques, and identify emerging opportunities at each stage. We further highlight system-level challenges related to scalability, lineage, and unified indexing. Through live demonstrations of systems, this tutorial equips both researchers and practitioners with actionable insights and practical tools for advancing ML-asset management in real-world and domain-specific settings.
Related papers
- Can LLMs Clean Up Your Mess? A Survey of Application-Ready Data Preparation with LLMs [66.63911043019294]
Data preparation aims to denoise raw datasets, uncover cross-dataset relationships, and extract valuable insights from them. This paper focuses on the use of LLM techniques to prepare data for diverse downstream tasks. We introduce a task-centric taxonomy that organizes the field into three major tasks: data cleaning (standardization, error processing, imputation), data integration, and data enrichment.
arXiv Detail & Related papers (2026-01-22T12:02:45Z)
- Agile Management for Machine Learning: A Systematic Mapping Study [1.0396117988046165]
Machine learning (ML)-enabled systems are present in our society, driving significant digital transformations. The dynamic nature of ML development, characterized by experimental cycles and rapid changes in data, poses challenges to traditional project management. This study aims to outline the state of the art in agile management for ML-enabled systems.
arXiv Detail & Related papers (2025-06-25T18:47:08Z)
- How do Large Language Models Understand Relevance? A Mechanistic Interpretability Perspective [64.00022624183781]
Large language models (LLMs) can assess relevance and support information retrieval (IR) tasks. We investigate how different LLM modules contribute to relevance judgment through the lens of mechanistic interpretability.
arXiv Detail & Related papers (2025-04-10T16:14:55Z)
- Towards Human-Guided, Data-Centric LLM Co-Pilots [53.35493881390917]
CliMB-DC is a human-guided, data-centric framework for machine learning co-pilots. It combines advanced data-centric tools with LLM-driven reasoning to enable robust, context-aware data processing. We show how CliMB-DC can transform uncurated datasets into ML-ready formats.
arXiv Detail & Related papers (2025-01-17T17:51:22Z)
- Towards Trustworthy Machine Learning in Production: An Overview of the Robustness in MLOps Approach [0.0]
In recent years, AI researchers and practitioners have introduced principles and guidelines to build systems that make reliable and trustworthy decisions.
In practice, a fundamental challenge arises when the system needs to be operationalized and deployed to evolve and operate in real-life environments continuously.
To address this challenge, Machine Learning Operations (MLOps) have emerged as a potential recipe for standardizing ML solutions in deployment.
arXiv Detail & Related papers (2024-10-28T09:34:08Z)
- A Comprehensive Review of Multimodal Large Language Models: Performance and Challenges Across Different Tasks [74.52259252807191]
Multimodal Large Language Models (MLLMs) address the complexities of real-world applications far beyond the capabilities of single-modality systems.
This paper systematically sorts out the applications of MLLM in multimodal tasks such as natural language, vision, and audio.
arXiv Detail & Related papers (2024-08-02T15:14:53Z)
- Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? [54.667202878390526]
Long-context language models (LCLMs) have the potential to revolutionize our approach to tasks traditionally reliant on external tools like retrieval systems or databases.
We introduce LOFT, a benchmark of real-world tasks requiring context up to millions of tokens designed to evaluate LCLMs' performance on in-context retrieval and reasoning.
Our findings reveal LCLMs' surprising ability to rival state-of-the-art retrieval and RAG systems, despite never having been explicitly trained for these tasks.
arXiv Detail & Related papers (2024-06-19T00:28:58Z)
- Reasonable Scale Machine Learning with Open-Source Metaflow [2.637746074346334]
We argue that re-purposing existing tools won't solve the current productivity issues.
We introduce Metaflow, an open-source framework for ML projects explicitly designed to boost the productivity of data practitioners.
arXiv Detail & Related papers (2023-03-21T11:28:09Z)
- Management of Machine Learning Lifecycle Artifacts: A Survey [7.106986689736826]
We aim to give an overview of systems and platforms which support the management of machine learning lifecycle artifacts.
Based on a systematic review, we derive assessment criteria and apply them to a representative selection of more than 60 systems and platforms.
arXiv Detail & Related papers (2022-10-21T09:23:12Z)
- Machine Learning Operations (MLOps): Overview, Definition, and Architecture [0.0]
The paradigm of Machine Learning Operations (MLOps) addresses the difficulty of automating and operationalizing ML products.
MLOps is still a vague term and its consequences for researchers and professionals are ambiguous.
We provide an aggregated overview of the necessary components, and roles, as well as the associated architecture and principles.
arXiv Detail & Related papers (2022-05-04T19:38:48Z)