Related papers: Modeling Quality and Machine Learning Pipelines through Extended Feature Models

Modeling Quality and Machine Learning Pipelines through Extended Feature Models

URL: http://arxiv.org/abs/2207.07528v1
Date: Fri, 15 Jul 2022 15:20:28 GMT
Title: Modeling Quality and Machine Learning Pipelines through Extended Feature Models
Authors: Giordano d'Aloisio, Antinisca Di Marco and Giovanni Stilo (University of L'Aquila)
Abstract summary: We propose a new engineering approach for quality ML pipeline by properly extending the Feature Models meta-model. The presented approach allows to model ML pipelines, their quality requirements (on the whole pipeline and on single phases) and quality characteristics of algorithms used to implement each pipeline phase.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The recently increased complexity of Machine Learning (ML) methods, led to the necessity to lighten both the research and industry development processes. ML pipelines have become an essential tool for experts of many domains, data scientists and researchers, allowing them to easily put together several ML models to cover the full analytic process starting from raw datasets. Over the years, several solutions have been proposed to automate the building of ML pipelines, most of them focused on semantic aspects and characteristics of the input dataset. However, an approach taking into account the new quality concerns needed by ML systems (like fairness, interpretability, privacy, etc.) is still missing. In this paper, we first identify, from the literature, key quality attributes of ML systems. Further, we propose a new engineering approach for quality ML pipeline by properly extending the Feature Models meta-model. The presented approach allows to model ML pipelines, their quality requirements (on the whole pipeline and on single phases), and quality characteristics of algorithms used to implement each pipeline phase. Finally, we demonstrate the expressiveness of our model considering the classification problem.

Related papers

Purifying, Labeling, and Utilizing: A High-Quality Pipeline for Small Object Detection [83.90563802153707]
PLUSNet is a high-quality Small object detection framework. It comprises three components: the Hierarchical Feature (HFP) framework for purifying upstream features, the Multiple Criteria Label Assignment (MCLA) for improving the quality of midstream training samples, and the Frequency Decoupled Head (FDHead) for more effectively exploiting information to accomplish downstream tasks.
arXiv Detail & Related papers (2025-04-29T10:11:03Z)
Data Requirement Goal Modeling for Machine Learning Systems [0.8854624631197942]
This work proposes an approach to guide non-experts in identifying data requirements for Machine Learning systems. We first develop the Data Requirement Goal Model (DRGM) by surveying the white literature. We then validate the approach through two illustrative examples based on real-world projects.
arXiv Detail & Related papers (2025-04-10T11:30:25Z)
IMPROVE: Iterative Model Pipeline Refinement and Optimization Leveraging LLM Agents [17.301758094000125]
Large language model (LLM) agents have emerged as a promising solution to automate the development of computer vision models. We introduce Iterative Refinement, a novel strategy for LLM-driven ML pipeline design. Iterative Refinement improves stability, interpretability, and overall model performance.
arXiv Detail & Related papers (2025-02-25T01:52:37Z)
AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML [56.565200973244146]
Automated machine learning (AutoML) accelerates AI development by automating tasks in the development pipeline. Recent works have started exploiting large language models (LLM) to lessen such burden. This paper proposes AutoML-Agent, a novel multi-agent framework tailored for full-pipeline AutoML.
arXiv Detail & Related papers (2024-10-03T20:01:09Z)
SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction. SMILE allows for the upscaling of source models into an MoE model without extra data or further training. We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
arXiv Detail & Related papers (2024-08-19T17:32:15Z)
EDALearn: A Comprehensive RTL-to-Signoff EDA Benchmark for Democratized and Reproducible ML for EDA Research [5.093676641214663]
We introduce EDALearn, the first holistic, open-source benchmark suite specifically for Machine Learning tasks in EDA. This benchmark suite presents an end-to-end flow from synthesis to physical implementation, enriching data collection across various stages. Our contributions aim to encourage further advances in the ML-EDA domain.
arXiv Detail & Related papers (2023-12-04T06:51:46Z)
Benchmarking Automated Machine Learning Methods for Price Forecasting Applications [58.720142291102135]
We show the possibility of substituting manually created ML pipelines with automated machine learning (AutoML) solutions. Based on the CRISP-DM process, we split the manual ML pipeline into a machine learning and non-machine learning part. We show in a case study for the industrial use case of price forecasting, that domain knowledge combined with AutoML can weaken the dependence on ML experts.
arXiv Detail & Related papers (2023-04-28T10:27:38Z)
OmniForce: On Human-Centered, Large Model Empowered and Cloud-Edge Collaborative AutoML System [85.8338446357469]
We introduce OmniForce, a human-centered AutoML system that yields both human-assisted ML and ML-assisted human techniques. We show how OmniForce can put an AutoML system into practice and build adaptive AI in open-environment scenarios.
arXiv Detail & Related papers (2023-03-01T13:35:22Z)
DiffML: End-to-end Differentiable ML Pipelines [12.869023436690894]
DiffML allows to jointly train not just the ML model itself but also the entire pipeline. Our core idea is to formulate all pipeline steps in a differentiable way. This is a non-trivial problem and opens up many new research questions.
arXiv Detail & Related papers (2022-07-04T09:03:02Z)
Data Debugging with Shapley Importance over End-to-End Machine Learning Pipelines [27.461398584509755]
DataScope is the first system that efficiently computes Shapley values of training examples over an end-to-end machine learning pipeline. Our results show that DataScope is up to four orders of magnitude faster than state-of-the-art Monte Carlo-based methods.
arXiv Detail & Related papers (2022-04-23T19:29:23Z)
SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor framework and script language engines. This approach however does not supply the needed procedures and pipelines for the actual deployment of machine learning capabilities in real production grade systems.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)
Production Machine Learning Pipelines: Empirical Analysis and Optimization Opportunities [5.510431861706128]
We analyze provenance graphs of 3000 production ML pipelines at Google, comprising over 450,000 models trained, spanning a period of over four months. Our analysis reveals the characteristics, components, and topologies of typical industry-strength ML pipelines at various granularities. We identify several rich opportunities for optimization, leveraging traditional data management ideas.
arXiv Detail & Related papers (2021-03-30T00:46:29Z)
A Rigorous Machine Learning Analysis Pipeline for Biomedical Binary Classification: Application in Pancreatic Cancer Nested Case-control Studies with Implications for Bias Assessments [2.9726886415710276]
We have laid out and assembled a complete, rigorous ML analysis pipeline focused on binary classification. This 'automated' but customizable pipeline includes a) exploratory analysis, b) data cleaning and transformation, c) feature selection, d) model training with 9 established ML algorithms. We apply this pipeline to an epidemiological investigation of established and newly identified risk factors for cancer to evaluate how different sources of bias might be handled by ML algorithms.
arXiv Detail & Related papers (2020-08-28T19:58:05Z)
A Survey on Large-scale Machine Learning [67.6997613600942]
Machine learning can provide deep insights into data, allowing machines to make high-quality predictions. Most sophisticated machine learning approaches suffer from huge time costs when operating on large-scale data. Large-scale Machine Learning aims to learn patterns from big data with comparable performance efficiently.
arXiv Detail & Related papers (2020-08-10T06:07:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.