AVATAR -- Machine Learning Pipeline Evaluation Using Surrogate Model
- URL: http://arxiv.org/abs/2001.11158v2
- Date: Mon, 3 Feb 2020 01:00:59 GMT
- Title: AVATAR -- Machine Learning Pipeline Evaluation Using Surrogate Model
- Authors: Tien-Dung Nguyen, Tomasz Maszczyk, Katarzyna Musial, Marc-André Zöller, Bogdan Gabrys
- Abstract summary: We propose a novel method to evaluate the validity of ML pipelines using a surrogate model (AVATAR).
Our experiments show that AVATAR evaluates complex pipelines more efficiently than traditional evaluation approaches that require executing them.
- Score: 10.83607599315401
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The evaluation of machine learning (ML) pipelines is essential during
automatic ML pipeline composition and optimisation. Previous methods, such as the
Bayesian and genetic optimisation implemented in Auto-Weka, Auto-sklearn and TPOT,
evaluate pipelines by executing them. As a result, pipeline composition and
optimisation with these methods requires a tremendous amount of time, which prevents
them from exploring complex pipelines to find better predictive models. To further
explore this research challenge, we have conducted experiments showing that many of
the generated pipelines are invalid, and that executing them is unnecessary to
establish whether they are good pipelines. To address this issue, we propose a novel
method to evaluate the validity of ML pipelines using a surrogate model (AVATAR).
AVATAR accelerates automatic ML pipeline composition and optimisation by quickly
discarding invalid pipelines. Our experiments show that AVATAR evaluates complex
pipelines more efficiently than traditional evaluation approaches that require
executing them.
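To make the surrogate idea concrete, the following minimal Python sketch checks pipeline validity by propagating dataset characteristics through each component's declared capabilities and effects; the component names, flags, and rules are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of surrogate-based pipeline validity checking (hypothetical names;
# not the authors' implementation). Each component declares which dataset
# characteristics it can handle (capabilities) and how it changes them (effects).

DATASET = {"has_missing_values": True, "has_categorical": True, "is_numeric_only": False}

COMPONENTS = {
    "Imputer":       {"accepts": lambda d: True,
                      "effects": {"has_missing_values": False}},
    "OneHotEncoder": {"accepts": lambda d: True,
                      "effects": {"has_categorical": False, "is_numeric_only": True}},
    "SVM":           {"accepts": lambda d: not d["has_missing_values"] and d["is_numeric_only"],
                      "effects": {}},
}

def is_valid(pipeline, dataset):
    """Propagate dataset characteristics through the pipeline without running it."""
    state = dict(dataset)
    for name in pipeline:
        component = COMPONENTS[name]
        if not component["accepts"](state):
            return False                      # invalid: a capability is violated
        state.update(component["effects"])    # apply the component's effects
    return True

print(is_valid(["Imputer", "OneHotEncoder", "SVM"], DATASET))  # True
print(is_valid(["SVM"], DATASET))                              # False: missing values remain
```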
Related papers
- Efficient Learning of POMDPs with Known Observation Model in Average-Reward Setting [56.92178753201331]
We propose the Observation-Aware Spectral (OAS) estimation technique, which enables the POMDP parameters to be learned from samples collected using a belief-based policy.
We show the consistency of the OAS procedure, and we prove a regret guarantee of order $\mathcal{O}(\sqrt{T \log(T)})$ for the proposed OAS-UCRL algorithm.
arXiv Detail & Related papers (2024-10-02T08:46:34Z) - Instrumentation and Analysis of Native ML Pipelines via Logical Query Plans [3.2362171533623054]
We envision highly automated software platforms that assist data scientists with developing, validating, monitoring, and analysing their Machine Learning pipelines.
We extract "logical query plans" from ML pipeline code that relies on popular libraries.
Based on these plans, we automatically infer pipeline semantics and instrument and rewrite the ML pipelines to enable diverse use cases without requiring data scientists to manually annotate or rewrite their code.
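As a rough, hypothetical illustration of the idea (not the paper's representation or API), the sketch below models a small pipeline's data flow as a tree of logical operators that a tool could traverse to instrument or rewrite.

```python
# Hedged illustration (not the paper's implementation): representing the data flow
# of a small pipeline as a logical query plan whose nodes can be inspected or rewritten.
from dataclasses import dataclass, field

@dataclass
class PlanNode:
    operator: str                 # e.g. "read_csv", "selection", "estimator"
    details: str = ""
    children: list = field(default_factory=list)

# Plan for: df = read_csv(...); df = df[df.age > 18]; model.fit(df)
plan = PlanNode("estimator", "LogisticRegression.fit", [
    PlanNode("selection", "age > 18", [
        PlanNode("read_csv", "patients.csv"),
    ]),
])

def walk(node, depth=0):
    """Print the plan top-down; a real system would instrument or rewrite nodes here."""
    print("  " * depth + f"{node.operator}({node.details})")
    for child in node.children:
        walk(child, depth + 1)

walk(plan)
```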
arXiv Detail & Related papers (2024-07-10T11:35:02Z) - NAVSIM: Data-Driven Non-Reactive Autonomous Vehicle Simulation and Benchmarking [65.24988062003096]
We present NAVSIM, a framework for benchmarking vision-based driving policies.
Our simulation is non-reactive, i.e., the evaluated policy and environment do not influence each other.
NAVSIM enabled a new competition held at CVPR 2024, where 143 teams submitted 463 entries, resulting in several new insights.
arXiv Detail & Related papers (2024-06-21T17:59:02Z) - Towards Interactively Improving ML Data Preparation Code via "Shadow Pipelines" [13.559945160284876]
Data scientists develop ML pipelines in an iterative manner: they repeatedly screen a pipeline for potential issues, debug it, and then revise and improve its code according to their findings.
We propose to support data scientists during this development cycle with automatically derived interactive suggestions for pipeline improvements.
We discuss our vision to generate these suggestions with so-called shadow pipelines, hidden variants of the original pipeline that modify it to auto-detect potential issues, try out modifications for improvements, and suggest and explain these modifications to the user.
We envision applying incremental view maintenance-based optimisations to ensure low-latency computation and maintenance of the shadow pipelines.
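The following sketch illustrates the shadow-pipeline idea with an assumed example (imputation as the candidate fix); it is not the paper's system, only a Python approximation of running a hidden variant alongside the original pipeline and surfacing the difference as a suggestion.

```python
# Hedged sketch of the "shadow pipeline" idea (hypothetical example, not the paper's
# system): next to the user's original pipeline we derive a hidden variant with one
# candidate fix applied, run both, and surface the difference as a suggestion.
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X[rng.random(X.shape) < 0.2] = np.nan          # simulate missing values

# Original pipeline: the data scientist simply drops incomplete rows.
mask = ~np.isnan(X).any(axis=1)
original_score = cross_val_score(LogisticRegression(), X[mask], y[mask], cv=5).mean()

# Shadow pipeline: same model, but with a candidate modification (imputation) applied.
shadow = make_pipeline(SimpleImputer(strategy="mean"), LogisticRegression())
shadow_score = cross_val_score(shadow, X, y, cv=5).mean()

print(f"original (drop rows): {original_score:.3f}")
print(f"shadow (impute):      {shadow_score:.3f}")
if shadow_score > original_score:
    print("Suggestion: imputing missing values may improve this pipeline.")
```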
arXiv Detail & Related papers (2024-04-30T14:36:04Z) - Deep Pipeline Embeddings for AutoML [11.168121941015015]
AutoML is a promising direction for democratizing AI by automatically deploying Machine Learning systems with minimal human expertise.
Existing Pipeline Optimization techniques fail to explore deep interactions between pipeline stages/components.
This paper proposes a novel neural architecture that captures the deep interaction between the components of a Machine Learning pipeline.
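A possible shape of such an architecture is sketched below under assumed design choices (an embedding table, mean pooling, and a small MLP); the paper's actual model may differ.

```python
# Rough sketch (assumed architecture, not the paper's) of embedding pipeline
# components and letting a small network model their interactions to predict
# pipeline performance.
import torch
import torch.nn as nn

COMPONENT_VOCAB = ["imputer", "scaler", "pca", "svm", "random_forest"]  # toy vocabulary

class PipelineEmbedder(nn.Module):
    def __init__(self, vocab_size, dim=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.mixer = nn.Sequential(            # models interactions between components
            nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, 1)
        )

    def forward(self, component_ids):          # component_ids: (batch, pipeline_len)
        vectors = self.embed(component_ids)    # (batch, pipeline_len, dim)
        pooled = vectors.mean(dim=1)           # simple pooling over the pipeline
        return self.mixer(pooled).squeeze(-1)  # predicted performance score

model = PipelineEmbedder(len(COMPONENT_VOCAB))
pipeline = torch.tensor([[0, 1, 3]])           # imputer -> scaler -> svm
print(model(pipeline))                         # untrained prediction, shape (1,)
```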
arXiv Detail & Related papers (2023-05-23T12:40:38Z) - Where Is My Training Bottleneck? Hidden Trade-Offs in Deep Learning
Preprocessing Pipelines [77.45213180689952]
Preprocessing pipelines in deep learning aim to provide sufficient data throughput to keep the training processes busy.
We introduce a new perspective on efficiently preparing datasets for end-to-end deep learning pipelines.
We obtain an increased throughput of 3x to 13x compared to an untuned system.
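The sketch below illustrates the throughput perspective with an assumed, simplified experiment (not the paper's tooling): timing a deterministic preprocessing step run online every epoch versus reading a precomputed cache.

```python
# Illustrative sketch (not the paper's tooling) of the trade-off it studies:
# measuring samples/second when a preprocessing step runs online every epoch
# versus when its deterministic part is precomputed once and cached.
import time
import numpy as np

def preprocess(sample):
    # stand-in for an expensive deterministic step (decode/resize/normalise)
    return np.fft.fft(sample).real

data = [np.random.rand(4096) for _ in range(2000)]

start = time.perf_counter()
for sample in data:                      # online preprocessing, every epoch
    _ = preprocess(sample)
online_rate = len(data) / (time.perf_counter() - start)

cache = [preprocess(s) for s in data]    # precompute once, reuse afterwards
start = time.perf_counter()
for sample in cache:                     # training epochs only read the cache
    _ = sample
cached_rate = len(data) / (time.perf_counter() - start)

print(f"online: {online_rate:.0f} samples/s, cached: {cached_rate:.0f} samples/s")
```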
arXiv Detail & Related papers (2022-02-17T14:31:58Z) - Automated Evolutionary Approach for the Design of Composite Machine
Learning Pipelines [48.7576911714538]
The proposed approach aims to automate the design of composite machine learning pipelines.
It designs the pipelines with a customizable graph-based structure, analyzes the obtained results, and reproduces them.
The software implementation of this approach is presented as an open-source framework.
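As a toy approximation of evolutionary pipeline design (not the open-source framework itself), the sketch below mutates a two-node pipeline chain and keeps mutations that improve the cross-validation score; the candidate operators and the mutation rule are assumptions.

```python
# Toy sketch of evolutionary pipeline design (illustrative only; not the paper's
# open-source framework). Candidate pipelines are small graphs -- simplified here
# to chains -- that are mutated and kept when cross-validation improves.
import random
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.tree import DecisionTreeClassifier

PREPROCESSORS = [StandardScaler, MinMaxScaler, PCA]
MODELS = [LogisticRegression, DecisionTreeClassifier]
X, y = load_iris(return_X_y=True)
random.seed(0)

def fitness(genome):
    pipeline = make_pipeline(*[step() for step in genome])
    return cross_val_score(pipeline, X, y, cv=3).mean()

def mutate(genome):
    child = list(genome)
    if random.random() < 0.5:
        child[0] = random.choice(PREPROCESSORS)   # swap the preprocessing node
    else:
        child[-1] = random.choice(MODELS)         # swap the model node
    return child

best = [StandardScaler, LogisticRegression]
best_score = fitness(best)
for _ in range(10):                               # tiny evolutionary loop
    child = mutate(best)
    score = fitness(child)
    if score > best_score:
        best, best_score = child, score
print([step.__name__ for step in best], round(best_score, 3))
```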
arXiv Detail & Related papers (2021-06-26T23:19:06Z) - PipeTransformer: Automated Elastic Pipelining for Distributed Training
of Transformers [47.194426122333205]
PipeTransformer is a distributed training algorithm for Transformer models.
It automatically adjusts the pipelining and data parallelism by identifying and freezing some layers during the training.
We evaluate PipeTransformer using Vision Transformer (ViT) on ImageNet and BERT on GLUE and SQuAD datasets.
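The core freezing idea can be sketched on a stock PyTorch encoder as follows; the freezing criterion and the choice of two layers are assumptions, not PipeTransformer's implementation.

```python
# Minimal sketch of the layer-freezing idea behind elastic pipelining (hypothetical
# criterion; not PipeTransformer's code): once early layers are deemed converged,
# their parameters are frozen so they drop out of backpropagation and the
# pipeline/data-parallel layout can be rebalanced around the remaining layers.
import torch.nn as nn

layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=6)

def freeze_first_layers(encoder, num_frozen):
    """Freeze the first `num_frozen` encoder layers."""
    for i, block in enumerate(encoder.layers):
        requires_grad = i >= num_frozen
        for param in block.parameters():
            param.requires_grad = requires_grad

freeze_first_layers(encoder, num_frozen=2)   # assume the schedule decided on 2 layers
trainable = sum(p.numel() for p in encoder.parameters() if p.requires_grad)
total = sum(p.numel() for p in encoder.parameters())
print(f"trainable parameters: {trainable}/{total}")
```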
arXiv Detail & Related papers (2021-02-05T13:39:31Z) - Incremental Search Space Construction for Machine Learning Pipeline
Synthesis [4.060731229044571]
Automated machine learning (AutoML) aims for constructing machine learning (ML) pipelines automatically.
We propose a data-centric approach based on meta-features for pipeline construction.
We demonstrate the effectiveness and competitiveness of our approach on 28 data sets used in well-established AutoML benchmarks.
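The sketch below shows the kind of dataset meta-features such a data-centric approach might compute; the concrete feature set is an assumption made for illustration.

```python
# Hedged sketch of dataset meta-features that could guide pipeline construction
# (the specific features chosen here are an assumption, not the paper's set).
import numpy as np

def meta_features(X, y):
    n_samples, n_features = X.shape
    _, class_counts = np.unique(y, return_counts=True)
    centred = X - np.nanmean(X, axis=0)
    skew = np.nanmean(centred ** 3, axis=0) / (np.nanstd(X, axis=0) ** 3 + 1e-12)
    return {
        "n_samples": n_samples,
        "n_features": n_features,
        "n_classes": len(class_counts),
        "class_imbalance": float(class_counts.max() / class_counts.min()),
        "missing_ratio": float(np.isnan(X).mean()),
        "mean_abs_skew": float(np.mean(np.abs(skew))),
    }

X = np.random.rand(500, 8)
X[np.random.rand(*X.shape) < 0.05] = np.nan   # inject some missing values
y = np.random.randint(0, 3, size=500)
print(meta_features(X, y))
```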
arXiv Detail & Related papers (2021-01-26T17:17:49Z) - AutoWeka4MCPS-AVATAR: Accelerating Automated Machine Learning Pipeline
Composition and Optimisation [13.116806430326513]
We propose a novel method to evaluate the validity of ML pipelines, without executing them, using a surrogate model (AVATAR).
The AVATAR generates a knowledge base by automatically learning the capabilities and effects of ML algorithms on datasets' characteristics.
Instead of executing the original ML pipeline to evaluate its validity, the AVATAR evaluates a surrogate model constructed from the capabilities and effects of the ML pipeline components.
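A hedged sketch of how such a knowledge base could be built automatically is shown below: each algorithm is fitted on tiny synthetic probe datasets that exhibit one characteristic at a time, and the outcome is recorded. The probes and recorded flags are illustrative assumptions, not the AVATAR implementation.

```python
# Hedged sketch of building a knowledge base of algorithm capabilities by probing
# (illustrative probes; not the AVATAR implementation): fit each algorithm on tiny
# synthetic datasets exhibiting one characteristic at a time and record whether
# the fit succeeds.
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
base_X = rng.normal(size=(20, 3))
y = rng.integers(0, 2, size=20)

probes = {
    "handles_missing_values": np.where(rng.random(base_X.shape) < 0.3, np.nan, base_X),
    "handles_negative_values": -np.abs(base_X),
}

knowledge_base = {}
for algo in (GaussianNB, HistGradientBoostingClassifier):
    capabilities = {}
    for characteristic, X in probes.items():
        try:
            algo().fit(X, y)
            capabilities[characteristic] = True
        except Exception:
            capabilities[characteristic] = False
    knowledge_base[algo.__name__] = capabilities

print(knowledge_base)
```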
arXiv Detail & Related papers (2020-11-21T14:05:49Z) - Deep Shells: Unsupervised Shape Correspondence with Optimal Transport [52.646396621449]
We propose a novel unsupervised learning approach to 3D shape correspondence.
We show that the proposed method significantly improves over the state-of-the-art on multiple datasets.
arXiv Detail & Related papers (2020-10-28T22:24:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.