eTOP: Early Termination of Pipelines for Faster Training of AutoML
Systems
- URL: http://arxiv.org/abs/2304.08597v1
- Date: Mon, 17 Apr 2023 20:22:30 GMT
- Title: eTOP: Early Termination of Pipelines for Faster Training of AutoML
Systems
- Authors: Haoxiang Zhang, Juliana Freire, Yash Garg
- Abstract summary: Finding the right AI/ML model is a complex and costly process.
We propose the eTOP Framework, which works on top of any AutoML system.
- Score: 12.933957727351666
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advancements in software and hardware technologies have enabled the
use of AI/ML models in everyday applications, which has significantly improved the
quality of service rendered. However, for a given application, finding the
right AI/ML model is a complex and costly process that involves the
generation, training, and evaluation of multiple interlinked steps (called
pipelines), such as data pre-processing, feature engineering, selection, and
model tuning. These pipelines are complex (in structure) and costly (both in
compute resource and time) to execute end-to-end, with a hyper-parameter
associated with each step. AutoML systems automate the search of these
hyper-parameters but are slow, as they rely on optimizing the pipeline's end
output. We propose the eTOP Framework which works on top of any AutoML system
and decides whether or not to execute the pipeline to the end or terminate at
an intermediate step. In an experimental evaluation on 26 benchmark datasets,
integrating eTOP with MLBox4 reduces the training time of the AutoML system
by up to 40x compared to baseline MLBox.
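The abstract does not state how eTOP decides when to stop, so the following is only a minimal sketch of the general idea of terminating a pipeline at an intermediate step. The function name `run_with_early_termination`, the `proxy_score` callback, and the fixed `threshold` are all hypothetical and are not taken from the paper or from MLBox.

```python
from typing import Callable, List, Optional, Tuple

# A pipeline step maps intermediate data to new intermediate data.
Step = Callable[[list], list]


def run_with_early_termination(
    steps: List[Step],
    data: list,
    proxy_score: Callable[[list], float],
    threshold: float,
) -> Tuple[Optional[list], int]:
    """Run steps in order and stop early if an intermediate result looks poor.

    Hypothetical rule (an assumption, not eTOP's actual criterion): after each
    non-final step, compute a cheap proxy score on the intermediate output and
    terminate if it falls below `threshold`.

    Returns (final output or None if terminated, number of steps executed).
    """
    current = data
    for i, step in enumerate(steps, start=1):
        current = step(current)
        # Only intermediate steps are checked; the final output is always kept.
        if i < len(steps) and proxy_score(current) < threshold:
            return None, i
    return current, len(steps)


# Toy pipelines: the proxy score is simply the number of rows that survive.
good_pipeline = [lambda xs: [x * 2 for x in xs], lambda xs: [x + 1 for x in xs]]
bad_pipeline = [lambda xs: [], lambda xs: [x + 1 for x in xs]]

print(run_with_early_termination(good_pipeline, [1, 2, 3], len, 1))  # ([3, 5, 7], 2)
print(run_with_early_termination(bad_pipeline, [1, 2, 3], len, 1))   # (None, 1)
```

In this toy setup the second pipeline is abandoned after its first step, so the cost of the remaining steps is never paid; a real system would need a proxy that predicts final pipeline quality rather than a row count.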
Related papers
- AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML [56.565200973244146]
Automated machine learning (AutoML) accelerates AI development by automating tasks in the development pipeline.
Recent works have started exploiting large language models (LLM) to lessen such burden.
This paper proposes AutoML-Agent, a novel multi-agent framework tailored for full-pipeline AutoML.
arXiv Detail & Related papers (2024-10-03T20:01:09Z)
- ToolACE: Winning the Points of LLM Function Calling [139.07157814653638]
ToolACE is an automatic agentic pipeline designed to generate accurate, complex, and diverse tool-learning data.
We demonstrate that models trained on our synthesized data, even with only 8B parameters, achieve state-of-the-art performance on the Berkeley Function-Calling Leaderboard.
arXiv Detail & Related papers (2024-09-02T03:19:56Z)
- In Situ Framework for Coupling Simulation and Machine Learning with
Application to CFD [51.04126395480625]
Recent years have seen many successful applications of machine learning (ML) to facilitate fluid dynamic computations.
As simulations grow, generating new training datasets for traditional offline learning creates I/O and storage bottlenecks.
This work offers a solution by simplifying this coupling and enabling in situ training and inference on heterogeneous clusters.
arXiv Detail & Related papers (2023-06-22T14:07:54Z)
- AutoEn: An AutoML method based on ensembles of predefined Machine
Learning pipelines for supervised Traffic Forecasting [1.6242924916178283]
Traffic Forecasting (TF) is gaining relevance due to its ability to mitigate traffic congestion by forecasting future traffic states.
TF poses one big challenge to the Machine Learning paradigm, known as the Model Selection Problem (MSP).
We introduce AutoEn, which is a simple and efficient method for automatically generating multi-classifier ensembles from a predefined set of ML pipelines.
arXiv Detail & Related papers (2023-03-19T18:37:18Z)
- OmniForce: On Human-Centered, Large Model Empowered and Cloud-Edge
Collaborative AutoML System [85.8338446357469]
We introduce OmniForce, a human-centered AutoML system that yields both human-assisted ML and ML-assisted human techniques.
We show how OmniForce can put an AutoML system into practice and build adaptive AI in open-environment scenarios.
arXiv Detail & Related papers (2023-03-01T13:35:22Z)
- SubStrat: A Subset-Based Strategy for Faster AutoML [5.833272638548153]
SubStrat is an AutoML optimization strategy that tackles the data size, rather than configuration space.
It wraps existing AutoML tools, and instead of executing them directly on the entire dataset, SubStrat uses a genetic-based algorithm to find a small subset.
It then employs the AutoML tool on the small subset, and finally, it refines the resulting pipeline by executing a restricted, much shorter, AutoML process on the large dataset.
arXiv Detail & Related papers (2022-06-07T07:44:06Z)
- SapientML: Synthesizing Machine Learning Pipelines by Learning from
Human-Written Solutions [28.718446733713183]
We propose SapientML, an AutoML technique that can learn from a corpus of existing datasets and their human-written pipelines.
We have created a training corpus of 1094 pipelines spanning 170 datasets, and evaluated SapientML on a set of 41 benchmark datasets.
Our evaluation shows that SapientML produces the best or comparable accuracy on 27 of the benchmarks while the second best tool fails to even produce a pipeline on 9 of the instances.
arXiv Detail & Related papers (2022-02-18T20:45:47Z)
- SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor framework and script language engines.
This approach however does not supply the needed procedures and pipelines for the actual deployment of machine learning capabilities in real production grade systems.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)
- VolcanoML: Speeding up End-to-End AutoML via Scalable Search Space
Decomposition [57.06900573003609]
VolcanoML is a framework that decomposes a large AutoML search space into smaller ones.
It supports a Volcano-style execution model, akin to the one supported by modern database systems.
Our evaluation demonstrates that, not only does VolcanoML raise the level of expressiveness for search space decomposition in AutoML, it also leads to actual findings of decomposition strategies.
arXiv Detail & Related papers (2021-07-19T13:23:57Z)
- AutoWeka4MCPS-AVATAR: Accelerating Automated Machine Learning Pipeline
Composition and Optimisation [13.116806430326513]
We propose a novel method to evaluate the validity of ML pipelines, without their execution, using a surrogate model (AVATAR).
The AVATAR generates a knowledge base by automatically learning the capabilities and effects of ML algorithms on datasets' characteristics.
Instead of executing the original ML pipeline to evaluate its validity, the AVATAR evaluates its surrogate model constructed by capabilities and effects of the ML pipeline components.
arXiv Detail & Related papers (2020-11-21T14:05:49Z)