eTOP: Early Termination of Pipelines for Faster Training of AutoML
Systems
- URL: http://arxiv.org/abs/2304.08597v1
- Date: Mon, 17 Apr 2023 20:22:30 GMT
- Title: eTOP: Early Termination of Pipelines for Faster Training of AutoML
Systems
- Authors: Haoxiang Zhang, Juliana Freire, Yash Garg
- Abstract summary: Finding the right AI/ML model is a complex and costly process.
We propose the eTOP framework, which works on top of any AutoML system.
- Score: 12.933957727351666
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advancements in software and hardware technologies have enabled the
use of AI/ML models in everyday applications, which has significantly improved the
quality of service rendered. However, for a given application, finding the
right AI/ML model is a complex and costly process that involves the
generation, training, and evaluation of multiple interlinked steps (called
pipelines), such as data pre-processing, feature engineering, selection, and
model tuning. These pipelines are complex (in structure) and costly (both in
compute resources and time) to execute end-to-end, with a hyper-parameter
associated with each step. AutoML systems automate the search for these
hyper-parameters but are slow, as they rely on optimizing the pipeline's end
output. We propose the eTOP framework, which works on top of any AutoML system
and decides whether to execute the pipeline to the end or terminate it at
an intermediate step. In an experimental evaluation on 26 benchmark datasets,
integrating eTOP with MLBox reduces the training time of the AutoML system
by up to 40x compared to baseline MLBox.
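The early-termination idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how eTOP decides when to stop, so the `scorer` function, the `threshold` parameter, and the toy pipeline steps below are all hypothetical stand-ins chosen for illustration, not the authors' method or any MLBox API.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Step = Callable[[List[float]], List[float]]   # one pipeline stage: data in, data out
Scorer = Callable[[List[float]], float]       # hypothetical proxy for final quality


def run_with_early_termination(
    steps: Sequence[Step],
    data: List[float],
    scorer: Scorer,
    threshold: float,
) -> Tuple[Optional[List[float]], int]:
    """Run steps in order and stop once the proxy score falls below threshold.

    Returns (output, steps_executed); output is None if terminated early.
    """
    executed = 0
    for step in steps:
        data = step(data)
        executed += 1
        # No check after the final step: there is no remaining work to save.
        if executed < len(steps) and scorer(data) < threshold:
            return None, executed
    return data, executed


# Toy pipeline (assumed for illustration only).
def drop_negatives(xs: List[float]) -> List[float]:
    return [x for x in xs if x >= 0]


def scale(xs: List[float]) -> List[float]:
    return [x / 10 for x in xs]


def square(xs: List[float]) -> List[float]:
    return [x * x for x in xs]


def fraction_kept(xs: List[float]) -> float:
    # Assumed proxy: share of the original 4 samples that survived so far.
    return len(xs) / 4


steps = [drop_negatives, scale, square]
print(run_with_early_termination(steps, [-1, -2, -3, 4], fraction_kept, 0.5))  # (None, 1)
```

In this sketch the second input loses three of its four samples in the first step, so the remaining two steps are skipped. A real system would need a termination criterion that predicts end-of-pipeline performance, which is the part this example does not attempt to model.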
Related papers
- LLM-AutoDiff: Auto-Differentiate Any LLM Workflow [58.56731133392544]
We introduce LLM-AutoDiff: a novel framework for Automatic Prompt Engineering (APE).
LLM-AutoDiff treats each textual input as a trainable parameter and uses a frozen backward engine to generate feedback akin to textual gradients.
It consistently outperforms existing textual gradient baselines in both accuracy and training cost.
arXiv Detail & Related papers (2025-01-28T03:18:48Z)
- AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML [56.565200973244146]
Automated machine learning (AutoML) accelerates AI development by automating tasks in the development pipeline.
Recent works have started exploiting large language models (LLMs) to lessen such burden.
This paper proposes AutoML-Agent, a novel multi-agent framework tailored for full-pipeline AutoML.
arXiv Detail & Related papers (2024-10-03T20:01:09Z)
- ToolACE: Winning the Points of LLM Function Calling [139.07157814653638]
ToolACE is an automatic agentic pipeline designed to generate accurate, complex, and diverse tool-learning data.
We demonstrate that models trained on our synthesized data, even with only 8B parameters, achieve state-of-the-art performance on the Berkeley Function-Calling Leaderboard.
arXiv Detail & Related papers (2024-09-02T03:19:56Z)
- AutoMMLab: Automatically Generating Deployable Models from Language Instructions for Computer Vision Tasks [37.48197934228379]
There is no AutoML system that automates the entire end-to-end model production workflow for computer vision.
We propose a novel request-to-model task, which involves understanding the user's natural language request and executing the entire workflow to output production-ready models.
This empowers non-expert individuals to easily build task-specific models via a user-friendly language interface.
arXiv Detail & Related papers (2024-02-23T14:38:19Z)
- In Situ Framework for Coupling Simulation and Machine Learning with
Application to CFD [51.04126395480625]
Recent years have seen many successful applications of machine learning (ML) to facilitate fluid dynamic computations.
As simulations grow, generating new training datasets for traditional offline learning creates I/O and storage bottlenecks.
This work offers a solution by simplifying this coupling and enabling in situ training and inference on heterogeneous clusters.
arXiv Detail & Related papers (2023-06-22T14:07:54Z)
- AutoEn: An AutoML method based on ensembles of predefined Machine
Learning pipelines for supervised Traffic Forecasting [1.6242924916178283]
Traffic Forecasting (TF) is gaining relevance due to its ability to mitigate traffic congestion by forecasting future traffic states.
TF poses one big challenge to the Machine Learning paradigm, known as the Model Selection Problem (MSP).
We introduce AutoEn, which is a simple and efficient method for automatically generating multi-classifier ensembles from a predefined set of ML pipelines.
arXiv Detail & Related papers (2023-03-19T18:37:18Z)
- OmniForce: On Human-Centered, Large Model Empowered and Cloud-Edge
Collaborative AutoML System [85.8338446357469]
We introduce OmniForce, a human-centered AutoML system that yields both human-assisted ML and ML-assisted human techniques.
We show how OmniForce can put an AutoML system into practice and build adaptive AI in open-environment scenarios.
arXiv Detail & Related papers (2023-03-01T13:35:22Z)
- SubStrat: A Subset-Based Strategy for Faster AutoML [5.833272638548153]
SubStrat is an AutoML optimization strategy that tackles the data size rather than the configuration space.
It wraps existing AutoML tools, and instead of executing them directly on the entire dataset, SubStrat uses a genetic-based algorithm to find a small subset.
It then employs the AutoML tool on the small subset, and finally, it refines the resulting pipeline by executing a restricted, much shorter AutoML process on the large dataset.
arXiv Detail & Related papers (2022-06-07T07:44:06Z)
- SapientML: Synthesizing Machine Learning Pipelines by Learning from
Human-Written Solutions [28.718446733713183]
We propose SapientML, an AutoML approach that can learn from a corpus of existing datasets and their human-written pipelines.
We have created a training corpus of 1094 pipelines spanning 170 datasets, and evaluated SapientML on a set of 41 benchmark datasets.
Our evaluation shows that SapientML produces the best or comparable accuracy on 27 of the benchmarks, while the second-best tool fails to even produce a pipeline on 9 of the instances.
arXiv Detail & Related papers (2022-02-18T20:45:47Z)
- SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using a basic cross-platform tensor framework and script language engines.
This approach, however, does not supply the needed procedures and pipelines for the actual deployment of machine learning capabilities in real production-grade systems.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)