Amazon SageMaker Autopilot: a white box AutoML solution at scale
- URL: http://arxiv.org/abs/2012.08483v2
- Date: Wed, 16 Dec 2020 18:51:27 GMT
- Title: Amazon SageMaker Autopilot: a white box AutoML solution at scale
- Authors: Piali Das, Valerio Perrone, Nikita Ivkin, Tanya Bansal, Zohar Karnin,
Huibin Shen, Iaroslav Shcherbatyi, Yotam Elor, Wilton Wu, Aida Zolic, Thibaut
Lienart, Alex Tang, Amr Ahmed, Jean Baptiste Faddoul, Rodolphe Jenatton, Fela
Winkelmolen, Philip Gautier, Leo Dirac, Andre Perunicic, Miroslav
Miladinovic, Giovanni Zappella, C\'edric Archambeau, Matthias Seeger, Bhaskar
Dutt, Laurence Rouesnel
- Abstract summary: We present Amazon SageMaker Autopilot, a fully managed system providing an automated machine learning solution.
Autopilot identifies the problem type, analyzes the data and produces a diverse set of complete ML pipelines.
In the scenario where the performance is not satisfactory, a data scientist is able to view and edit the proposed ML pipelines.
- Score: 16.607423930188734
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: AutoML systems provide a black-box solution to machine learning problems by
selecting the right way of processing features, choosing an algorithm and
tuning the hyperparameters of the entire pipeline. Although these systems
perform well on many datasets, there is still a non-negligible number of
datasets for which the one-shot solution produced by each particular system
would provide sub-par performance. In this paper, we present Amazon SageMaker
Autopilot: a fully managed system providing an automated ML solution that can
be modified when needed. Given a tabular dataset and the target column name,
Autopilot identifies the problem type, analyzes the data and produces a diverse
set of complete ML pipelines including feature preprocessing and ML algorithms,
which are tuned to generate a leaderboard of candidate models. In the scenario
where the performance is not satisfactory, a data scientist is able to view and
edit the proposed ML pipelines in order to infuse their expertise and business
knowledge without having to revert to a fully manual solution. This paper
describes the different components of Autopilot, emphasizing the infrastructure
choices that allow scalability, high quality models, editable ML pipelines,
consumption of artifacts of offline meta-learning, and a convenient integration
with the entire SageMaker suite allowing these trained models to be used in a
production setting.
Related papers
- LLM-AutoDiff: Auto-Differentiate Any LLM Workflow [58.56731133392544]
We introduce LLM-AutoDiff: a novel framework for Automatic Prompt Engineering (APE)
LLMs-AutoDiff treats each textual input as a trainable parameter and uses a frozen backward engine to generate feedback-akin to textual gradients.
It consistently outperforms existing textual gradient baselines in both accuracy and training cost.
arXiv Detail & Related papers (2025-01-28T03:18:48Z) - AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML [56.565200973244146]
Automated machine learning (AutoML) accelerates AI development by automating tasks in the development pipeline.
Recent works have started exploiting large language models (LLM) to lessen such burden.
This paper proposes AutoML-Agent, a novel multi-agent framework tailored for full-pipeline AutoML.
arXiv Detail & Related papers (2024-10-03T20:01:09Z) - Automated Machine Learning in Insurance [0.0]
This paper introduces an AutoML workflow that allows users without domain knowledge or prior experience to achieve robust and effortless ML deployment by writing only a few lines of code.
This proposed AutoML is specifically tailored for the insurance application, with features like the balancing step in data preprocessing, ensemble pipelines, and customized loss functions.
arXiv Detail & Related papers (2024-08-26T14:55:40Z) - AutoAct: Automatic Agent Learning from Scratch for QA via Self-Planning [54.47116888545878]
AutoAct is an automatic agent learning framework for QA.
It does not rely on large-scale annotated data and synthetic planning trajectories from closed-source models.
arXiv Detail & Related papers (2024-01-10T16:57:24Z) - From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning [52.257422715393574]
We introduce a self-guided methodology for Large Language Models (LLMs) to autonomously discern and select cherry samples from open-source datasets.
Our key innovation, the Instruction-Following Difficulty (IFD) metric, emerges as a pivotal metric to identify discrepancies between a model's expected responses and its intrinsic generation capability.
arXiv Detail & Related papers (2023-08-23T09:45:29Z) - Large Language Models for Automated Data Science: Introducing CAAFE for
Context-Aware Automated Feature Engineering [52.09178018466104]
We introduce Context-Aware Automated Feature Engineering (CAAFE) to generate semantically meaningful features for datasets.
Despite being methodologically simple, CAAFE improves performance on 11 out of 14 datasets.
We highlight the significance of context-aware solutions that can extend the scope of AutoML systems to semantic AutoML.
arXiv Detail & Related papers (2023-05-05T09:58:40Z) - AutoEn: An AutoML method based on ensembles of predefined Machine
Learning pipelines for supervised Traffic Forecasting [1.6242924916178283]
Traffic Forecasting (TF) is gaining relevance due to its ability to mitigate traffic congestion by forecasting future traffic states.
TF poses one big challenge to the Machine Learning paradigm, known as the Model Selection Problem (MSP)
We introduce AutoEn, which is a simple and efficient method for automatically generating multi-classifier ensembles from a predefined set of ML pipelines.
arXiv Detail & Related papers (2023-03-19T18:37:18Z) - OmniForce: On Human-Centered, Large Model Empowered and Cloud-Edge
Collaborative AutoML System [85.8338446357469]
We introduce OmniForce, a human-centered AutoML system that yields both human-assisted ML and ML-assisted human techniques.
We show how OmniForce can put an AutoML system into practice and build adaptive AI in open-environment scenarios.
arXiv Detail & Related papers (2023-03-01T13:35:22Z) - Automatic Componentwise Boosting: An Interpretable AutoML System [1.1709030738577393]
We propose an AutoML system that constructs an interpretable additive model that can be fitted using a highly scalable componentwise boosting algorithm.
Our system provides tools for easy model interpretation such as visualizing partial effects and pairwise interactions.
Despite its restriction to an interpretable model space, our system is competitive in terms of predictive performance on most data sets.
arXiv Detail & Related papers (2021-09-12T18:34:33Z) - VolcanoML: Speeding up End-to-End AutoML via Scalable Search Space
Decomposition [57.06900573003609]
VolcanoML is a framework that decomposes a large AutoML search space into smaller ones.
It supports a Volcano-style execution model, akin to the one supported by modern database systems.
Our evaluation demonstrates that, not only does VolcanoML raise the level of expressiveness for search space decomposition in AutoML, it also leads to actual findings of decomposition strategies.
arXiv Detail & Related papers (2021-07-19T13:23:57Z) - Interpret-able feedback for AutoML systems [5.5524559605452595]
Automated machine learning (AutoML) systems aim to enable training machine learning (ML) models for non-ML experts.
A shortcoming of these systems is that when they fail to produce a model with high accuracy, the user has no path to improve the model.
We introduce an interpretable data feedback solution for AutoML.
arXiv Detail & Related papers (2021-02-22T18:54:26Z) - Evolution of Scikit-Learn Pipelines with Dynamic Structured Grammatical
Evolution [1.5224436211478214]
This paper describes a novel grammar-based framework that adapts Dynamic Structured Grammatical Evolution (DSGE) to the evolution of Scikit-Learn classification pipelines.
The experimental results include comparing AutoML-DSGE to another grammar-based AutoML framework, Resilient ClassificationPipeline Evolution (RECIPE)
arXiv Detail & Related papers (2020-04-01T09:31:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.