Evolution of Scikit-Learn Pipelines with Dynamic Structured Grammatical
Evolution
- URL: http://arxiv.org/abs/2004.00307v1
- Date: Wed, 1 Apr 2020 09:31:34 GMT
- Title: Evolution of Scikit-Learn Pipelines with Dynamic Structured Grammatical
Evolution
- Authors: Filipe Assunção, Nuno Lourenço, Bernardete Ribeiro, and
Penousal Machado
- Abstract summary: This paper describes a novel grammar-based framework that adapts Dynamic Structured Grammatical Evolution (DSGE) to the evolution of Scikit-Learn classification pipelines.
The experimental results include comparing AutoML-DSGE to another grammar-based AutoML framework, Resilient Classification Pipeline Evolution (RECIPE).
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The deployment of Machine Learning (ML) models is a difficult and
time-consuming job that comprises a series of sequential and correlated tasks,
ranging from data pre-processing and the design and extraction of features to
the choice of the ML algorithm and its parameterisation. The task is even more
challenging considering that the design of features is in many cases problem
specific, and thus requires domain expertise. To overcome these limitations,
Automated Machine Learning (AutoML) methods seek to automate, with little or
no human intervention, the design of pipelines, i.e., to automate the
selection of the sequence of methods that have to be applied to the raw data.
These methods have the potential to enable non-expert users to use ML, and to
provide expert users with solutions that they would be unlikely to consider.
In particular, this paper describes AutoML-DSGE, a novel grammar-based
framework that adapts Dynamic Structured Grammatical Evolution (DSGE) to the
evolution of Scikit-Learn classification pipelines. The experimental results
include comparing AutoML-DSGE to another grammar-based AutoML framework,
Resilient Classification Pipeline Evolution (RECIPE), and show that the
average performance of the classification pipelines generated by AutoML-DSGE
is always superior to the average performance of RECIPE; the differences are
statistically significant in 3 of the 10 datasets used.
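The grammar-based pipeline evolution the abstract describes can be illustrated with a minimal sketch. This is not the AutoML-DSGE implementation: the grammar below is a toy with two non-terminals, and the search loop is plain random sampling rather than DSGE's genotype-to-phenotype mapping and evolutionary operators. It only shows the core idea of deriving Scikit-Learn pipelines from grammar productions and scoring them by cross-validation.

```python
import random

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Toy grammar: each non-terminal maps to a list of alternative productions,
# each production being a (step name, estimator factory) pair.
GRAMMAR = {
    "preprocess": [
        ("scaler", StandardScaler),
        ("minmax", MinMaxScaler),
    ],
    "classifier": [
        ("logreg", lambda: LogisticRegression(max_iter=500)),
        ("tree", DecisionTreeClassifier),
    ],
}


def sample_pipeline(rng):
    """Expand each non-terminal by picking one production (a stand-in
    for DSGE's per-non-terminal genotype)."""
    steps = []
    for nonterminal in ("preprocess", "classifier"):
        name, factory = rng.choice(GRAMMAR[nonterminal])
        steps.append((name, factory()))
    return Pipeline(steps)


def evolve(generations=5, seed=0):
    """Random search over grammar derivations; real DSGE would use
    selection, crossover, and mutation instead."""
    rng = random.Random(seed)
    X, y = load_iris(return_X_y=True)
    best, best_score = None, -1.0
    for _ in range(generations):
        pipe = sample_pipeline(rng)
        score = cross_val_score(pipe, X, y, cv=3).mean()
        if score > best_score:
            best, best_score = pipe, score
    return best, best_score


if __name__ == "__main__":
    pipe, score = evolve()
    print([name for name, _ in pipe.steps], round(score, 3))
```

Each candidate pipeline is a valid Scikit-Learn `Pipeline` by construction, which is the main appeal of grammar-based AutoML: the grammar rules out ill-formed method sequences before evaluation.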
Related papers
- AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML
Automated machine learning (AutoML) accelerates AI development by automating tasks in the development pipeline.
Recent works have started exploiting large language models (LLM) to lessen such burden.
This paper proposes AutoML-Agent, a novel multi-agent framework tailored for full-pipeline AutoML.
arXiv Detail & Related papers (2024-10-03T20:01:09Z)
- Verbalized Machine Learning: Revisiting Machine Learning with Language Models
We introduce the framework of verbalized machine learning (VML).
VML constrains the parameter space to be human-interpretable natural language.
We empirically verify the effectiveness of VML, and hope that VML can serve as a stepping stone to stronger interpretability.
arXiv Detail & Related papers (2024-06-06T17:59:56Z)
- From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning
We introduce a self-guided methodology for Large Language Models (LLMs) to autonomously discern and select cherry samples from open-source datasets.
Our key innovation, the Instruction-Following Difficulty (IFD) metric, emerges as a pivotal metric to identify discrepancies between a model's expected responses and its intrinsic generation capability.
arXiv Detail & Related papers (2023-08-23T09:45:29Z)
- The Devil is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation
AutoMQM is a prompting technique which asks large language models to identify and categorize errors in translations.
We study the impact of labeled data through in-context learning and finetuning.
We then evaluate AutoMQM with PaLM-2 models, and we find that it improves performance compared to just prompting for scores.
arXiv Detail & Related papers (2023-08-14T17:17:21Z)
- AutoEn: An AutoML method based on ensembles of predefined Machine Learning pipelines for supervised Traffic Forecasting
Traffic Forecasting (TF) is gaining relevance due to its ability to mitigate traffic congestion by forecasting future traffic states.
TF poses a major challenge to the Machine Learning paradigm, known as the Model Selection Problem (MSP).
We introduce AutoEn, a simple and efficient method for automatically generating multi-classifier ensembles from a predefined set of ML pipelines.
arXiv Detail & Related papers (2023-03-19T18:37:18Z)
- STREAMLINE: A Simple, Transparent, End-To-End Automated Machine Learning Pipeline Facilitating Data Analysis and Algorithm Comparison
STREAMLINE is a simple, transparent, end-to-end AutoML pipeline.
It is specifically designed to compare performance between datasets, ML algorithms, and other AutoML tools.
arXiv Detail & Related papers (2022-06-23T22:40:58Z)
- Automatic Componentwise Boosting: An Interpretable AutoML System
We propose an AutoML system that constructs an interpretable additive model that can be fitted using a highly scalable componentwise boosting algorithm.
Our system provides tools for easy model interpretation such as visualizing partial effects and pairwise interactions.
Despite its restriction to an interpretable model space, our system is competitive in terms of predictive performance on most data sets.
arXiv Detail & Related papers (2021-09-12T18:34:33Z)
- Man versus Machine: AutoML and Human Experts' Role in Phishing Detection
This paper compares the performances of six well-known, state-of-the-art AutoML frameworks on ten different phishing datasets.
Our results indicate that AutoML-based models are able to outperform manually developed machine learning models in complex classification tasks.
arXiv Detail & Related papers (2021-08-27T09:26:20Z)
- VolcanoML: Speeding up End-to-End AutoML via Scalable Search Space Decomposition
VolcanoML is a framework that decomposes a large AutoML search space into smaller ones.
It supports a Volcano-style execution model, akin to the one supported by modern database systems.
Our evaluation demonstrates that VolcanoML not only raises the level of expressiveness for search space decomposition in AutoML, but also leads to the discovery of effective decomposition strategies.
arXiv Detail & Related papers (2021-07-19T13:23:57Z)
- Adaptation Strategies for Automated Machine Learning on Evolving Data
This study examines the effect of data-stream challenges, such as concept drift, on the performance of AutoML methods.
We propose 6 concept drift adaptation strategies and evaluate their effectiveness on different AutoML approaches.
arXiv Detail & Related papers (2020-06-09T14:29:16Z)
- Multi-layer Optimizations for End-to-End Data Analytics
We introduce Iterative Functional Aggregate Queries (IFAQ), a framework that realizes an alternative approach.
IFAQ treats the feature extraction query and the learning task as one program given in the IFAQ's domain-specific language.
We show that a Scala implementation of IFAQ can outperform mlpack, Scikit, and specialization by several orders of magnitude for linear regression and regression tree models over several relational datasets.
arXiv Detail & Related papers (2020-01-10T16:14:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.