Related papers: Incremental Search Space Construction for Machine Learning Pipeline Synthesis

Incremental Search Space Construction for Machine Learning Pipeline Synthesis

URL: http://arxiv.org/abs/2101.10951v1
Date: Tue, 26 Jan 2021 17:17:49 GMT
Title: Incremental Search Space Construction for Machine Learning Pipeline Synthesis
Authors: Marc-Andr\'e Z\"oller, Tien-Dung Nguyen, Marco F. Huber
Abstract summary: Automated machine learning (AutoML) aims for constructing machine learning (ML) pipelines automatically. We propose a data-centric approach based on meta-features for pipeline construction. We prove the effectiveness and competitiveness of our approach on 28 data sets used in well-established AutoML benchmarks.
Score: 4.060731229044571
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Automated machine learning (AutoML) aims for constructing machine learning (ML) pipelines automatically. Many studies have investigated efficient methods for algorithm selection and hyperparameter optimization. However, methods for ML pipeline synthesis and optimization considering the impact of complex pipeline structures containing multiple preprocessing and classification algorithms have not been studied thoroughly. In this paper, we propose a data-centric approach based on meta-features for pipeline construction and hyperparameter optimization inspired by human behavior. By expanding the pipeline search space incrementally in combination with meta-features of intermediate data sets, we are able to prune the pipeline structure search space efficiently. Consequently, flexible and data set specific ML pipelines can be constructed. We prove the effectiveness and competitiveness of our approach on 28 data sets used in well-established AutoML benchmarks in comparison with state-of-the-art AutoML frameworks.

Related papers

CoT-Self-Instruct: Building high-quality synthetic prompts for reasoning and non-reasoning tasks [59.69339605157168]
CoT-Self-Instruct is a synthetic data generation method that instructs LLMs to first reason and plan via Chain-of-Thought.<n>In verifiable reasoning, our synthetic data significantly outperforms existing training datasets.<n>For non-verifiable instruction-following tasks, our method surpasses the performance of both human and standard Self-Instruct training data.
arXiv Detail & Related papers (2025-07-31T17:38:50Z)
LLM-based Optimization of Compound AI Systems: A Survey [64.39860384538338]
In a compound AI system, components such as an LLM call, a retriever, a code interpreter, or tools are interconnected. Recent advancements enable end-to-end optimization of these parameters using an LLM. This paper presents a survey of the principles and emerging trends in LLM-based optimization of compound AI systems.
arXiv Detail & Related papers (2024-10-21T18:06:25Z)
AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML [56.565200973244146]
Automated machine learning (AutoML) accelerates AI development by automating tasks in the development pipeline. Recent works have started exploiting large language models (LLM) to lessen such burden. This paper proposes AutoML-Agent, a novel multi-agent framework tailored for full-pipeline AutoML.
arXiv Detail & Related papers (2024-10-03T20:01:09Z)
Deep Pipeline Embeddings for AutoML [11.168121941015015]
AutoML is a promising direction for democratizing AI by automatically deploying Machine Learning systems with minimal human expertise. Existing Pipeline Optimization techniques fail to explore deep interactions between pipeline stages/components. This paper proposes a novel neural architecture that captures the deep interaction between the components of a Machine Learning pipeline.
arXiv Detail & Related papers (2023-05-23T12:40:38Z)
AutoEn: An AutoML method based on ensembles of predefined Machine Learning pipelines for supervised Traffic Forecasting [1.6242924916178283]
Traffic Forecasting (TF) is gaining relevance due to its ability to mitigate traffic congestion by forecasting future traffic states. TF poses one big challenge to the Machine Learning paradigm, known as the Model Selection Problem (MSP) We introduce AutoEn, which is a simple and efficient method for automatically generating multi-classifier ensembles from a predefined set of ML pipelines.
arXiv Detail & Related papers (2023-03-19T18:37:18Z)
Towards Personalized Preprocessing Pipeline Search [52.59156206880384]
ClusterP3S is a novel framework for Personalized Preprocessing Pipeline Search via Clustering. We propose a hierarchical search strategy to jointly learn the clusters and search for the optimal pipelines. Experiments on benchmark classification datasets demonstrate the effectiveness of enabling feature-wise preprocessing pipeline search.
arXiv Detail & Related papers (2023-02-28T05:45:05Z)
Efficient Non-Parametric Optimizer Search for Diverse Tasks [93.64739408827604]
We present the first efficient scalable and general framework that can directly search on the tasks of interest. Inspired by the innate tree structure of the underlying math expressions, we re-arrange the spaces into a super-tree. We adopt an adaptation of the Monte Carlo method to tree search, equipped with rejection sampling and equivalent- form detection.
arXiv Detail & Related papers (2022-09-27T17:51:31Z)
SubStrat: A Subset-Based Strategy for Faster AutoML [5.833272638548153]
SubStrat is an AutoML optimization strategy that tackles the data size, rather than configuration space. It wraps existing AutoML tools, and instead of executing them directly on the entire dataset, SubStrat uses a genetic-based algorithm to find a small subset. It then employs the AutoML tool on the small subset, and finally, it refines the resulted pipeline by executing a restricted, much shorter, AutoML process on the large dataset.
arXiv Detail & Related papers (2022-06-07T07:44:06Z)
SapientML: Synthesizing Machine Learning Pipelines by Learning from Human-Written Solutions [28.718446733713183]
We propose an AutoML SapientML that can learn from a corpus of existing datasets and their human-written pipelines. We have created a training corpus of 1094 pipelines spanning 170 datasets, and evaluated SapientML on a set of 41 benchmark datasets. Our evaluation shows that SapientML produces the best or comparable accuracy on 27 of the benchmarks while the second best tool fails to even produce a pipeline on 9 of the instances.
arXiv Detail & Related papers (2022-02-18T20:45:47Z)
VolcanoML: Speeding up End-to-End AutoML via Scalable Search Space Decomposition [57.06900573003609]
VolcanoML is a framework that decomposes a large AutoML search space into smaller ones. It supports a Volcano-style execution model, akin to the one supported by modern database systems. Our evaluation demonstrates that, not only does VolcanoML raise the level of expressiveness for search space decomposition in AutoML, it also leads to actual findings of decomposition strategies.
arXiv Detail & Related papers (2021-07-19T13:23:57Z)
Designing Machine Learning Pipeline Toolkit for AutoML Surrogate Modeling Optimization [18.82755278152806]
We create the AMLP toolkit which facilitates the creation and evaluation of complex machine learning pipeline structures. We use AMLP to find optimal pipeline signatures, datamine them, and use these datamined features to speed-up learning and prediction. We formulated a two-stage pipeline optimization with surrogate modeling in AMLP which outperforms other AutoML approaches with a 4-hour time budget in less than 5 minutes of AMLP computation time.
arXiv Detail & Related papers (2021-07-02T20:06:40Z)
Automated Evolutionary Approach for the Design of Composite Machine Learning Pipelines [48.7576911714538]
The proposed approach is aimed to automate the design of composite machine learning pipelines. It designs the pipelines with a customizable graph-based structure, analyzes the obtained results, and reproduces them. The software implementation on this approach is presented as an open-source framework.
arXiv Detail & Related papers (2021-06-26T23:19:06Z)
Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL [53.40030379661183]
Auto-PyTorch is a framework to enable fully automated deep learning (AutoDL) It combines multi-fidelity optimization with portfolio construction for warmstarting and ensembling of deep neural networks (DNNs) We show that Auto-PyTorch performs better than several state-of-the-art competitors on average.
arXiv Detail & Related papers (2020-06-24T15:15:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.