Efficient AutoML Pipeline Search with Matrix and Tensor Factorization
- URL: http://arxiv.org/abs/2006.04216v1
- Date: Sun, 7 Jun 2020 18:08:48 GMT
- Title: Efficient AutoML Pipeline Search with Matrix and Tensor Factorization
- Authors: Chengrun Yang, Jicong Fan, Ziyang Wu, Madeleine Udell
- Abstract summary: With new pipeline components comes an explosion in the number of choices!
In this work, we design a new AutoML system to address this challenge: an automated system to design a supervised learning pipeline.
Under these models, we develop greedy experiment design protocols to efficiently gather information about a new dataset.
- Score: 41.194759736425176
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data scientists seeking a good supervised learning model on a new dataset
have many choices to make: they must preprocess the data, select features,
possibly reduce the dimension, select an estimation algorithm, and choose
hyperparameters for each of these pipeline components. With new pipeline
components comes a combinatorial explosion in the number of choices! In this
work, we design a new AutoML system to address this challenge: an automated
system to design a supervised learning pipeline. Our system uses matrix and
tensor factorization as surrogate models to model the combinatorial pipeline
search space. Under these models, we develop greedy experiment design protocols
to efficiently gather information about a new dataset. Experiments on large
corpora of real-world classification problems demonstrate the effectiveness of
our approach.
Related papers
- Towards Free Data Selection with General-Purpose Models [71.92151210413374]
A desirable data selection algorithm can efficiently choose the most informative samples to maximize the utility of limited annotation budgets.
Current approaches, represented by active learning methods, typically follow a cumbersome pipeline that iterates the time-consuming model training and batch data selection repeatedly.
FreeSel bypasses the heavy batch selection process, achieving a significant improvement in efficiency and being 530x faster than existing active learning methods.
arXiv Detail & Related papers (2023-09-29T15:50:14Z) - Deep Pipeline Embeddings for AutoML [11.168121941015015]
AutoML is a promising direction for democratizing AI by automatically deploying Machine Learning systems with minimal human expertise.
Existing Pipeline Optimization techniques fail to explore deep interactions between pipeline stages/components.
This paper proposes a novel neural architecture that captures the deep interaction between the components of a Machine Learning pipeline.
arXiv Detail & Related papers (2023-05-23T12:40:38Z) - SapientML: Synthesizing Machine Learning Pipelines by Learning from
Human-Written Solutions [28.718446733713183]
We propose an AutoML SapientML that can learn from a corpus of existing datasets and their human-written pipelines.
We have created a training corpus of 1094 pipelines spanning 170 datasets, and evaluated SapientML on a set of 41 benchmark datasets.
Our evaluation shows that SapientML produces the best or comparable accuracy on 27 of the benchmarks while the second best tool fails to even produce a pipeline on 9 of the instances.
arXiv Detail & Related papers (2022-02-18T20:45:47Z) - Automated Evolutionary Approach for the Design of Composite Machine
Learning Pipelines [48.7576911714538]
The proposed approach is aimed to automate the design of composite machine learning pipelines.
It designs the pipelines with a customizable graph-based structure, analyzes the obtained results, and reproduces them.
The software implementation on this approach is presented as an open-source framework.
arXiv Detail & Related papers (2021-06-26T23:19:06Z) - Efficient Data-specific Model Search for Collaborative Filtering [56.60519991956558]
Collaborative filtering (CF) is a fundamental approach for recommender systems.
In this paper, motivated by the recent advances in automated machine learning (AutoML), we propose to design a data-specific CF model.
Key here is a new framework that unifies state-of-the-art (SOTA) CF methods and splits them into disjoint stages of input encoding, embedding function, interaction and prediction function.
arXiv Detail & Related papers (2021-06-14T14:30:32Z) - Multi-Objective Evolutionary Design of CompositeData-Driven Models [0.0]
The implemented approach is based on a parameter-free genetic algorithm for model design called GPComp@Free.
The experimental results confirm that a multi-objective approach to the model design allows achieving better diversity and quality of obtained models.
arXiv Detail & Related papers (2021-03-01T20:45:24Z) - Incremental Search Space Construction for Machine Learning Pipeline
Synthesis [4.060731229044571]
Automated machine learning (AutoML) aims for constructing machine learning (ML) pipelines automatically.
We propose a data-centric approach based on meta-features for pipeline construction.
We prove the effectiveness and competitiveness of our approach on 28 data sets used in well-established AutoML benchmarks.
arXiv Detail & Related papers (2021-01-26T17:17:49Z) - Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature
Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method, by combining reconstruction error with $l_2,p$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
arXiv Detail & Related papers (2020-12-29T04:08:38Z) - Evolution of Scikit-Learn Pipelines with Dynamic Structured Grammatical
Evolution [1.5224436211478214]
This paper describes a novel grammar-based framework that adapts Dynamic Structured Grammatical Evolution (DSGE) to the evolution of Scikit-Learn classification pipelines.
The experimental results include comparing AutoML-DSGE to another grammar-based AutoML framework, Resilient ClassificationPipeline Evolution (RECIPE)
arXiv Detail & Related papers (2020-04-01T09:31:34Z) - Multi-layer Optimizations for End-to-End Data Analytics [71.05611866288196]
We introduce Iterative Functional Aggregate Queries (IFAQ), a framework that realizes an alternative approach.
IFAQ treats the feature extraction query and the learning task as one program given in the IFAQ's domain-specific language.
We show that a Scala implementation of IFAQ can outperform mlpack, Scikit, and specialization by several orders of magnitude for linear regression and regression tree models over several relational datasets.
arXiv Detail & Related papers (2020-01-10T16:14:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.