Privileged Zero-Shot AutoML
- URL: http://arxiv.org/abs/2106.13743v1
- Date: Fri, 25 Jun 2021 16:31:05 GMT
- Title: Privileged Zero-Shot AutoML
- Authors: Nikhil Singh, Brandon Kates, Jeff Mentch, Anant Kharkar, Madeleine Udell, Iddo Drori
- Abstract summary: This work improves the quality of automated machine learning (AutoML) systems by using dataset and function descriptions.
We show that zero-shot AutoML reduces running and prediction times from minutes to milliseconds, consistently across datasets.
- Score: 16.386335031156
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work improves the quality of automated machine learning (AutoML)
systems by using dataset and function descriptions, while decreasing computation
time from minutes to milliseconds via a zero-shot approach.
Given a new dataset and a well-defined machine learning task, humans begin by
reading a description of the dataset and documentation for the algorithms to be
used. This work is the first to use these textual descriptions, which we call
privileged information, for AutoML. We use a pre-trained Transformer model to
process the privileged text and demonstrate that using this information
improves AutoML performance. Thus, our approach leverages the progress of
unsupervised representation learning in natural language processing to provide
a significant boost to AutoML. We demonstrate that using only textual
descriptions of the data and functions achieves reasonable classification
performance, and adding textual descriptions to data meta-features improves
classification across tabular datasets. To achieve zero-shot AutoML, we train a
graph neural network on these description embeddings together with the data
meta-features. Each node represents a training dataset; the trained network
predicts the best machine learning pipeline for a new test dataset in a
zero-shot fashion. Our zero-shot approach rapidly predicts a high-quality
pipeline for a
supervised learning task and dataset. In contrast, most AutoML systems require
tens or hundreds of pipeline evaluations. We show that zero-shot AutoML reduces
running and prediction times from minutes to milliseconds, consistently across
datasets. By speeding up AutoML by orders of magnitude this work demonstrates
real-time AutoML.
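For concreteness, a minimal sketch of the zero-shot prediction step follows. The
abstract specifies only a pre-trained Transformer for the privileged text and a
graph neural network over dataset nodes; the particular encoder model, the
nearest-neighbor graph construction, and the framing of pipeline selection as
classification over a fixed candidate set are illustrative assumptions, not the
authors' exact design.

```python
# Sketch: embed privileged text with a pre-trained Transformer, concatenate
# with dataset meta-features, and score candidate pipelines with one
# hand-rolled graph-convolution layer over dataset nodes.
# "all-MiniLM-L6-v2" and the k-NN graph are stand-ins, not the paper's choices.
import torch
import torch.nn as nn
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in pre-trained Transformer

def node_features(description: str, meta: torch.Tensor) -> torch.Tensor:
    """Privileged-text embedding concatenated with numeric meta-features."""
    text = torch.as_tensor(encoder.encode(description))
    return torch.cat([text, meta.float()])

class PipelinePredictor(nn.Module):
    """One normalized graph convolution over dataset nodes, then a classifier
    over a fixed set of candidate ML pipelines (one score per pipeline)."""
    def __init__(self, in_dim: int, hidden: int, n_pipelines: int):
        super().__init__()
        self.conv = nn.Linear(in_dim, hidden)
        self.head = nn.Linear(hidden, n_pipelines)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        a = adj + torch.eye(adj.size(0))          # add self-loops
        d = a.sum(dim=1).rsqrt().diag()           # D^{-1/2}
        h = torch.relu(self.conv(d @ a @ d @ x))  # aggregate neighbor features
        return self.head(h)                       # pipeline scores per node

# Zero-shot use: append the new dataset as one more node (edges to, e.g., its
# nearest training datasets), run one forward pass, and take the argmax
# pipeline for that node -- milliseconds, with no pipeline evaluations.
```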
Related papers
- AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML [56.565200973244146]
Automated machine learning (AutoML) accelerates AI development by automating tasks in the development pipeline.
Recent works have started exploiting large language models (LLMs) to lessen this burden.
This paper proposes AutoML-Agent, a novel multi-agent framework tailored for full-pipeline AutoML.
arXiv Detail & Related papers (2024-10-03T20:01:09Z)
- The Devil is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation [93.01964988474755]
AutoMQM is a prompting technique which asks large language models to identify and categorize errors in translations.
We study the impact of labeled data through in-context learning and finetuning.
We then evaluate AutoMQM with PaLM-2 models, and we find that it improves performance compared to just prompting for scores.
arXiv Detail & Related papers (2023-08-14T17:17:21Z)
- Large Language Models for Automated Data Science: Introducing CAAFE for Context-Aware Automated Feature Engineering [52.09178018466104]
We introduce Context-Aware Automated Feature Engineering (CAAFE) to generate semantically meaningful features for datasets.
Despite being methodologically simple, CAAFE improves performance on 11 out of 14 datasets.
We highlight the significance of context-aware solutions that can extend the scope of AutoML systems to semantic AutoML.
arXiv Detail & Related papers (2023-05-05T09:58:40Z)
- SubStrat: A Subset-Based Strategy for Faster AutoML [5.833272638548153]
SubStrat is an AutoML optimization strategy that tackles the data size rather than the configuration space.
It wraps existing AutoML tools; instead of executing them directly on the entire dataset, SubStrat uses a genetic algorithm to find a small data subset.
It then runs the AutoML tool on that subset and finally refines the resulting pipeline by executing a restricted, much shorter AutoML process on the full dataset (a minimal sketch follows this entry).
arXiv Detail & Related papers (2022-06-07T07:44:06Z)
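A toy sketch of the subset strategy described above, under assumed interfaces:
`automl_fit(data, time_budget, ...)` stands in for whatever AutoML tool is
wrapped (its `warm_start` argument is hypothetical, representing the refinement
step), and `fitness` scores how well a subset preserves the full dataset;
SubStrat's actual genetic operators and fitness measure differ.

```python
# Toy sketch of the SubStrat idea: a genetic search picks a small row subset,
# full AutoML runs on the subset, and a short AutoML pass refines the result
# on the complete dataset. `automl_fit`, `fitness`, and `warm_start` are
# assumed interfaces, not the published system's API.
import random
import pandas as pd

def genetic_subset(df: pd.DataFrame, n_rows: int, fitness,
                   generations: int = 20, pop_size: int = 10) -> pd.DataFrame:
    """Evolve row-index subsets, keeping the one scoring best under `fitness`."""
    n = len(df)
    pop = [random.sample(range(n), n_rows) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda idx: fitness(df.iloc[idx]), reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = random.sample(parents, 2)                    # crossover
            child = dict.fromkeys(a[: n_rows // 2] + b[n_rows // 2:])
            while len(child) < n_rows:                          # mutate/repair
                child[random.randrange(n)] = None
            children.append(list(child))
        pop = parents + children
    return df.iloc[max(pop, key=lambda idx: fitness(df.iloc[idx]))]

def substrat(df, automl_fit, fitness):
    # 1) genetic search for a small subset; 2) full AutoML on the subset;
    # 3) short, restricted AutoML pass on the full data to refine the pipeline.
    small = genetic_subset(df, n_rows=min(1000, len(df)), fitness=fitness)
    pipeline = automl_fit(small, time_budget=600)
    return automl_fit(df, time_budget=60, warm_start=pipeline)
```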
- Benchmarking Multimodal AutoML for Tabular Data with Text Fields [83.43249184357053]
We assemble 18 multimodal data tables that each contain some text fields.
Our benchmark enables researchers to evaluate their own methods for supervised learning with numeric, categorical, and text features.
arXiv Detail & Related papers (2021-11-04T09:29:16Z)
- Automatic Componentwise Boosting: An Interpretable AutoML System [1.1709030738577393]
We propose an AutoML system that constructs an interpretable additive model that can be fitted using a highly scalable componentwise boosting algorithm.
Our system provides tools for easy model interpretation such as visualizing partial effects and pairwise interactions.
Despite its restriction to an interpretable model space, our system is competitive in terms of predictive performance on most data sets.
arXiv Detail & Related papers (2021-09-12T18:34:33Z)
- Automated Machine Learning Techniques for Data Streams [91.3755431537592]
This paper surveys the state-of-the-art open-source AutoML tools, applies them to data collected from streams, and measures how their performance changes over time.
The results show that off-the-shelf AutoML tools can provide satisfactory results, but in the presence of concept drift, drift detection or adaptation techniques must be applied to maintain predictive accuracy over time.
arXiv Detail & Related papers (2021-06-14T11:42:46Z)
- AutoFlow: Learning a Better Training Set for Optical Flow [62.40293188964933]
AutoFlow is a method to render training data for optical flow.
AutoFlow achieves state-of-the-art accuracy in pre-training both PWC-Net and RAFT.
arXiv Detail & Related papers (2021-04-29T17:55:23Z)
- Auto-Sklearn 2.0: Hands-free AutoML via Meta-Learning [45.643809726832764]
We introduce new AutoML approaches motivated by our winning submission to the second ChaLearn AutoML challenge.
We develop PoSH Auto-sklearn, which enables AutoML systems to work well on large datasets under rigid time limits.
We also propose a solution towards truly hands-free AutoML.
arXiv Detail & Related papers (2020-07-08T12:41:03Z)
- DriveML: An R Package for Driverless Machine Learning [7.004573941239386]
DriveML helps in implementing some of the pillars of an automated machine learning pipeline.
The main benefits of DriveML are development time savings, fewer developer errors, and optimal tuning of machine learning models.
arXiv Detail & Related papers (2020-05-01T16:40:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.