AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data
- URL: http://arxiv.org/abs/2003.06505v1
- Date: Fri, 13 Mar 2020 23:10:39 GMT
- Title: AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data
- Authors: Nick Erickson, Jonas Mueller, Alexander Shirkov, Hang Zhang, Pedro
Larroy, Mu Li, Alexander Smola
- Abstract summary: We introduce AutoGluon-Tabular, an open-source AutoML framework that requires only a single line of Python to train highly accurate machine learning models.
Tests on a suite of 50 classification and regression tasks from Kaggle and the OpenML AutoML Benchmark reveal that AutoGluon is faster, more robust, and much more accurate than the other AutoML frameworks tested.
- Score: 120.2298620652828
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce AutoGluon-Tabular, an open-source AutoML framework that requires
only a single line of Python to train highly accurate machine learning models
on an unprocessed tabular dataset such as a CSV file. Unlike existing AutoML
frameworks that primarily focus on model/hyperparameter selection,
AutoGluon-Tabular succeeds by ensembling multiple models and stacking them in
multiple layers. Experiments reveal that our multi-layer combination of many
models offers better use of allocated training time than seeking out the best.
A second contribution is an extensive evaluation of public and commercial
AutoML platforms including TPOT, H2O, AutoWEKA, auto-sklearn, AutoGluon, and
Google AutoML Tables. Tests on a suite of 50 classification and regression
tasks from Kaggle and the OpenML AutoML Benchmark reveal that AutoGluon is
faster, more robust, and much more accurate. We find that AutoGluon often even
outperforms the best-in-hindsight combination of all of its competitors. In two
popular Kaggle competitions, AutoGluon beat 99% of the participating data
scientists after merely 4h of training on the raw data.
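As a rough illustration of the single-call workflow described in the abstract, the sketch below assumes the current autogluon.tabular API (TabularDataset and TabularPredictor); the CSV paths, the "class" label column, and the 4-hour time limit are illustrative placeholders rather than details taken from the paper.

```python
# Minimal sketch (assumed autogluon.tabular API; file paths, label column,
# and time budget are illustrative placeholders, not from the paper).
from autogluon.tabular import TabularDataset, TabularPredictor

train_data = TabularDataset("train.csv")  # raw CSV, no manual preprocessing
test_data = TabularDataset("test.csv")

# The single call the abstract refers to: fit() handles feature preprocessing,
# model training, and multi-layer stack ensembling within the time budget.
predictor = TabularPredictor(label="class").fit(train_data, time_limit=4 * 3600)

print(predictor.leaderboard(test_data))  # scores for individual models and ensembles
predictions = predictor.predict(test_data.drop(columns=["class"]))
```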
Related papers
- Squeezing Lemons with Hammers: An Evaluation of AutoML and Tabular Deep Learning for Data-Scarce Classification Applications [2.663744975320783]
We find that L2-regularized logistic regression performs similarly to state-of-the-art automated machine learning (AutoML) frameworks.
We recommend considering logistic regression as the first choice for data-scarce applications.
arXiv Detail & Related papers (2024-05-13T11:43:38Z)
- AutoGluon-Multimodal (AutoMM): Supercharging Multimodal AutoML with Foundation Models [31.816755598468077]
AutoMM enables fine-tuning of foundation models with just three lines of code (a usage sketch appears after this list).
AutoMM offers a comprehensive suite of functionalities spanning classification, regression, object detection, semantic matching, and image segmentation.
arXiv Detail & Related papers (2024-04-24T22:28:12Z)
- AutoGluon-TimeSeries: AutoML for Probabilistic Time Series Forecasting [80.14147131520556]
AutoGluon-TimeSeries is an open-source AutoML library for probabilistic time series forecasting.
It generates accurate point and quantile forecasts with just 3 lines of Python code (a usage sketch appears after this list).
arXiv Detail & Related papers (2023-08-10T13:28:59Z)
- Benchmarking Multimodal AutoML for Tabular Data with Text Fields [83.43249184357053]
We assemble 18 multimodal data tables that each contain some text fields.
Our benchmark enables researchers to evaluate their own methods for supervised learning with numeric, categorical, and text features.
arXiv Detail & Related papers (2021-11-04T09:29:16Z)
- Can AutoML outperform humans? An evaluation on popular OpenML datasets using AutoML Benchmark [0.05156484100374058]
This paper compares four AutoML frameworks on 12 different popular datasets from OpenML.
Results show that the automated frameworks perform as well as or better than the machine learning community in 7 out of 12 OpenML tasks.
arXiv Detail & Related papers (2020-09-03T10:25:34Z)
- Auto-Sklearn 2.0: Hands-free AutoML via Meta-Learning [45.643809726832764]
We introduce new AutoML approaches motivated by our winning submission to the second ChaLearn AutoML challenge.
We develop PoSH Auto-sklearn, which enables AutoML systems to work well on large datasets under rigid time limits.
We also propose a solution towards truly hands-free AutoML.
arXiv Detail & Related papers (2020-07-08T12:41:03Z)
- AutoRec: An Automated Recommender System [44.11798716678736]
We present AutoRec, an open-source automated machine learning (AutoML) platform extended from the TensorFlow ecosystem.
AutoRec supports a highly flexible pipeline that accommodates both sparse and dense inputs.
Experiments conducted on the benchmark datasets reveal AutoRec is reliable and can identify models which resemble the best model without prior knowledge.
arXiv Detail & Related papers (2020-06-26T17:04:53Z)
- Fast, Accurate, and Simple Models for Tabular Data via Augmented Distillation [97.42894942391575]
We propose FAST-DAD to distill arbitrarily complex ensemble predictors into individual models like boosted trees, random forests, and deep networks.
Our individual distilled models are over 10x faster and more accurate than ensemble predictors produced by AutoML tools like H2O/AutoSklearn.
arXiv Detail & Related papers (2020-06-25T09:57:47Z)
- Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL [53.40030379661183]
Auto-PyTorch is a framework to enable fully automated deep learning (AutoDL).
It combines multi-fidelity optimization with portfolio construction for warmstarting and ensembling of deep neural networks (DNNs).
We show that Auto-PyTorch performs better than several state-of-the-art competitors on average.
arXiv Detail & Related papers (2020-06-24T15:15:17Z)
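For the AutoGluon-Multimodal (AutoMM) entry above, here is a minimal sketch of the advertised three-line usage, assuming the autogluon.multimodal MultiModalPredictor API; the toy dataframe, column names, and time limit are illustrative placeholders, not details taken from that paper.

```python
# Minimal sketch (assumed autogluon.multimodal API; toy data, column names,
# and time limit are illustrative placeholders).
import pandas as pd
from autogluon.multimodal import MultiModalPredictor

train_df = pd.DataFrame({
    "text": ["great product", "terrible service", "works as expected"],
    "label": [1, 0, 1],
})

# Roughly the "three lines" the summary refers to: import, construct, fit.
predictor = MultiModalPredictor(label="label")
predictor.fit(train_df, time_limit=120)  # fine-tunes a pretrained backbone on the text column

predictions = predictor.predict(pd.DataFrame({"text": ["would buy again"]}))
```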
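Similarly, for the AutoGluon-TimeSeries entry, a minimal sketch of the advertised three-line usage, assuming the autogluon.timeseries API (TimeSeriesDataFrame and TimeSeriesPredictor); the file name, column names, and prediction length are illustrative placeholders, not details taken from that paper.

```python
# Minimal sketch (assumed autogluon.timeseries API; file name, column names,
# and prediction length are illustrative placeholders).
import pandas as pd
from autogluon.timeseries import TimeSeriesDataFrame, TimeSeriesPredictor

# Long-format data: one row per (item_id, timestamp) observation.
df = pd.read_csv("timeseries.csv")  # columns: item_id, timestamp, target
train_data = TimeSeriesDataFrame.from_data_frame(
    df, id_column="item_id", timestamp_column="timestamp"
)

# Roughly the "3 lines of Python" the summary refers to:
predictor = TimeSeriesPredictor(prediction_length=48, target="target")
predictor.fit(train_data)
forecasts = predictor.predict(train_data)  # mean (point) and quantile forecasts
```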