AMLB: an AutoML Benchmark
- URL: http://arxiv.org/abs/2207.12560v2
- Date: Thu, 16 Nov 2023 14:12:10 GMT
- Title: AMLB: an AutoML Benchmark
- Authors: Pieter Gijsbers, Marcos L. P. Bueno, Stefan Coors, Erin LeDell,
Sébastien Poirier, Janek Thomas, Bernd Bischl, Joaquin Vanschoren
- Abstract summary: We conduct a thorough comparison of 9 well-known AutoML frameworks across 71 classification and 33 regression tasks.
The benchmark comes with an open-source tool that integrates with many AutoML frameworks and automates the empirical evaluation process end-to-end.
- Score: 9.642136611591578
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Comparing different AutoML frameworks is notoriously challenging and often
done incorrectly. We introduce an open and extensible benchmark that follows
best practices and avoids common mistakes when comparing AutoML frameworks. We
conduct a thorough comparison of 9 well-known AutoML frameworks across 71
classification and 33 regression tasks. The differences between the AutoML
frameworks are explored with a multi-faceted analysis, evaluating model
accuracy, its trade-offs with inference time, and framework failures. We also
use Bradley-Terry trees to discover subsets of tasks where the relative AutoML
framework rankings differ. The benchmark comes with an open-source tool that
integrates with many AutoML frameworks and automates the empirical evaluation
process end-to-end: from framework installation and resource allocation to
in-depth evaluation. The benchmark uses public data sets, can be easily
extended with other AutoML frameworks and tasks, and has a website with
up-to-date results.
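The abstract mentions Bradley-Terry trees for finding task subsets where framework rankings differ. At the core of that analysis is the Bradley-Terry model, which turns pairwise win counts into per-item strengths. A minimal sketch of that fitting step using the classic minorization-maximization (MM) update; the frameworks and win counts below are made up for illustration, not taken from the paper:

```python
def bradley_terry(wins, iters=200):
    """Estimate Bradley-Terry strengths from a pairwise win-count matrix.

    wins[i][j] = number of comparisons (e.g. per-task rankings) in which
    item i beat item j. Uses the MM update: each item's strength is its
    total wins divided by a sum weighted by current strengths.
    """
    n = len(wins)
    games = [[wins[i][j] + wins[j][i] for j in range(n)] for i in range(n)]
    total_wins = [sum(wins[i]) for i in range(n)]
    p = [1.0] * n
    for _ in range(iters):
        new_p = []
        for i in range(n):
            denom = sum(games[i][j] / (p[i] + p[j])
                        for j in range(n) if j != i)
            new_p.append(total_wins[i] / denom if denom else p[i])
        s = sum(new_p)
        p = [x / s for x in new_p]  # fix the scale: strengths sum to 1
    return p

# Hypothetical head-to-head wins among three frameworks A, B, C
wins = [
    [0, 8, 9],  # A beat B 8 times, C 9 times
    [2, 0, 7],  # B beat A 2 times, C 7 times
    [1, 3, 0],  # C beat A once, B 3 times
]
strengths = bradley_terry(wins)
```

A Bradley-Terry *tree* then recursively splits tasks on task features (dataset size, class imbalance, etc.) wherever the fitted strengths differ significantly between the two sides of the split.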
Related papers
- AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML [56.565200973244146]
Automated machine learning (AutoML) accelerates AI development by automating tasks in the development pipeline.
Recent works have started exploiting large language models (LLM) to lessen such burden.
This paper proposes AutoML-Agent, a novel multi-agent framework tailored for full-pipeline AutoML.
arXiv Detail & Related papers (2024-10-03T20:01:09Z) - Task Me Anything [72.810309406219]
This paper presents a benchmark-generation engine that produces a benchmark tailored to a user's needs.
It contains 113K images, 10K videos, 2K 3D object assets, over 365 object categories, 655 attributes, and 335 relationships.
It can generate 750M image/video question-answering pairs, which focus on evaluating perceptual capabilities.
arXiv Detail & Related papers (2024-06-17T17:32:42Z) - Long-Span Question-Answering: Automatic Question Generation and QA-System Ranking via Side-by-Side Evaluation [65.16137964758612]
We explore the use of long-context capabilities in large language models to create synthetic reading comprehension data from entire books.
Our objective is to test the capabilities of LLMs to analyze, understand, and reason over problems that require a detailed comprehension of long spans of text.
arXiv Detail & Related papers (2024-05-31T20:15:10Z) - Benchmark Self-Evolving: A Multi-Agent Framework for Dynamic LLM Evaluation [51.99752147380505]
This paper presents a benchmark self-evolving framework to dynamically evaluate Large Language Models (LLMs).
We utilize a multi-agent system to manipulate the context or question of original instances, reframing new evolving instances with high confidence.
Our framework widens performance discrepancies both between different models and within the same model across various tasks.
arXiv Detail & Related papers (2024-02-18T03:40:06Z) - InfiMM-Eval: Complex Open-Ended Reasoning Evaluation For Multi-Modal Large Language Models [50.03163753638256]
Multi-modal Large Language Models (MLLMs) are increasingly prominent in the field of artificial intelligence.
Our benchmark comprises three key reasoning categories: deductive, abductive, and analogical reasoning.
We evaluate a selection of representative MLLMs using this rigorously developed open-ended multi-step elaborate reasoning benchmark.
arXiv Detail & Related papers (2023-11-20T07:06:31Z) - Bringing Quantum Algorithms to Automated Machine Learning: A Systematic Review of AutoML Frameworks Regarding Extensibility for QML Algorithms [1.4469725791865982]
This work describes the selection approach and analysis of existing AutoML frameworks regarding their capability of incorporating Quantum Machine Learning (QML) algorithms.
For that, available open-source tools are condensed into a market overview and suitable frameworks are systematically selected using a multi-phase, multi-criteria approach.
We build an extended Automated Quantum Machine Learning (AutoQML) framework with QC-specific pipeline steps and decision characteristics for hardware and software constraints.
arXiv Detail & Related papers (2023-10-06T13:21:16Z) - Automatic Componentwise Boosting: An Interpretable AutoML System [1.1709030738577393]
We propose an AutoML system that constructs an interpretable additive model that can be fitted using a highly scalable componentwise boosting algorithm.
Our system provides tools for easy model interpretation such as visualizing partial effects and pairwise interactions.
Despite its restriction to an interpretable model space, our system is competitive in terms of predictive performance on most data sets.
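The componentwise boosting idea behind that system can be sketched compactly: each round fits a one-feature least-squares base learner to the current residuals and adds only the best one, shrunk by a learning rate, so the final model stays an interpretable additive sum of per-feature effects. A minimal illustration (not the paper's actual implementation; the toy data is invented):

```python
def componentwise_boost(X, y, rounds=2000, lr=0.1):
    """Componentwise L2 boosting on centered features.

    Each round: fit a simple through-origin least-squares slope for every
    feature against the residuals, keep the single best-fitting feature,
    and add lr * slope to that feature's coefficient.
    """
    n, d = len(X), len(X[0])
    mx = [sum(row[j] for row in X) / n for j in range(d)]  # feature means
    my = sum(y) / n                                        # intercept
    Xc = [[row[j] - mx[j] for j in range(d)] for row in X]
    coef = [0.0] * d
    resid = [yi - my for yi in y]
    for _ in range(rounds):
        best = None
        for j in range(d):
            xj = [row[j] for row in Xc]
            sxx = sum(x * x for x in xj)
            if sxx == 0:
                continue
            b = sum(x * r for x, r in zip(xj, resid)) / sxx  # LS slope
            sse = sum((r - b * x) ** 2 for r, x in zip(resid, xj))
            if best is None or sse < best[0]:
                best = (sse, j, b)
        _, j, b = best
        coef[j] += lr * b
        resid = [r - lr * b * row[j] for r, row in zip(resid, Xc)]
    return my, mx, coef

def predict(model, row):
    my, mx, coef = model
    return my + sum(c * (x - m) for c, x, m in zip(coef, row, mx))

# Toy data: y depends only on the first two of three features
X = [[1, 0, 1], [0, 1, 1], [2, 1, 0], [1, 2, 1],
     [3, 0, 0], [0, 3, 1], [2, 2, 0], [1, 1, 1]]
y = [2 * a - b for a, b, _ in X]
model = componentwise_boost(X, y)
```

Because only one coefficient moves per round, features that never win a round keep a coefficient of exactly zero, which gives the built-in variable selection the paper's interpretability tooling relies on.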
arXiv Detail & Related papers (2021-09-12T18:34:33Z) - Comparison of Automated Machine Learning Tools for SMS Spam Message Filtering [0.0]
Short Message Service (SMS) is a popular service used for communication by mobile users.
This work compares the classification performance of three automated machine learning (AutoML) tools for SMS spam message filtering.
Experimental results showed that ensemble models achieved the best classification performance.
arXiv Detail & Related papers (2021-06-16T10:16:07Z) - Can AutoML outperform humans? An evaluation on popular OpenML datasets using AutoML Benchmark [0.05156484100374058]
This paper compares four AutoML frameworks on 12 different popular datasets from OpenML.
Results show that the automated frameworks perform better than or on par with the machine learning community in 7 out of 12 OpenML tasks.
arXiv Detail & Related papers (2020-09-03T10:25:34Z) - Is deep learning necessary for simple classification tasks? [3.3793659640122717]
Automated machine learning (AutoML) and deep learning (DL) are two cutting-edge paradigms used to solve inductive learning tasks.
We compare AutoML and DL in the context of binary classification on 6 well-characterized public datasets.
We also evaluate a new tool for genetic programming-based AutoML that incorporates deep estimators.
arXiv Detail & Related papers (2020-06-11T18:41:47Z) - AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data [120.2298620652828]
We introduce AutoGluon-Tabular, an open-source AutoML framework that requires only a single line of Python to train highly accurate machine learning models.
Tests on a suite of 50 classification and regression tasks from Kaggle and the OpenML AutoML Benchmark reveal that AutoGluon is faster, more robust, and much more accurate than competing frameworks.
arXiv Detail & Related papers (2020-03-13T23:10:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.