Related papers: Designing Machine Learning Pipeline Toolkit for AutoML Surrogate Modeling Optimization

Designing Machine Learning Pipeline Toolkit for AutoML Surrogate Modeling Optimization

URL: http://arxiv.org/abs/2107.01253v1
Date: Fri, 2 Jul 2021 20:06:40 GMT
Title: Designing Machine Learning Pipeline Toolkit for AutoML Surrogate Modeling Optimization
Authors: Paulito P. Palmes, Akihiro Kishimoto, Radu Marinescu, Parikshit Ram, Elizabeth Daly
Abstract summary: We create the AMLP toolkit which facilitates the creation and evaluation of complex machine learning pipeline structures. We use AMLP to find optimal pipeline signatures, datamine them, and use these datamined features to speed-up learning and prediction. We formulated a two-stage pipeline optimization with surrogate modeling in AMLP which outperforms other AutoML approaches with a 4-hour time budget in less than 5 minutes of AMLP computation time.
Score: 18.82755278152806
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The pipeline optimization problem in machine learning requires simultaneous optimization of pipeline structures and parameter adaptation of their elements. Having an elegant way to express these structures can help lessen the complexity in the management and analysis of their performances together with the different choices of optimization strategies. With these issues in mind, we created the AMLP toolkit which facilitates the creation and evaluation of complex machine learning pipeline structures using simple expressions. We use AMLP to find optimal pipeline signatures, datamine them, and use these datamined features to speed-up learning and prediction. We formulated a two-stage pipeline optimization with surrogate modeling in AMLP which outperforms other AutoML approaches with a 4-hour time budget in less than 5 minutes of AMLP computation time.

Related papers

Pangu Embedded: An Efficient Dual-system LLM Reasoner with Metacognition [95.54406667705999]
Pangu Embedded is an efficient Large Language Model (LLM) reasoner developed on Ascend Neural Processing Units (NPUs)<n>It addresses the significant computational costs and inference latency challenges prevalent in existing reasoning-optimized LLMs.<n>It delivers rapid responses and state-of-the-art reasoning quality within a single, unified model architecture.
arXiv Detail & Related papers (2025-05-28T14:03:02Z)
Efficient Model Selection for Time Series Forecasting via LLMs [52.31535714387368]
We propose to leverage Large Language Models (LLMs) as a lightweight alternative for model selection. Our method eliminates the need for explicit performance matrices by utilizing the inherent knowledge and reasoning capabilities of LLMs.
arXiv Detail & Related papers (2025-04-02T20:33:27Z)
IMPROVE: Iterative Model Pipeline Refinement and Optimization Leveraging LLM Experts [40.98057887166546]
Large language model (LLM) agents have emerged as a promising solution to automate the workflow of machine learning.<n>We introduce Iterative Refinement, a novel strategy for LLM-driven ML pipeline design inspired by how human ML experts iteratively refine models.<n>By systematically updating individual components based on real training feedback, Iterative Refinement improves overall model performance.
arXiv Detail & Related papers (2025-02-25T01:52:37Z)
LLMs for Cold-Start Cutting Plane Separator Configuration [19.931643536607737]
Mixed integer linear programming solvers ship with a staggering number of parameters that are challenging to select a priori for all but expert optimization users. Existing machine learning approaches to configure solvers require training ML models by solving thousands of related MILP instances, generalize poorly to new problem sizes, and often require implementing complex ML pipelines and custom solver interfaces. We present a new LLM-based framework to configure which cutting plane separators to use for a given MILP problem with little to no training data based on characteristics of the instance.
arXiv Detail & Related papers (2024-12-16T18:03:57Z)
SELA: Tree-Search Enhanced LLM Agents for Automated Machine Learning [14.702694298483445]
Tree-Search Enhanced LLM Agents (SELA) is an agent-based system that leverages Monte Carlo Tree Search (MCTS) to optimize the AutoML process. SELA represents pipeline configurations as trees, enabling agents to conduct experiments intelligently and iteratively refine their strategies. In an extensive evaluation across 20 machine learning datasets, we compare the performance of traditional and agent-based AutoML methods.
arXiv Detail & Related papers (2024-10-22T17:56:08Z)
LLM-based Optimization of Compound AI Systems: A Survey [64.39860384538338]
In a compound AI system, components such as an LLM call, a retriever, a code interpreter, or tools are interconnected. Recent advancements enable end-to-end optimization of these parameters using an LLM. This paper presents a survey of the principles and emerging trends in LLM-based optimization of compound AI systems.
arXiv Detail & Related papers (2024-10-21T18:06:25Z)
AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML [56.565200973244146]
Automated machine learning (AutoML) accelerates AI development by automating tasks in the development pipeline. Recent works have started exploiting large language models (LLM) to lessen such burden. This paper proposes AutoML-Agent, a novel multi-agent framework tailored for full-pipeline AutoML.
arXiv Detail & Related papers (2024-10-03T20:01:09Z)
Two Optimizers Are Better Than One: LLM Catalyst Empowers Gradient-Based Optimization for Prompt Tuning [69.95292905263393]
We show that gradient-based optimization and large language models (MsLL) are complementary to each other, suggesting a collaborative optimization approach. Our code is released at https://www.guozix.com/guozix/LLM-catalyst.
arXiv Detail & Related papers (2024-05-30T06:24:14Z)
Unleashing the Potential of Large Language Models as Prompt Optimizers: An Analogical Analysis with Gradient-based Model Optimizers [108.72225067368592]
We propose a novel perspective to investigate the design of large language models (LLMs)-based prompts. We identify two pivotal factors in model parameter learning: update direction and update method. In particular, we borrow the theoretical framework and learning methods from gradient-based optimization to design improved strategies.
arXiv Detail & Related papers (2024-02-27T15:05:32Z)
eTOP: Early Termination of Pipelines for Faster Training of AutoML Systems [12.933957727351666]
Finding the right AI/ML model is a complex and costly process. We propose eTOP Framework which works on top of any AutoML system.
arXiv Detail & Related papers (2023-04-17T20:22:30Z)
AutoEn: An AutoML method based on ensembles of predefined Machine Learning pipelines for supervised Traffic Forecasting [1.6242924916178283]
Traffic Forecasting (TF) is gaining relevance due to its ability to mitigate traffic congestion by forecasting future traffic states. TF poses one big challenge to the Machine Learning paradigm, known as the Model Selection Problem (MSP) We introduce AutoEn, which is a simple and efficient method for automatically generating multi-classifier ensembles from a predefined set of ML pipelines.
arXiv Detail & Related papers (2023-03-19T18:37:18Z)
VeLO: Training Versatile Learned Optimizers by Scaling Up [67.90237498659397]
We leverage the same scaling approach behind the success of deep learning to learn versatiles. We train an ingest for deep learning which is itself a small neural network that ingests and outputs parameter updates. We open source our learned, meta-training code, the associated train test data, and an extensive benchmark suite with baselines at velo-code.io.
arXiv Detail & Related papers (2022-11-17T18:39:07Z)
Automated Evolutionary Approach for the Design of Composite Machine Learning Pipelines [48.7576911714538]
The proposed approach is aimed to automate the design of composite machine learning pipelines. It designs the pipelines with a customizable graph-based structure, analyzes the obtained results, and reproduces them. The software implementation on this approach is presented as an open-source framework.
arXiv Detail & Related papers (2021-06-26T23:19:06Z)
Incremental Search Space Construction for Machine Learning Pipeline Synthesis [4.060731229044571]
Automated machine learning (AutoML) aims for constructing machine learning (ML) pipelines automatically. We propose a data-centric approach based on meta-features for pipeline construction. We prove the effectiveness and competitiveness of our approach on 28 data sets used in well-established AutoML benchmarks.
arXiv Detail & Related papers (2021-01-26T17:17:49Z)
AutoWeka4MCPS-AVATAR: Accelerating Automated Machine Learning Pipeline Composition and Optimisation [13.116806430326513]
We propose a novel method to evaluate the validity of ML pipelines, without their execution, using a surrogate model (AVATAR) The AVATAR generates a knowledge base by automatically learning the capabilities and effects of ML algorithms on datasets' characteristics. Instead of executing the original ML pipeline to evaluate its validity, the AVATAR evaluates its surrogate model constructed by capabilities and effects of the ML pipeline components.
arXiv Detail & Related papers (2020-11-21T14:05:49Z)
Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL [53.40030379661183]
Auto-PyTorch is a framework to enable fully automated deep learning (AutoDL) It combines multi-fidelity optimization with portfolio construction for warmstarting and ensembling of deep neural networks (DNNs) We show that Auto-PyTorch performs better than several state-of-the-art competitors on average.
arXiv Detail & Related papers (2020-06-24T15:15:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.