On Taking Advantage of Opportunistic Meta-knowledge to Reduce
Configuration Spaces for Automated Machine Learning
- URL: http://arxiv.org/abs/2208.04376v1
- Date: Mon, 8 Aug 2022 19:22:24 GMT
- Title: On Taking Advantage of Opportunistic Meta-knowledge to Reduce
Configuration Spaces for Automated Machine Learning
- Authors: David Jacob Kedziora, Tien-Dung Nguyen, Katarzyna Musial, Bogdan
Gabrys
- Abstract summary: Key research question is whether it is possible and practical to preemptively avoid costly evaluations of poorly performing ML pipelines.
Numerous experiments with the AutoWeka4MCPS package suggest that opportunistic/systematic meta-knowledge can improve ML outcomes.
We observe strong sensitivity to the `challenge' of a dataset, i.e. whether specificity in choosing a predictor leads to significantly better performance.
- Score: 11.670797168818773
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The automated machine learning (AutoML) process can require searching through
complex configuration spaces of not only machine learning (ML) components and
their hyperparameters but also ways of composing them together, i.e. forming ML
pipelines. Optimisation efficiency and the model accuracy attainable for a
fixed time budget suffer if this pipeline configuration space is excessively
large. A key research question is whether it is both possible and practical to
preemptively avoid costly evaluations of poorly performing ML pipelines by
leveraging their historical performance for various ML tasks, i.e.
meta-knowledge. This prior experience comes in the form of
classifier/regressor accuracy rankings derived from either (1) a substantial
but non-exhaustive number of pipeline evaluations made during historical AutoML
runs, i.e. 'opportunistic' meta-knowledge, or (2) comprehensive cross-validated
evaluations of classifiers/regressors with default hyperparameters, i.e.
'systematic' meta-knowledge. Numerous experiments with the AutoWeka4MCPS
package suggest that (1) opportunistic/systematic meta-knowledge can improve ML
outcomes, typically in line with how relevant that meta-knowledge is, and (2)
configuration-space culling is optimal when it is neither too conservative nor
too radical. However, the utility and impact of meta-knowledge depend
critically on numerous facets of its generation and exploitation, warranting
extensive analysis; these are often overlooked/underappreciated within AutoML
and meta-learning literature. In particular, we observe strong sensitivity to
the `challenge' of a dataset, i.e. whether specificity in choosing a predictor
leads to significantly better performance. Ultimately, identifying `difficult'
datasets, thus defined, is crucial to both generating informative
meta-knowledge bases and understanding optimal search-space reduction
strategies.
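The culling strategy described in the abstract can be illustrated with a short sketch: rank the available predictors by their historical accuracy on past tasks (the meta-knowledge), then drop the bottom portion of the pool before pipeline composition begins. The function, predictor names, and rankings below are hypothetical illustrations, not taken from the paper or the AutoWeka4MCPS package.

```python
# Hypothetical sketch of meta-knowledge-based configuration-space culling.
# Predictor names and rank values are illustrative only.

def cull_predictor_pool(rankings, keep_fraction=0.5):
    """Keep only the top-ranked predictors before starting pipeline search.

    rankings: dict mapping predictor name -> mean historical accuracy rank
              (lower rank = historically better performance).
    keep_fraction: culling aggressiveness; per the abstract, neither too
                   conservative (near 1.0, little reduction) nor too
                   radical (near 0.0, risks discarding the best predictor).
    """
    ordered = sorted(rankings, key=rankings.get)  # best historical rank first
    keep_n = max(1, round(len(ordered) * keep_fraction))
    return ordered[:keep_n]

# Illustrative 'opportunistic' rankings aggregated from past AutoML runs.
historical_ranks = {
    "RandomForest": 1.8,
    "Logistic": 2.2,
    "SVM": 2.5,
    "kNN": 3.0,
    "NaiveBayes": 4.1,
    "DecisionStump": 5.6,
}

pool = cull_predictor_pool(historical_ranks, keep_fraction=0.5)
print(pool)  # ['RandomForest', 'Logistic', 'SVM']
```

On a `difficult' dataset in the paper's sense, the true best predictor may sit low in the historical ranking, which is why an overly radical `keep_fraction` can hurt: the reduced space no longer contains the configurations that specificity would have rewarded.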
Related papers
- What Do Learning Dynamics Reveal About Generalization in LLM Reasoning? [83.83230167222852]
We find that a model's generalization behavior can be effectively characterized by a training metric we call pre-memorization train accuracy.
By connecting a model's learning behavior to its generalization, pre-memorization train accuracy can guide targeted improvements to training strategies.
arXiv Detail & Related papers (2024-11-12T09:52:40Z)
- MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs [55.20845457594977]
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making.
We present a process-based benchmark MR-Ben that demands a meta-reasoning skill.
Our meta-reasoning paradigm is especially suited for system-2 slow thinking.
arXiv Detail & Related papers (2024-06-20T03:50:23Z)
- Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration [87.53543137162488]
We propose an easy-to-implement online reinforcement learning (online RL) framework called MEX.
MEX integrates estimation and planning components while automatically balancing exploration and exploitation.
It can outperform baselines by a stable margin in various MuJoCo environments with sparse rewards.
arXiv Detail & Related papers (2023-05-29T17:25:26Z)
- ezDPS: An Efficient and Zero-Knowledge Machine Learning Inference Pipeline [2.0813318162800707]
We propose ezDPS, a new efficient and zero-knowledge Machine Learning inference scheme.
ezDPS is a zkML pipeline in which the data is processed in multiple stages for high accuracy.
We show that ezDPS is one to three orders of magnitude more efficient than the generic circuit-based approach across all metrics.
arXiv Detail & Related papers (2022-12-11T06:47:28Z)
- MARS: Meta-Learning as Score Matching in the Function Space [79.73213540203389]
We present a novel approach to extracting inductive biases from a set of related datasets.
We use functional Bayesian neural network inference, which views the prior as a process and performs inference in the function space.
Our approach can seamlessly acquire and represent complex prior knowledge by meta-learning the score function of the data-generating process.
arXiv Detail & Related papers (2022-10-24T15:14:26Z)
- STREAMLINE: A Simple, Transparent, End-To-End Automated Machine Learning Pipeline Facilitating Data Analysis and Algorithm Comparison [0.49034553215430216]
STREAMLINE is a simple, transparent, end-to-end AutoML pipeline.
It is specifically designed to compare performance between datasets, ML algorithms, and other AutoML tools.
arXiv Detail & Related papers (2022-06-23T22:40:58Z)
- Exploring Opportunistic Meta-knowledge to Reduce Search Spaces for Automated Machine Learning [8.325359814939517]
This paper investigates whether, based on previous experience, a pool of available classifiers/regressors can be preemptively culled ahead of initiating a pipeline composition/optimisation process.
arXiv Detail & Related papers (2021-05-01T15:25:30Z)
- Robusta: Robust AutoML for Feature Selection via Reinforcement Learning [24.24652530951966]
We propose the first robust AutoML framework, Robusta, based on reinforcement learning (RL).
We show that the framework is able to improve the model robustness by up to 22% while maintaining competitive accuracy on benign samples.
arXiv Detail & Related papers (2021-01-15T03:12:29Z)
- Fast Few-Shot Classification by Few-Iteration Meta-Learning [173.32497326674775]
We introduce a fast optimization-based meta-learning method for few-shot classification.
Our strategy enables important aspects of the base learner objective to be learned during meta-training.
We perform a comprehensive experimental analysis, demonstrating the speed and effectiveness of our approach.
arXiv Detail & Related papers (2020-10-01T15:59:31Z)
- A Rigorous Machine Learning Analysis Pipeline for Biomedical Binary Classification: Application in Pancreatic Cancer Nested Case-control Studies with Implications for Bias Assessments [2.9726886415710276]
We have laid out and assembled a complete, rigorous ML analysis pipeline focused on binary classification.
This 'automated' but customizable pipeline includes a) exploratory analysis, b) data cleaning and transformation, c) feature selection, d) model training with 9 established ML algorithms.
We apply this pipeline to an epidemiological investigation of established and newly identified risk factors for cancer to evaluate how different sources of bias might be handled by ML algorithms.
arXiv Detail & Related papers (2020-08-28T19:58:05Z)
- A Survey on Large-scale Machine Learning [67.6997613600942]
Machine learning can provide deep insights into data, allowing machines to make high-quality predictions.
Most sophisticated machine learning approaches suffer from huge time costs when operating on large-scale data.
Large-scale Machine Learning aims to learn patterns from big data with comparable performance efficiently.
arXiv Detail & Related papers (2020-08-10T06:07:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.