Exploratory Landscape Analysis is Strongly Sensitive to the Sampling
Strategy
- URL: http://arxiv.org/abs/2006.11135v1
- Date: Fri, 19 Jun 2020 13:45:13 GMT
- Title: Exploratory Landscape Analysis is Strongly Sensitive to the Sampling Strategy
- Authors: Quentin Renau, Carola Doerr, Johann Dreo, Benjamin Doerr
- Abstract summary: In black-box optimization, where an explicit problem representation is not available, the feature values need to be approximated from a small number of sample points.
In this work, we analyze how the sampling method and the sample size influence the quality of the feature value approximations.
- Score: 8.246980996934347
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Exploratory landscape analysis (ELA) supports supervised learning approaches
for automated algorithm selection and configuration by providing sets of
features that quantify the most relevant characteristics of the optimization
problem at hand. In black-box optimization, where an explicit problem
representation is not available, the feature values need to be approximated
from a small number of sample points. In practice, uniformly sampled random
point sets and Latin hypercube constructions are commonly used sampling
strategies. In this work, we analyze how the sampling method and the sample
size influence the quality of the feature value approximations and how this
quality impacts the accuracy of a standard classification task. While, not
unexpectedly, increasing the number of sample points gives more robust
estimates for the feature values, to our surprise we find that the feature
value approximations for different sampling strategies do not converge to the
same value. This implies that approximated feature values cannot be interpreted
independently of the underlying sampling strategy. As our classification
experiments show, this also implies that the feature approximations used for
training a classifier must stem from the same sampling strategy as those used
for the actual classification tasks. As a side result we show that classifiers
trained with feature values approximated by Sobol' sequences achieve higher
accuracy than any of the standard sampling techniques. This may indicate
improvement potential for ELA-trained machine learning models.
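The sampling strategies the abstract compares (uniform random, Latin hypercube, Sobol' sequences) can be generated with `scipy.stats.qmc`. The sketch below is illustrative only: the sphere function and the mean-objective "feature" are stand-in assumptions, not the ELA feature set evaluated in the paper.

```python
import numpy as np
from scipy.stats import qmc

# Three sampling strategies in d = 2 over [0, 1]^d.
d, n = 2, 128  # n is a power of two, which keeps the Sobol' sequence balanced

rng = np.random.default_rng(42)
uniform_pts = rng.random((n, d))                      # uniform random sampling
lhs_pts = qmc.LatinHypercube(d=d, seed=42).random(n)  # Latin hypercube design
sobol_pts = qmc.Sobol(d=d, scramble=True, seed=42).random_base2(m=7)  # Sobol' (2^7 points)

# Toy "feature": the mean objective value of the sphere function over the sample.
# The paper's finding is that richer ELA features estimated this way can converge
# to *different* values depending on which sampling strategy produced the points.
def sphere(x):
    return np.sum(x**2, axis=1)

for name, pts in [("uniform", uniform_pts), ("lhs", lhs_pts), ("sobol", sobol_pts)]:
    print(f"{name:8s} mean f = {sphere(pts).mean():.4f}")
```

This also illustrates the paper's practical warning: a classifier trained on features computed from one of these point sets should only be applied to features computed with the same strategy.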
Related papers
- Differentiating Policies for Non-Myopic Bayesian Optimization [5.793371273485735]
We show how to efficiently estimate rollout functions and their gradient, enabling sampling policies.
arXiv Detail & Related papers (2024-08-14T21:00:58Z)
- Revisiting Score Function Estimators for $k$-Subset Sampling [5.464421236280698]
We show how to efficiently compute the $k$-subset distribution's score function using a discrete Fourier transform.
The resulting estimator provides both exact samples and unbiased gradient estimates.
Experiments in feature selection show results competitive with current methods, despite weaker assumptions.
arXiv Detail & Related papers (2024-07-22T21:26:39Z)
- Dataset Quantization with Active Learning based Adaptive Sampling [11.157462442942775]
We show that maintaining performance is feasible even with uneven sample distributions.
We propose a novel active learning based adaptive sampling strategy to optimize the sample selection.
Our approach outperforms the state-of-the-art dataset compression methods.
arXiv Detail & Related papers (2024-07-09T23:09:18Z)
- Gradient and Uncertainty Enhanced Sequential Sampling for Global Fit [0.0]
This paper proposes a new sampling strategy for global fit called Gradient and Uncertainty Enhanced Sequential Sampling (GUESS)
We show that GUESS achieved on average the highest sample efficiency compared to other surrogate-based strategies on the tested examples.
arXiv Detail & Related papers (2023-09-29T19:49:39Z)
- Towards Automated Imbalanced Learning with Deep Hierarchical Reinforcement Learning [57.163525407022966]
Imbalanced learning is a fundamental challenge in data mining, where there is a disproportionate ratio of training samples in each class.
Over-sampling is an effective technique to tackle imbalanced learning through generating synthetic samples for the minority class.
We propose AutoSMOTE, an automated over-sampling algorithm that can jointly optimize different levels of decisions.
arXiv Detail & Related papers (2022-08-26T04:28:01Z)
- An Additive Instance-Wise Approach to Multi-class Model Interpretation [53.87578024052922]
Interpretable machine learning offers insights into what factors drive a certain prediction of a black-box system.
Existing methods mainly focus on selecting explanatory input features, which follow either locally additive or instance-wise approaches.
This work exploits the strengths of both methods and proposes a global framework for learning local explanations simultaneously for multiple target classes.
arXiv Detail & Related papers (2022-07-07T06:50:27Z)
- Selecting the suitable resampling strategy for imbalanced data classification regarding dataset properties [62.997667081978825]
In many application domains such as medicine, information retrieval, cybersecurity, social media, etc., datasets used for inducing classification models often have an unequal distribution of the instances of each class.
This situation, known as imbalanced data classification, causes low predictive performance for the minority class examples.
Oversampling and undersampling techniques are well-known strategies to deal with this problem by balancing the number of examples of each class.
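As a minimal illustration of the over-sampling idea mentioned above, the sketch below implements plain random over-sampling in NumPy (an assumption for illustration only; it is neither SMOTE-style synthetic sampling nor the AutoSMOTE algorithm): minority-class samples are duplicated until every class matches the majority count.

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Duplicate minority-class rows at random until all classes are balanced."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()  # target size: the majority-class count
    idx = []
    for c in classes:
        c_idx = np.flatnonzero(y == c)
        # Draw (with replacement) enough extra indices to reach n_max.
        extra = rng.choice(c_idx, size=n_max - len(c_idx), replace=True)
        idx.extend(c_idx)
        idx.extend(extra)
    idx = np.asarray(idx)
    return X[idx], y[idx]

X = np.arange(10).reshape(5, 2)
y = np.array([0, 0, 0, 0, 1])  # imbalanced: 4 vs 1
Xb, yb = random_oversample(X, y)
print(np.bincount(yb))  # both classes now have 4 samples
```

Synthetic approaches such as SMOTE interpolate between minority neighbors rather than duplicating rows, which avoids exact copies but introduces its own design decisions; that is the space the automated approach above searches over.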
arXiv Detail & Related papers (2021-12-15T18:56:39Z)
- Local policy search with Bayesian optimization [73.0364959221845]
Reinforcement learning aims to find an optimal policy by interaction with an environment.
Policy gradients for local search are often obtained from random perturbations.
We develop an algorithm utilizing a probabilistic model of the objective function and its gradient.
arXiv Detail & Related papers (2021-06-22T16:07:02Z)
- Dynamic Instance-Wise Classification in Correlated Feature Spaces [15.351282873821935]
In a typical machine learning setting, the predictions on all test instances are based on a common subset of features discovered during model training.
A new method is proposed that sequentially selects the best feature to evaluate for each test instance individually, and stops the selection process to make a prediction once it determines that no further improvement in classification accuracy can be achieved.
The effectiveness, generalizability, and scalability of the proposed method are illustrated on a variety of real-world datasets from diverse application domains.
arXiv Detail & Related papers (2021-06-08T20:20:36Z)
- Optimal Importance Sampling for Federated Learning [57.14673504239551]
Federated learning involves a mixture of centralized and decentralized processing tasks.
The sampling of both agents and data is generally uniform; however, in this work we consider non-uniform sampling.
We derive optimal importance sampling strategies for both agent and data selection and show that non-uniform sampling without replacement improves the performance of the original FedAvg algorithm.
arXiv Detail & Related papers (2020-10-26T14:15:33Z)
- Learning a Unified Sample Weighting Network for Object Detection [113.98404690619982]
Region sampling or weighting is critical to the success of modern region-based object detectors.
We argue that sample weighting should be data-dependent and task-dependent.
We propose a unified sample weighting network to predict a sample's task weights.
arXiv Detail & Related papers (2020-06-11T16:19:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.