Classifier Pool Generation based on a Two-level Diversity Approach
- URL: http://arxiv.org/abs/2011.01908v1
- Date: Tue, 3 Nov 2020 18:41:53 GMT
- Title: Classifier Pool Generation based on a Two-level Diversity Approach
- Authors: Marcos Monteiro, Alceu S. Britto Jr, Jean P. Barddal, Luiz S. Oliveira, Robert Sabourin
- Abstract summary: This paper describes a pool generation method guided by the diversity estimated on the data complexity and classifier decisions.
The complexity measures with high variability across the subsamples are selected for posterior pool adaptation, where an evolutionary algorithm optimizes diversity in both complexity and decision spaces.
Results show significant accuracy improvements in 69.4% of the experiments when Dynamic Classifier Selection and Dynamic Ensemble Selection methods are applied.
- Score: 14.617208698215808
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper describes a classifier pool generation method guided by the
diversity estimated on the data complexity and classifier decisions. First, the
behavior of complexity measures is assessed by considering several subsamples
of the dataset. The complexity measures with high variability across the
subsamples are selected for posterior pool adaptation, where an evolutionary
algorithm optimizes diversity in both complexity and decision spaces. A robust
experimental protocol with 28 datasets and 20 replications is used to evaluate
the proposed method. Results show significant accuracy improvements in 69.4% of
the experiments when Dynamic Classifier Selection and Dynamic Ensemble
Selection methods are applied.
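The first stage keeps complexity measures whose values vary across subsamples. A minimal sketch of that stage follows. The abstract does not give the specifics, so everything below is an assumption made for illustration:

- the three hypothetical measures: a raw maximum Fisher's discriminant ratio, a leave-one-out 1-NN error rate, and a samples-per-feature ratio;
- drawing half the data without replacement for each subsample;
- using the coefficient of variation as the variability statistic;
- a selection threshold of 0.05.

```python
import numpy as np

# Hypothetical stand-ins for data-complexity measures (assumption: the abstract
# does not name the measures used). Binary labels 0/1 are assumed.

def f1_fisher(X, y):
    """Maximum Fisher's discriminant ratio over features (raw, not normalized)."""
    a, b = X[y == 0], X[y == 1]
    num = (a.mean(axis=0) - b.mean(axis=0)) ** 2
    den = a.var(axis=0) + b.var(axis=0) + 1e-12  # guard against zero variance
    return float(np.max(num / den))

def n3_error(X, y):
    """Leave-one-out 1-nearest-neighbour error rate."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a point cannot be its own neighbour
    nn = np.argmin(d, axis=1)
    return float(np.mean(y[nn] != y))

def t2_density(X, y):
    """Samples per feature; constant whenever the subsample size is fixed."""
    return X.shape[0] / X.shape[1]

MEASURES = {"f1": f1_fisher, "n3": n3_error, "t2": t2_density}

def select_variable_measures(X, y, n_subsamples=20, frac=0.5, threshold=0.05, seed=0):
    """Keep measures whose coefficient of variation across subsamples exceeds threshold.

    Subsampling without replacement and the CV criterion are assumptions.
    """
    rng = np.random.default_rng(seed)
    size = int(frac * len(y))
    values = {name: [] for name in MEASURES}
    for _ in range(n_subsamples):
        idx = rng.choice(len(y), size=size, replace=False)
        for name, measure in MEASURES.items():
            values[name].append(measure(X[idx], y[idx]))
    cv = {
        name: float(np.std(v) / (abs(np.mean(v)) + 1e-12))
        for name, v in values.items()
    }
    selected = [name for name, c in cv.items() if c > threshold]
    return selected, cv

# Example usage on synthetic, overlapping two-class data.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, (100, 4)), rng.normal(1.0, 1.0, (100, 4))])
y = np.array([0] * 100 + [1] * 100)
selected, cv = select_variable_measures(X, y)
print(selected, cv)
```

Every subsample has the same size, so the samples-per-feature ratio is constant. Its coefficient of variation is therefore zero and it is never selected. This is the kind of uninformative measure the selection step is meant to discard.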
Related papers
- Transforming Datasets to Requested Complexity with Projection-based Many-Objective Genetic Algorithm [0.0]
This work aims to increase the availability of datasets encompassing a diverse range of problem complexities.
For classification, a set of 10 complexity measures was used, while for regression tasks, 4 measures demonstrating promising optimization capabilities were selected.
Experiments confirmed that the proposed genetic algorithm can generate datasets with varying levels of difficulty.
arXiv Detail & Related papers (2025-07-20T21:42:30Z)
- A High-Dimensional Feature Selection Algorithm Based on Multiobjective Differential Evolution [6.912442653561439]
Multiobjective feature selection seeks to determine the most discriminative feature subset.
The proposed method significantly outperforms several state-of-the-art multiobjective feature selection approaches.
arXiv Detail & Related papers (2025-05-09T02:02:49Z)
- Ordered Semantically Diverse Sampling for Textual Data [6.280814487955095]
We introduce the ordered diverse sampling problem based on a new metric that measures the diversity in an ordered list of samples.
We present a novel approach for generating ordered diverse samples for textual data that uses principal components on the embedding vectors.
arXiv Detail & Related papers (2025-03-12T06:38:57Z)
- Diversified Sampling Improves Scaling LLM inference [31.18762591875725]
DivSampling is a novel and versatile sampling technique designed to enhance the diversity of candidate solutions.
Our theoretical analysis demonstrates that, under mild assumptions, the error rates of responses generated from diverse prompts are significantly lower than those of responses generated from stationary prompts.
arXiv Detail & Related papers (2025-02-16T07:37:58Z)
- An incremental preference elicitation-based approach to learning potentially non-monotonic preferences in multi-criteria sorting [53.36437745983783]
We first construct a max-margin optimization-based model to capture potentially non-monotonic preferences.
We devise information amount measurement methods and question selection strategies to pinpoint the most informative alternative in each iteration.
Two incremental preference elicitation-based algorithms are developed to learn potentially non-monotonic preferences.
arXiv Detail & Related papers (2024-09-04T14:36:20Z)
- Variable Selection in Maximum Mean Discrepancy for Interpretable Distribution Comparison [9.12501922682336]
Two-sample testing decides whether two datasets are generated from the same distribution.
This paper studies variable selection for two-sample testing, the task being to identify the variables responsible for the discrepancies between the two distributions.
arXiv Detail & Related papers (2023-11-02T18:38:39Z)
- Finding Support Examples for In-Context Learning [73.90376920653507]
We propose LENS, a fiLter-thEN-Search method to tackle this challenge in two stages.
First we filter the dataset to obtain informative in-context examples individually.
Then we propose diversity-guided example search which iteratively refines and evaluates the selected example permutations.
arXiv Detail & Related papers (2023-02-27T06:32:45Z)
- Variable Clustering via Distributionally Robust Nodewise Regression [7.289979396903827]
We study a multi-factor block model for variable clustering and connect it to the regularized subspace clustering by formulating a distributionally robust version of the nodewise regression.
We derive a convex relaxation, provide data-driven guidance on selecting the size of the robust region (and hence the regularization weighting parameter), and propose an ADMM algorithm for implementation.
arXiv Detail & Related papers (2022-12-15T16:23:25Z)
- Towards Automated Imbalanced Learning with Deep Hierarchical Reinforcement Learning [57.163525407022966]
Imbalanced learning is a fundamental challenge in data mining, where there is a disproportionate ratio of training samples in each class.
Over-sampling is an effective technique to tackle imbalanced learning through generating synthetic samples for the minority class.
We propose AutoSMOTE, an automated over-sampling algorithm that can jointly optimize different levels of decisions.
arXiv Detail & Related papers (2022-08-26T04:28:01Z)
- Adaptive Sampling for Heterogeneous Rank Aggregation from Noisy Pairwise Comparisons [85.5955376526419]
In rank aggregation problems, users exhibit various accuracy levels when comparing pairs of items.
We propose an elimination-based active sampling strategy, which estimates the ranking of items via noisy pairwise comparisons.
We prove that our algorithm can return the true ranking of items with high probability.
arXiv Detail & Related papers (2021-10-08T13:51:55Z)
- Clustering-Based Subset Selection in Evolutionary Multiobjective Optimization [11.110675371854988]
Subset selection is an important component in evolutionary multiobjective optimization (EMO) algorithms.
Clustering-based methods have not been evaluated in the context of subset selection from solution sets obtained by EMO algorithms.
arXiv Detail & Related papers (2021-08-19T02:56:41Z)
- Local policy search with Bayesian optimization [73.0364959221845]
Reinforcement learning aims to find an optimal policy by interaction with an environment.
Policy gradients for local search are often obtained from random perturbations.
We develop an algorithm utilizing a probabilistic model of the objective function and its gradient.
arXiv Detail & Related papers (2021-06-22T16:07:02Z)
- Greedy Search Algorithms for Unsupervised Variable Selection: A Comparative Study [3.4888132404740797]
This paper focuses on dimensionality reduction based on unsupervised variable selection.
We present a critical evaluation of seven unsupervised greedy variable selection algorithms.
We introduce and evaluate, for the first time, a lazy implementation of the variance-explained-based forward selection component analysis (FSCA) algorithm.
arXiv Detail & Related papers (2021-03-03T21:10:26Z)
- Clustering Binary Data by Application of Combinatorial Optimization Heuristics [52.77024349608834]
We study clustering methods for binary data, first defining aggregation criteria that measure the compactness of clusters.
Five new and original methods are introduced, using neighborhoods and population behavior optimization metaheuristics.
Using 16 data tables generated by a quasi-Monte Carlo experiment, one of the aggregation criteria, with L1 dissimilarity, is compared against hierarchical clustering and a version of k-means, partitioning around medoids (PAM).
arXiv Detail & Related papers (2020-01-06T23:33:31Z)
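The two-sample testing entry above (Variable Selection in Maximum Mean Discrepancy for Interpretable Distribution Comparison) builds on the maximum mean discrepancy (MMD) statistic. Below is a minimal sketch of the standard unbiased squared-MMD estimate, followed by a naive per-variable screen. The following are assumptions for illustration, not that paper's method:

- the Gaussian (RBF) kernel with `gamma=1.0`;
- the helper names;
- the idea of scoring each variable separately.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian kernel matrix exp(-gamma * ||a - b||^2) (kernel choice is an assumption)."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negative rounding errors

def mmd2_unbiased(X, Y, gamma=1.0):
    """Unbiased estimate of squared MMD between samples X (m x d) and Y (n x d)."""
    m, n = len(X), len(Y)
    kxx, kyy, kxy = rbf_kernel(X, X, gamma), rbf_kernel(Y, Y, gamma), rbf_kernel(X, Y, gamma)
    # Exclude the diagonal k(x_i, x_i) terms from the within-sample averages.
    term_x = (kxx.sum() - np.trace(kxx)) / (m * (m - 1))
    term_y = (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
    return float(term_x + term_y - 2.0 * kxy.mean())

def per_variable_mmd(X, Y, gamma=1.0):
    """Hypothetical naive screen: squared MMD computed on each variable separately."""
    return np.array([mmd2_unbiased(X[:, [j]], Y[:, [j]], gamma) for j in range(X.shape[1])])

# Example: only variable 2 differs between the two samples.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
Y = rng.normal(size=(200, 3))
Y[:, 2] += 3.0
scores = per_variable_mmd(X, Y)
print(scores)
```

Scoring one variable at a time ignores interactions between variables. It is only a baseline for the question the entry studies: which variables are responsible for the discrepancy.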
This list is automatically generated from the titles and abstracts of the papers in this site.