Feature selection in machine learning: Rényi min-entropy vs Shannon entropy
- URL: http://arxiv.org/abs/2001.09654v1
- Date: Mon, 27 Jan 2020 09:50:54 GMT
- Title: Feature selection in machine learning: Rényi min-entropy vs Shannon entropy
- Authors: Catuscia Palamidessi and Marco Romanelli
- Abstract summary: We propose an algorithm based on a notion of conditional Rényi min-entropy that has recently been adopted in the field of security and privacy. In practice, however, the Rényi-based algorithm tends to outperform the Shannon-based one.
- Score: 6.434361163743876
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Feature selection, in the context of machine learning, is the process of separating the highly predictive features from those that might be irrelevant or redundant. Information theory has been recognized as a useful framework for this task, since predictive power stems from the correlation, i.e., the mutual information, between features and labels. Many feature-selection algorithms in the literature have adopted Shannon-entropy-based mutual information. In this paper, we explore the possibility of using Rényi min-entropy instead. In particular, we propose an algorithm based on a notion of conditional Rényi min-entropy that has recently been adopted in the field of security and privacy, and which is strictly related to the Bayes error. We prove that the two approaches are in general incomparable: one can construct datasets on which the Rényi-based algorithm performs better than the corresponding Shannon-based one, and datasets on which the situation is reversed. In practice, however, on real-world data the Rényi-based algorithm tends to outperform the other one. We have carried out several experiments on the BASEHOCK, SEMEION, and GISETTE datasets, and in all of them the Rényi-based algorithm gave better results.
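To make the comparison concrete, the following is a minimal Python sketch (our own illustration under stated assumptions, not the authors' implementation; the toy joint distribution and function names are invented). For a single discrete feature $X$ and label $Y$ with joint distribution $p(x,y)$, it computes the Shannon mutual information $I(X;Y)$, the Rényi min-entropy leakage $I_\infty(X;Y) = H_\infty(Y) - H_\infty(Y|X)$, where $H_\infty(Y|X) = -\log_2 \sum_x \max_y p(x,y)$ is the conditional min-entropy adopted in the security and privacy literature, and the Bayes error $1 - \sum_x \max_y p(x,y) = 1 - 2^{-H_\infty(Y|X)}$, which is the "strictly related to the Bayes error" connection mentioned in the abstract. A greedy feature selector of either flavor would repeatedly add the feature that maximizes its respective score.

```python
# Minimal sketch; the toy distribution and helper names are assumptions.
import numpy as np

def shannon_mi(pxy: np.ndarray) -> float:
    """Shannon mutual information I(X;Y) in bits.
    pxy[i, j] = p(X = x_i, Y = y_j)."""
    px = pxy.sum(axis=1, keepdims=True)   # marginal p(x), shape (n_x, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal p(y), shape (1, n_y)
    mask = pxy > 0                        # skip zero cells to avoid log(0)
    return float((pxy[mask] * np.log2(pxy[mask] / (px * py)[mask])).sum())

def min_entropy_leakage(pxy: np.ndarray) -> float:
    """Renyi min-entropy leakage I_inf(X;Y) = H_inf(Y) - H_inf(Y|X), in bits,
    with the conditional min-entropy H_inf(Y|X) = -log2(sum_x max_y p(x,y))."""
    h_prior = -np.log2(pxy.sum(axis=0).max())   # H_inf(Y), from max_y p(y)
    h_post = -np.log2(pxy.max(axis=1).sum())    # H_inf(Y|X)
    return float(h_prior - h_post)

def bayes_error(pxy: np.ndarray) -> float:
    """Error of the Bayes-optimal guess of Y given X:
    1 - sum_x max_y p(x,y) = 1 - 2**(-H_inf(Y|X))."""
    return float(1.0 - pxy.max(axis=1).sum())

# Toy joint distribution: rows are feature values x, columns are labels y.
pxy = np.array([[0.30, 0.10],
                [0.05, 0.25],
                [0.15, 0.15]])

print(f"Shannon MI:          {shannon_mi(pxy):.4f} bits")           # ~0.18
print(f"min-entropy leakage: {min_entropy_leakage(pxy):.4f} bits")  # ~0.49
print(f"Bayes error:         {bayes_error(pxy):.4f}")               # 0.30
```

Note the design difference the incomparability result exploits: min-entropy leakage depends only on the most likely label for each feature value (exactly what a Bayes-optimal one-shot guess uses), whereas Shannon mutual information averages over the whole conditional distribution, so the two criteria need not rank features the same way.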
Related papers
- Surprisal Driven $k$-NN for Robust and Interpretable Nonparametric Learning [1.4293924404819704]
We shed new light on the traditional nearest neighbors algorithm from the perspective of information theory.
We propose a robust and interpretable framework for tasks such as classification, regression, density estimation, and anomaly detection using a single model.
Our work showcases the architecture's versatility by achieving state-of-the-art results in classification and anomaly detection.
arXiv Detail & Related papers (2023-11-17T00:35:38Z)
- Provably Efficient Representation Learning with Tractable Planning in Low-Rank POMDP [81.00800920928621]
We study representation learning in partially observable Markov Decision Processes (POMDPs).
We first present an algorithm for decodable POMDPs that combines maximum likelihood estimation (MLE) and optimism in the face of uncertainty (OFU).
We then show how to adapt this algorithm to also work in the broader class of $\gamma$-observable POMDPs.
arXiv Detail & Related papers (2023-06-21T16:04:03Z)
- Performance Evaluation and Comparison of a New Regression Algorithm [4.125187280299247]
We compare the performance of a newly proposed regression algorithm against four conventional machine learning algorithms.
The reader is free to replicate our results since we have provided the source code in a GitHub repository.
arXiv Detail & Related papers (2023-06-15T13:01:16Z)
- Learning versus Refutation in Noninteractive Local Differential Privacy [133.80204506727526]
We study two basic statistical tasks in non-interactive local differential privacy (LDP): learning and refutation.
Our main result is a complete characterization of the sample complexity of PAC learning for non-interactive LDP protocols.
arXiv Detail & Related papers (2022-10-26T03:19:24Z)
- Automated Algorithm Selection: from Feature-Based to Feature-Free Approaches [0.5801044612920815]
We propose a novel technique for algorithm-selection, applicable to optimisation in which there is implicit sequential information encapsulated in the data.
We train two types of recurrent neural networks to predict a packing in online bin-packing, selecting from four well-known domains.
arXiv Detail & Related papers (2022-03-24T23:59:50Z)
- Local policy search with Bayesian optimization [73.0364959221845]
Reinforcement learning aims to find an optimal policy by interaction with an environment.
Policy gradients for local search are often obtained from random perturbations.
We develop an algorithm utilizing a probabilistic model of the objective function and its gradient.
arXiv Detail & Related papers (2021-06-22T16:07:02Z)
- Estimating leverage scores via rank revealing methods and randomization [50.591267188664666]
We study algorithms for estimating the statistical leverage scores of rectangular dense or sparse matrices of arbitrary rank.
Our approach is based on combining rank revealing methods with compositions of dense and sparse randomized dimensionality reduction transforms.
arXiv Detail & Related papers (2021-05-23T19:21:55Z)
- Can Active Learning Preemptively Mitigate Fairness Issues? [66.84854430781097]
Dataset bias is one of the prevailing causes of unfairness in machine learning.
We study whether models trained with uncertainty-based active learning (AL) strategies are fairer in their decisions with respect to a protected class.
We also explore the interaction of algorithmic fairness methods such as gradient reversal (GRAD) and BALD.
arXiv Detail & Related papers (2021-04-14T14:20:22Z)
- IFGAN: Missing Value Imputation using Feature-specific Generative Adversarial Networks [14.714106979097222]
We propose IFGAN, a missing value imputation algorithm based on feature-specific Generative Adversarial Networks (GANs).
A feature-specific generator is trained to impute missing values, while a discriminator is expected to distinguish the imputed values from observed ones.
We empirically show on several real-life datasets that IFGAN outperforms current state-of-the-art algorithms under various missing conditions.
arXiv Detail & Related papers (2020-12-23T10:14:35Z)
- Learning Unbiased Representations via Rényi Minimization [13.61565693336172]
We propose an adversarial algorithm to learn unbiased representations via the Hirschfeld-Gebelein-Rényi (HGR) maximal correlation coefficient.
We empirically evaluate and compare our approach and demonstrate significant improvements over existing works in the field.
arXiv Detail & Related papers (2020-09-07T15:48:24Z)
- FedPD: A Federated Learning Framework with Optimal Rates and Adaptivity to Non-IID Data [59.50904660420082]
Federated Learning (FL) has become a popular paradigm for learning from distributed data.
To effectively utilize data at different devices without moving them to the cloud, algorithms such as the Federated Averaging (FedAvg) have adopted a "computation then aggregation" (CTA) model.
arXiv Detail & Related papers (2020-05-22T23:07:42Z)