ReliefE: Feature Ranking in High-dimensional Spaces via Manifold
Embeddings
- URL: http://arxiv.org/abs/2101.09577v1
- Date: Sat, 23 Jan 2021 20:23:31 GMT
- Title: ReliefE: Feature Ranking in High-dimensional Spaces via Manifold
Embeddings
- Authors: Blaž Škrlj, Sašo Džeroski, Nada Lavrač and Matej Petković
- Abstract summary: The Relief family of algorithms assigns importance scores to features by iteratively accounting for the nearest relevant and irrelevant instances.
Recent embedding-based methods learn compact, low-dimensional representations.
The ReliefE algorithm is faster and can result in better feature rankings.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Feature ranking has been widely adopted in machine learning applications such
as high-throughput biology and social sciences. The approaches of the popular
Relief family of algorithms assign importances to features by iteratively
accounting for nearest relevant and irrelevant instances. Despite their high
utility, these algorithms can be computationally expensive and not well suited
for high-dimensional sparse input spaces. In contrast, recent embedding-based
methods learn compact, low-dimensional representations, potentially
facilitating downstream learning capabilities of conventional learners. This
paper explores how the Relief branch of algorithms can be adapted to benefit
from (Riemannian) manifold-based embeddings of instance and target spaces,
where a given embedding's dimensionality is intrinsic to the dimensionality of
the considered data set. The developed ReliefE algorithm is faster and can
result in better feature rankings, as shown by our evaluation on 20 real-life
data sets for multi-class and multi-label classification tasks. The utility of
ReliefE for high-dimensional data sets is ensured by its implementation that
utilizes sparse matrix algebraic operations. Finally, the relation of ReliefE
to other ranking algorithms is studied via the Fuzzy Jaccard Index.
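The core Relief idea described in the abstract — updating feature weights using the nearest same-class ("hit") and different-class ("miss") instances, with neighbors found in a low-dimensional embedding — can be sketched as follows. This is a minimal illustration, not the paper's implementation: a plain truncated SVD stands in for the Riemannian manifold embedding, and ReliefE's intrinsic-dimensionality estimation and sparse-matrix machinery are omitted.

```python
import numpy as np

def relief_sketch(X, y, n_iter=100, emb_dim=2, seed=0):
    """Hypothetical minimal Relief-style ranking sketch.

    Neighbors are searched in a low-dimensional embedding (here a
    truncated SVD as a stand-in for a manifold embedding), while
    feature weights are updated in the original feature space.
    """
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)
    # Low-dimensional "embedding" via truncated SVD.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    E = Xc @ Vt[:emb_dim].T
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        i = rng.integers(len(X))
        d = np.linalg.norm(E - E[i], axis=1)
        d[i] = np.inf  # exclude the instance itself
        same = y == y[i]
        hit = np.argmin(np.where(same, d, np.inf))
        miss = np.argmin(np.where(~same, d, np.inf))
        # A feature that separates the miss but not the hit is relevant.
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / n_iter
```

Features that consistently differ from the nearest miss more than from the nearest hit accumulate positive weight and rank higher.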
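The Fuzzy Jaccard Index mentioned above compares feature rankings as sets rather than exact orders. A simplified, prefix-averaged Jaccard similarity over the two rankings' top-k sets conveys the idea; the actual index in the paper is a fuzzy-set generalization and may differ in detail.

```python
def prefix_jaccard(rank_a, rank_b):
    """Simplified ranking-similarity sketch (not the paper's exact
    Fuzzy Jaccard Index): average Jaccard overlap of the top-k
    prefixes of two rankings, for k = 1 .. n."""
    n = len(rank_a)
    total = 0.0
    for k in range(1, n + 1):
        a, b = set(rank_a[:k]), set(rank_b[:k])
        total += len(a & b) / len(a | b)
    return total / n
```

Identical rankings score 1.0; rankings that agree on the selected features but disagree on their order still score highly, which is the motivation for set-based comparison of feature rankings.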
Related papers
- Machine Learning Training Optimization using the Barycentric Correction Procedure [0.0]
This study proposes combining machine learning algorithms with an efficient methodology known as the barycentric correction procedure (BCP).
It was found that this combination provides significant time savings on synthetic and real data, without losing accuracy, as the number of instances and dimensions increases.
arXiv Detail & Related papers (2024-03-01T13:56:36Z)
- Spectral Entry-wise Matrix Estimation for Low-Rank Reinforcement Learning [53.445068584013896]
We study matrix estimation problems arising in reinforcement learning (RL) with low-rank structure.
In low-rank bandits, the matrix to be recovered specifies the expected arm rewards, and for low-rank Markov Decision Processes (MDPs), it may for example characterize the transition kernel of the MDP.
We show that simple spectral-based matrix estimation approaches efficiently recover the singular subspaces of the matrix and exhibit nearly-minimal entry-wise error.
arXiv Detail & Related papers (2023-10-10T17:06:41Z)
- Provably Efficient Representation Learning with Tractable Planning in Low-Rank POMDP [81.00800920928621]
We study representation learning in partially observable Markov Decision Processes (POMDPs)
We first present an algorithm for decodable POMDPs that combines maximum likelihood estimation (MLE) and optimism in the face of uncertainty (OFU)
We then show how to adapt this algorithm to also work in the broader class of $\gamma$-observable POMDPs.
arXiv Detail & Related papers (2023-06-21T16:04:03Z)
- Efficient Learning of Minimax Risk Classifiers in High Dimensions [3.093890460224435]
High-dimensional data is common in multiple areas, such as health care and genomics, where the number of features can be tens of thousands.
In this paper, we leverage such methods to obtain an efficient learning algorithm for the recently proposed minimax risk classifiers.
Experiments on multiple high-dimensional datasets show that the proposed algorithm is efficient in high-dimensional scenarios.
arXiv Detail & Related papers (2023-06-11T11:08:20Z)
- Learning Structure Aware Deep Spectral Embedding [11.509692423756448]
We propose a novel structure-aware deep spectral embedding by combining a spectral embedding loss and a structure preservation loss.
A deep neural network architecture is proposed that simultaneously encodes both types of information and aims to generate structure-aware spectral embedding.
The proposed algorithm is evaluated on six publicly available real-world datasets.
arXiv Detail & Related papers (2023-05-14T18:18:05Z)
- Linearized Wasserstein dimensionality reduction with approximation guarantees [65.16758672591365]
LOT Wassmap is a computationally feasible algorithm to uncover low-dimensional structures in the Wasserstein space.
We show that LOT Wassmap attains correct embeddings and that the quality improves with increased sample size.
We also show how LOT Wassmap significantly reduces the computational cost when compared to algorithms that depend on pairwise distance computations.
arXiv Detail & Related papers (2023-02-14T22:12:16Z)
- Analysis of Self-Supervised Learning and Dimensionality Reduction Methods in Clustering-Based Active Learning for Speech Emotion Recognition [3.3670613441132984]
We show how to use the structure of the feature space for clustering-based active learning (AL) methods.
In this paper, we combine CPC and multiple dimensionality reduction methods in search of functioning practices for clustering-based AL.
Our experiments for simulating speech emotion recognition system deployment show that both the local and global topology of the feature space can be successfully used for AL.
arXiv Detail & Related papers (2022-06-21T08:44:55Z)
- Estimating leverage scores via rank revealing methods and randomization [50.591267188664666]
We study algorithms for estimating the statistical leverage scores of rectangular dense or sparse matrices of arbitrary rank.
Our approach is based on combining rank revealing methods with compositions of dense and sparse randomized dimensionality reduction transforms.
arXiv Detail & Related papers (2021-05-23T19:21:55Z)
- Progressively Pretrained Dense Corpus Index for Open-Domain Question Answering [87.32442219333046]
We propose a simple and resource-efficient method to pretrain the paragraph encoder.
Our method outperforms an existing dense retrieval method that uses 7 times more computational resources for pretraining.
arXiv Detail & Related papers (2020-04-30T18:09:50Z)
- New advances in enumerative biclustering algorithms with online partitioning [80.22629846165306]
This paper further extends RIn-Close_CVC, a biclustering algorithm capable of performing an efficient, complete, correct and non-redundant enumeration of maximal biclusters with constant values on columns in numerical datasets.
The improved algorithm, called RIn-Close_CVC3, keeps the attractive properties of RIn-Close_CVC and is characterized by a drastic reduction in memory usage and a consistent gain in runtime.
arXiv Detail & Related papers (2020-03-07T14:54:26Z)
- Multi-Objective Genetic Programming for Manifold Learning: Balancing Quality and Dimensionality [4.4181317696554325]
State-of-the-art manifold learning algorithms are opaque in how they perform this transformation.
We introduce a multi-objective approach that automatically balances the competing objectives of manifold quality and dimensionality.
Our proposed approach is competitive with a range of baseline and state-of-the-art manifold learning methods.
arXiv Detail & Related papers (2020-01-05T23:24:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.