Feature Selection for Latent Factor Models
- URL: http://arxiv.org/abs/2412.10128v1
- Date: Fri, 13 Dec 2024 13:20:10 GMT
- Title: Feature Selection for Latent Factor Models
- Authors: Rittwika Kansabanik, Adrian Barbu
- Abstract summary: Feature selection is crucial for pinpointing relevant features in high-dimensional datasets.
Traditional feature selection methods for classification use data from all classes to select features for each class.
This paper explores feature selection methods that select features for each class separately, using class models based on low-rank generative methods.
- Score: 2.07180164747172
- License:
- Abstract: Feature selection is crucial for pinpointing relevant features in high-dimensional datasets, mitigating the 'curse of dimensionality,' and enhancing machine learning performance. Traditional feature selection methods for classification use data from all classes to select features for each class. This paper explores feature selection methods that select features for each class separately, using class models based on low-rank generative methods and introducing a signal-to-noise ratio (SNR) feature selection criterion. This novel approach has theoretical true feature recovery guarantees under certain assumptions and is shown to outperform some existing feature selection methods on standard classification datasets.
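The class-wise SNR idea can be sketched with a one-vs-rest score per feature. The ratio below (mean separation over pooled noise) is only an illustrative stand-in; the paper's exact criterion and its recovery guarantees are in the full text:

```python
import numpy as np

def snr_feature_scores(X, y, cls):
    """Score each feature for class `cls`: separation of the class mean from
    the rest-of-data mean, normalized by within-group spread.
    Illustrative one-vs-rest SNR, not necessarily the paper's exact formula."""
    in_cls = (y == cls)
    mu_in, mu_out = X[in_cls].mean(axis=0), X[~in_cls].mean(axis=0)
    sd_in, sd_out = X[in_cls].std(axis=0), X[~in_cls].std(axis=0)
    return np.abs(mu_in - mu_out) / (sd_in + sd_out + 1e-12)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = rng.integers(0, 2, size=200)
X[y == 1, 3] += 3.0                      # make feature 3 informative for class 1
scores = snr_feature_scores(X, y, cls=1)
top = np.argsort(scores)[::-1][:2]       # per-class top features
print(top)
```

Selecting the top-scoring features separately for each class yields a per-class subset, which is the departure from all-classes-at-once selection that the abstract describes.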
Related papers
- Greedy feature selection: Classifier-dependent feature selection via greedy methods [2.4374097382908477]
The purpose of this study is to introduce a new approach to feature ranking for classification tasks, referred to in what follows as greedy feature selection.
The benefits of such scheme are investigated theoretically in terms of model capacity indicators, such as the Vapnik-Chervonenkis (VC) dimension or the kernel alignment.
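A minimal form of such a greedy, classifier-dependent scheme can be sketched as follows; the nearest-centroid classifier and training-accuracy objective here are placeholders for whatever classifier and capacity criterion the paper actually pairs with the greedy search:

```python
import numpy as np

def centroid_acc(X, y, feats):
    """Training accuracy of a nearest-centroid classifier on a feature subset."""
    Z = X[:, feats]
    cents = np.stack([Z[y == c].mean(axis=0) for c in np.unique(y)])
    pred = np.argmin(((Z[:, None, :] - cents[None]) ** 2).sum(-1), axis=1)
    return (pred == y).mean()

def greedy_select(X, y, k):
    """Forward selection: repeatedly add the single feature that most
    improves the classifier's accuracy."""
    chosen, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        best = max(remaining, key=lambda f: centroid_acc(X, y, chosen + [f]))
        chosen.append(best)
        remaining.remove(best)
    return chosen

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 8))
y = rng.integers(0, 2, size=300)
X[:, 2] += 2.5 * y                 # feature 2 carries the class signal
sel = greedy_select(X, y, k=3)
print(sel)
```

Because the score is classifier-dependent, swapping in a different model changes which features the greedy loop prefers, which is exactly the point of the approach.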
arXiv Detail & Related papers (2024-03-08T08:12:05Z)
- Feature Selection as Deep Sequential Generative Learning [50.00973409680637]
We develop a deep variational transformer model over a joint of sequential reconstruction, variational, and performance evaluator losses.
Our model can distill feature selection knowledge and learn a continuous embedding space to map feature selection decision sequences into embedding vectors associated with utility scores.
arXiv Detail & Related papers (2024-03-06T16:31:56Z)
- Binary Feature Mask Optimization for Feature Selection [0.0]
We introduce a novel framework that selects features considering the outcomes of the model.
We obtain the mask operator using the predictions of the machine learning model.
We demonstrate significant performance improvements on real-life datasets.
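The mask-based idea can be sketched as a greedy pruning loop over a frozen model's predictions: zero out a feature, keep the change if validation accuracy does not drop. The flipping order and acceptance rule below are illustrative, not the paper's algorithm:

```python
import numpy as np

def prune_mask(model_predict, X_val, y_val):
    """Greedily zero out features whose masking does not hurt validation
    accuracy. `model_predict` is a fixed, already-trained model."""
    mask = np.ones(X_val.shape[1], dtype=bool)
    base = (model_predict(X_val) == y_val).mean()
    for j in range(X_val.shape[1]):
        trial = mask.copy()
        trial[j] = False
        acc = (model_predict(X_val * trial) == y_val).mean()
        if acc >= base:          # feature j is dispensable
            mask, base = trial, acc
    return mask

rng = np.random.default_rng(2)
X = rng.normal(size=(400, 6))
w_true = np.array([2.0, 0, 0, 1.5, 0, 0])        # only features 0 and 3 matter
y = (X @ w_true + 0.1 * rng.normal(size=400) > 0).astype(int)
w, *_ = np.linalg.lstsq(X, 2 * y - 1, rcond=None)  # stand-in "trained" model
model = lambda Z: (Z @ w > 0).astype(int)
mask = prune_mask(model, X, y)
print(mask)
```

Note that only the model's predictions are queried, so the scheme works with any black-box classifier, which matches the abstract's framing.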
arXiv Detail & Related papers (2024-01-23T10:54:13Z)
- A Performance-Driven Benchmark for Feature Selection in Tabular Deep Learning [131.2910403490434]
Data scientists typically collect as many features as possible into their datasets, and even engineer new features from existing ones.
Existing benchmarks for tabular feature selection consider classical downstream models, toy synthetic datasets, or do not evaluate feature selectors on the basis of downstream performance.
We construct a challenging feature selection benchmark evaluated on downstream neural networks including transformers.
We also propose an input-gradient-based analogue of Lasso for neural networks that outperforms classical feature selection methods on challenging problems.
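The input-gradient idea can be illustrated with a one-hidden-layer tanh network whose mean saliency per feature plays the role of Lasso's |w_j|. The hand-set weights below stand in for a trained network; the paper's actual method is a Lasso analogue built on such gradients, not this bare scoring:

```python
import numpy as np

def input_gradient_scores(W, v, X):
    """Rank input features of f(x) = v . tanh(W x) by the mean absolute
    input gradient |d f / d x_j| over the data."""
    H = np.tanh(X @ W.T)                 # (n, hidden) activations
    grads = (v * (1 - H ** 2)) @ W       # chain rule: W^T (v * tanh')
    return np.abs(grads).mean(axis=0)    # one score per input feature

rng = np.random.default_rng(3)
d, h = 5, 4
W = np.zeros((h, d))
W[:, 1] = rng.normal(size=h)             # this network only uses feature 1
v = rng.normal(size=h)
X = rng.normal(size=(100, d))
scores = input_gradient_scores(W, v, X)
print(scores.round(3))
```

Features the network ignores get exactly zero saliency, mirroring how Lasso drives irrelevant coefficients to zero.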
arXiv Detail & Related papers (2023-11-10T05:26:10Z)
- Clustering Indices based Automatic Classification Model Selection [16.096824533334352]
We propose a novel method for automatic classification model selection from a set of candidate model classes.
We compute the dataset clustering indices and directly predict the expected classification performance using the learned regressor.
We also propose an end-to-end Automated ML system for data classification based on our model selection method.
arXiv Detail & Related papers (2023-05-23T10:52:37Z)
- Understanding the classes better with class-specific and rule-specific feature selection, and redundancy control in a fuzzy rule based framework [5.5612170847190665]
We propose a class-specific feature selection method embedded in a fuzzy rule-based classifier.
Our method results in class-specific rules involving class-specific subsets.
The effectiveness of the proposed method has been validated through experiments on three synthetic data sets.
arXiv Detail & Related papers (2022-08-02T07:45:34Z)
- Selecting the suitable resampling strategy for imbalanced data classification regarding dataset properties [62.997667081978825]
In many application domains such as medicine, information retrieval, cybersecurity, social media, etc., datasets used for inducing classification models often have an unequal distribution of the instances of each class.
This situation, known as imbalanced data classification, causes low predictive performance for the minority class examples.
Oversampling and undersampling techniques are well-known strategies to deal with this problem by balancing the number of examples of each class.
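Random oversampling, the simplest of these strategies, can be sketched in a few lines; production work would typically reach for SMOTE or other refinements rather than plain duplication:

```python
import numpy as np

def random_oversample(X, y, rng):
    """Duplicate minority-class rows (with replacement) until every class
    has as many examples as the largest class."""
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    parts_X, parts_y = [], []
    for c, n in zip(classes, counts):
        idx = np.flatnonzero(y == c)
        extra = rng.choice(idx, size=n_max - n, replace=True) if n < n_max else []
        keep = np.concatenate([idx, extra]).astype(int)
        parts_X.append(X[keep])
        parts_y.append(y[keep])
    return np.concatenate(parts_X), np.concatenate(parts_y)

rng = np.random.default_rng(4)
X = rng.normal(size=(110, 3))
y = np.array([0] * 100 + [1] * 10)       # 10:1 imbalance
Xb, yb = random_oversample(X, y, rng)
print(np.bincount(yb))
```

Undersampling is the mirror image: subsample the majority class down to the minority count instead, trading information loss for a smaller training set.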
arXiv Detail & Related papers (2021-12-15T18:56:39Z)
- A Supervised Feature Selection Method For Mixed-Type Data using Density-based Feature Clustering [1.3048920509133808]
This paper proposes a supervised feature selection method using density-based feature clustering (SFSDFC).
SFSDFC decomposes the feature space into a set of disjoint feature clusters using a novel density-based clustering method.
Then, an effective feature selection strategy is employed to obtain a subset of important features with minimal redundancy from those feature clusters.
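The cluster-then-pick-a-representative pattern can be sketched with a simple correlation-threshold grouping; SFSDFC's actual density-based clustering step is more sophisticated than this placeholder:

```python
import numpy as np

def cluster_then_select(X, y, thresh=0.5):
    """Group features whose |pairwise corr| exceeds `thresh`, then keep the
    most label-correlated feature from each group. A sketch: SFSDFC uses
    density-based clustering, not this simple threshold grouping."""
    d = X.shape[1]
    C = np.abs(np.corrcoef(X, rowvar=False))
    groups, seen = [], set()
    for j in range(d):
        if j in seen:
            continue
        grp = [k for k in range(d) if k not in seen and C[j, k] > thresh]
        seen.update(grp)
        groups.append(grp)
    rel = np.abs(np.corrcoef(np.c_[X, y], rowvar=False)[-1, :d])
    return sorted(max(g, key=lambda k: rel[k]) for g in groups)

rng = np.random.default_rng(6)
X = rng.normal(size=(300, 5))
X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=300)  # features 0 and 1 redundant
y = X[:, 0] + 0.1 * rng.normal(size=300)
sel = cluster_then_select(X, y)
print(sel)
```

Only one of the two redundant features survives, which is the "minimal redundancy" property the abstract claims for the selected subset.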
arXiv Detail & Related papers (2021-11-10T15:05:15Z)
- Multivariate feature ranking of gene expression data [62.997667081978825]
We propose two new multivariate feature ranking methods based on pairwise correlation and pairwise consistency.
We show, with statistical significance tests, that the proposed methods outperform the state-of-the-art feature ranking methods Clustering Variation, Chi Squared, Correlation, Information Gain, ReliefF, and Significance.
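A relevance-minus-redundancy ranking in the same multivariate spirit can be sketched as follows; this mRMR-style score is an illustrative stand-in, not the authors' pairwise-correlation or pairwise-consistency methods:

```python
import numpy as np

def rank_features(X, y, alpha=1.0):
    """Rank features by |corr with label| minus mean |pairwise corr| with
    the other features, so redundant features are penalized."""
    Xs = (X - X.mean(0)) / X.std(0)
    ys = (y - y.mean()) / y.std()
    relevance = np.abs(Xs.T @ ys) / len(y)           # corr(feature, label)
    C = np.abs(np.corrcoef(Xs, rowvar=False))
    redundancy = (C.sum(1) - 1) / (C.shape[0] - 1)   # exclude self-correlation
    score = relevance - alpha * redundancy
    return np.argsort(score)[::-1]

rng = np.random.default_rng(5)
X = rng.normal(size=(500, 4))
X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=500)  # redundant copy of feature 0
y = (X[:, 0] > 0).astype(float)                  # label driven by feature 0
order = rank_features(X, y)
print(order)
```

Unlike univariate filters such as Chi Squared, the score of each feature here depends on the other features, which is what "multivariate" ranking means in this context.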
arXiv Detail & Related papers (2021-11-03T17:19:53Z)
- Deep Learning feature selection to unhide demographic recommender systems factors [63.732639864601914]
The matrix factorization model generates factors which do not incorporate semantic knowledge.
DeepUnHide is able to extract demographic information from the users and items factors in collaborative filtering recommender systems.
arXiv Detail & Related papers (2020-06-17T17:36:48Z)
- Selecting Relevant Features from a Multi-domain Representation for Few-shot Classification [91.67977602992657]
We propose a new strategy based on feature selection, which is both simpler and more effective than previous feature adaptation approaches.
We show that a simple non-parametric classifier built on top of such features produces high accuracy and generalizes to domains never seen during training.
arXiv Detail & Related papers (2020-03-20T15:44:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.