Online Sparse Streaming Feature Selection Using Adapted Classification
- URL: http://arxiv.org/abs/2302.14056v1
- Date: Sat, 25 Feb 2023 03:03:53 GMT
- Title: Online Sparse Streaming Feature Selection Using Adapted Classification
- Authors: RuiYang Xu, Di Wu, Xin Luo
- Abstract summary: Existing methods divide features into relevance or irrelevance without missing data.
We propose online sparse streaming feature selection based on adapted classification (OS2FS-AC).
Experimental results on ten real-world data sets demonstrate that OS2FS-AC performs better than state-of-the-art algorithms.
- Score: 5.587715545506331
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Traditional feature selections need to know the feature space before
learning, and online streaming feature selection (OSFS) is proposed to process
streaming features on the fly. Existing methods divide features into relevance
or irrelevance without missing data, and deleting irrelevant features may lead
to information loss. Motivated by this, we focus on completing the streaming
feature matrix and division of feature correlation and propose online sparse
streaming feature selection based on adapted classification (OS2FS-AC). This
study uses Latent Factor Analysis (LFA) to pre-estimate missed data. Besides,
we use the adaptive method to obtain the threshold, divide the features into
strongly relevant, weakly relevant, and irrelevant features, and then divide
weak relevance with more information. Experimental results on ten real-world
data sets demonstrate that OS2FS-AC performs better than state-of-the-art
algorithms.
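The two steps the abstract describes (pre-estimating missing entries with latent factor analysis, then splitting features three ways by relevance) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `lfa_complete`, `adaptive_thresholds`, and `three_way_split`, the SGD hyperparameters, and the mean plus/minus one standard deviation threshold rule are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def lfa_complete(X, mask, rank=2, lr=0.01, reg=0.01, epochs=200, seed=0):
    """Fill entries where mask is False using a low-rank latent factor model.

    The factors P and Q are fitted by SGD on the observed entries only
    (hypothetical hyperparameters; the abstract does not give them).
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    P = 0.1 * rng.standard_normal((n, rank))  # row (instance) factors
    Q = 0.1 * rng.standard_normal((m, rank))  # column (feature) factors
    rows, cols = np.nonzero(mask)
    for _ in range(epochs):
        for i, j in zip(rows, cols):
            err = X[i, j] - P[i] @ Q[j]
            p_old = P[i].copy()
            P[i] += lr * (err * Q[j] - reg * P[i])
            Q[j] += lr * (err * p_old - reg * Q[j])
    # Keep observed values as they are; use the model only for missing ones.
    return np.where(mask, X, P @ Q.T)

def adaptive_thresholds(scores, k=1.0):
    """Assumed stand-in rule: thresholds at mean +/- k * std of the scores."""
    mu, sigma = float(np.mean(scores)), float(np.std(scores))
    return mu + k * sigma, mu - k * sigma  # (alpha, beta)

def three_way_split(scores, alpha, beta):
    """Label each score as strong (>= alpha), irrelevant (<= beta), or weak."""
    labels = np.full(scores.shape, "weak", dtype=object)
    labels[scores >= alpha] = "strong"
    labels[scores <= beta] = "irrelevant"
    return labels

if __name__ == "__main__":
    # Toy rank-1 matrix with two entries hidden.
    X = np.outer(np.arange(1, 5), np.arange(1, 4)) / 4.0
    mask = np.ones(X.shape, dtype=bool)
    mask[0, 1] = mask[2, 2] = False
    filled = lfa_complete(np.where(mask, X, np.nan), mask, rank=1)
    print(filled)

    # Relevance scores would normally come from a feature-label measure.
    scores = np.array([0.9, 0.5, 0.1])
    alpha, beta = adaptive_thresholds(scores)
    print(three_way_split(scores, alpha, beta))
```

In this sketch the weakly relevant group is simply whatever falls between the two thresholds; the abstract states that OS2FS-AC further divides that group using additional information, which is not modeled here.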
Related papers
- Online Sparse Feature Selection in Data Streams via Differential Evolution [2.03725086642376]
This paper introduces a novel Online Differential Evolution for Sparse Feature Selection (ODESFS) in data streams.
Experiments conducted on six real-world datasets demonstrate that ODESFS consistently outperforms state-of-the-art OSFS and OS2FS methods.
arXiv Detail & Related papers (2025-11-24T14:19:51Z) - Particle swarm optimization for online sparse streaming feature selection under uncertainty [2.03725086642376]
In real-world applications involving high-dimensional streaming data, online streaming feature selection (OSFS) is widely adopted.
This work proposes POS2FS, an uncertainty-aware online sparse streaming feature selection framework enhanced by particle swarm optimization (PSO).
The approach introduces: 1) PSO-driven supervision to reduce uncertainty in feature-label relationships; 2) Three-way decision theory to manage feature fuzziness in supervised learning.
arXiv Detail & Related papers (2025-08-24T07:56:41Z) - Disentangling CLIP Features for Enhanced Localized Understanding [58.73850193789384]
We propose Unmix-CLIP, a novel framework designed to reduce mutual feature information (MFI) and improve feature disentanglement.
For the COCO-14 dataset, Unmix-CLIP reduces feature similarity by 24.9%.
arXiv Detail & Related papers (2025-02-05T08:20:31Z) - Fair Streaming Feature Selection [9.327911386140109]
We propose FairSFS, a novel algorithm for fair streaming feature selection.
We show that FairSFS not only maintains accuracy that is on par with leading streaming feature selection methods but also significantly improves fairness metrics.
arXiv Detail & Related papers (2024-06-20T15:22:44Z) - A Performance-Driven Benchmark for Feature Selection in Tabular Deep
Learning [131.2910403490434]
Data scientists typically collect as many features as possible into their datasets, and even engineer new features from existing ones.
Existing benchmarks for tabular feature selection consider classical downstream models, toy synthetic datasets, or do not evaluate feature selectors on the basis of downstream performance.
We construct a challenging feature selection benchmark evaluated on downstream neural networks including transformers.
We also propose an input-gradient-based analogue of Lasso for neural networks that outperforms classical feature selection methods on challenging problems.
arXiv Detail & Related papers (2023-11-10T05:26:10Z) - Causal Feature Selection via Transfer Entropy [59.999594949050596]
Causal discovery aims to identify causal relationships between features with observational data.
We introduce a new causal feature selection approach that relies on the forward and backward feature selection procedures.
We provide theoretical guarantees on the regression and classification errors for both the exact and the finite-sample cases.
arXiv Detail & Related papers (2023-10-17T08:04:45Z) - An Online Sparse Streaming Feature Selection Algorithm [14.414813893419506]
We propose an online sparse streaming feature selection algorithm with uncertainty (OS2FSU).
OS2FSU consists of two main parts: 1) latent factor analysis is utilized to pre-estimate the missing data in sparse streaming features before conducting feature selection, and 2) fuzzy logic and neighborhood rough set are employed to alleviate the uncertainty between estimated streaming features and labels while conducting feature selection.
Results demonstrate that OS2FSU outperforms its competitors when missing data are encountered in OSFS.
arXiv Detail & Related papers (2022-08-02T16:08:22Z) - Parallel feature selection based on the trace ratio criterion [4.30274561163157]
This work presents a novel parallel feature selection approach for classification, namely Parallel Feature Selection using Trace criterion (PFST).
Our method uses trace criterion, a measure of class separability used in Fisher's Discriminant Analysis, to evaluate feature usefulness.
The experiments show that our method can produce a small set of features in a fraction of the amount of time by the other methods under comparison.
arXiv Detail & Related papers (2022-03-03T10:50:33Z) - Learning Debiased and Disentangled Representations for Semantic Segmentation [52.35766945827972]
We propose a model-agnostic and training scheme for semantic segmentation.
By randomly eliminating certain class information in each training iteration, we effectively reduce feature dependencies among classes.
Models trained with our approach demonstrate strong results on multiple semantic segmentation benchmarks.
arXiv Detail & Related papers (2021-10-31T16:15:09Z) - Efficient Modelling Across Time of Human Actions and Interactions [92.39082696657874]
We argue that current fixed-sized spatio-temporal kernels in 3D convolutional neural networks (CNNs) can be improved to better deal with temporal variations in the input.
We study how we can better handle between classes of actions, by enhancing their feature differences over different layers of the architecture.
The proposed approaches are evaluated on several benchmark action recognition datasets and show competitive results.
arXiv Detail & Related papers (2021-10-05T15:39:11Z) - Adversarial Feature Augmentation and Normalization for Visual Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models.
Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings.
We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
arXiv Detail & Related papers (2021-03-22T20:36:34Z) - Elastic Net based Feature Ranking and Selection [9.289190508925875]
An intuitive idea is put at the end of multiple times of data splitting and elastic net based feature selection.
It concerns the frequency of selected features and uses the frequency as an indicator of feature importance.
It achieves competitive or superior performance to elastic net and with consistent selection of fewer features.
arXiv Detail & Related papers (2020-12-30T00:08:36Z) - Self-Challenging Improves Cross-Domain Generalization [81.99554996975372]
Convolutional Neural Networks (CNN) conduct image classification by activating dominant features that correlate with labels.
We introduce a simple training heuristic, Representation Self-Challenging (RSC), that significantly improves the generalization of CNN to out-of-domain data.
RSC iteratively challenges the dominant features activated on the training data, and forces the network to activate remaining features that correlate with labels.
arXiv Detail & Related papers (2020-07-05T21:42:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.