Online Sparse Feature Selection in Data Streams via Differential Evolution
- URL: http://arxiv.org/abs/2511.19555v1
- Date: Mon, 24 Nov 2025 14:19:51 GMT
- Title: Online Sparse Feature Selection in Data Streams via Differential Evolution
- Authors: Ruiyang Xu,
- Abstract summary: This paper introduces a novel Online Differential Evolution for Sparse Feature Selection (ODESFS) in data streams. Experiments conducted on six real-world datasets demonstrate that ODESFS consistently outperforms state-of-the-art OSFS and OS2FS methods.
- Score: 2.03725086642376
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The processing of high-dimensional streaming data commonly utilizes online streaming feature selection (OSFS) techniques. However, practical implementations often face challenges with data incompleteness due to equipment failures and technical constraints. Online Sparse Streaming Feature Selection (OS2FS) tackles this issue through latent factor analysis-based missing data imputation. Despite this advancement, existing OS2FS approaches exhibit substantial limitations in feature evaluation, resulting in performance deterioration. To address these shortcomings, this paper introduces a novel Online Differential Evolution for Sparse Feature Selection (ODESFS) in data streams, incorporating two key innovations: (1) missing value imputation using a latent factor analysis model, and (2) feature importance evaluation through differential evolution. Comprehensive experiments conducted on six real-world datasets demonstrate that ODESFS consistently outperforms state-of-the-art OSFS and OS2FS methods by selecting optimal feature subsets and achieving superior accuracy.
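The two stages named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' ODESFS implementation: the abstract does not specify the fitness function, the evolution strategy, or any hyperparameters, so everything here is an assumption chosen for illustration. The names `lfa_impute`, `fitness`, and `de_feature_weights` are hypothetical, the fitness (nearest-centroid training error plus a sparsity penalty) is a stand-in, and the differential evolution variant is the common DE/rand/1/bin scheme run on a fixed batch rather than on a true stream.

```python
import numpy as np


def lfa_impute(X, rank=3, lr=0.01, reg=0.05, epochs=100, seed=0):
    """Fill NaN entries of X with a latent factor model fitted by SGD on observed entries."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    P = 0.1 * rng.standard_normal((n, rank))  # row (instance) factors
    Q = 0.1 * rng.standard_normal((m, rank))  # column (feature) factors
    rows, cols = np.nonzero(~np.isnan(X))
    for _ in range(epochs):
        for i, j in zip(rows, cols):
            err = X[i, j] - P[i] @ Q[j]
            p_old = P[i].copy()
            P[i] += lr * (err * Q[j] - reg * P[i])
            Q[j] += lr * (err * p_old - reg * Q[j])
    filled = X.copy()
    mask = np.isnan(X)
    filled[mask] = (P @ Q.T)[mask]  # only missing cells are replaced
    return filled


def fitness(w, X, y, penalty=0.01):
    """Assumed objective (lower is better): nearest-centroid error plus a sparsity penalty."""
    Xw = X * w
    classes = np.unique(y)
    centroids = np.stack([Xw[y == c].mean(axis=0) for c in classes])
    dists = ((Xw[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    pred = classes[dists.argmin(axis=1)]
    return np.mean(pred != y) + penalty * w.sum()


def de_feature_weights(X, y, pop_size=20, gens=30, F=0.5, CR=0.9, seed=0):
    """DE/rand/1/bin search over feature weights in [0, 1]; returns the best weight vector."""
    rng = np.random.default_rng(seed)
    m = X.shape[1]
    pop = rng.random((pop_size, m))
    fit = np.array([fitness(w, X, y) for w in pop])
    for _ in range(gens):
        for i in range(pop_size):
            others = [k for k in range(pop_size) if k != i]
            a, b, c = pop[rng.choice(others, size=3, replace=False)]
            mutant = np.clip(a + F * (b - c), 0.0, 1.0)   # mutation
            cross = rng.random(m) < CR                    # binomial crossover mask
            cross[rng.integers(m)] = True                 # keep at least one mutant gene
            trial = np.where(cross, mutant, pop[i])
            f_trial = fitness(trial, X, y)
            if f_trial <= fit[i]:                         # greedy selection
                pop[i], fit[i] = trial, f_trial
    return pop[fit.argmin()]


# Synthetic demo: 2 informative features, 4 noise features, about 20% missing values.
rng = np.random.default_rng(42)
y = np.repeat([0, 1], 30)
X_full = rng.standard_normal((60, 6))
X_full[:, 0] += 3.0 * y
X_full[:, 1] -= 3.0 * y
missing = rng.random(X_full.shape) < 0.2
X_missing = X_full.copy()
X_missing[missing] = np.nan

X_filled = lfa_impute(X_missing)
weights = de_feature_weights(X_filled, y)
print(np.round(weights, 2))
```

Features whose weight stays near zero after the search are candidates for removal. In a streaming setting one would presumably rerun or warm-start the search as new features arrive, which this batch sketch does not attempt.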
Related papers
- Unsupervised Feature Selection via Robust Autoencoder and Adaptive Graph Learning [4.371255245299254]
Unsupervised feature selection (UFS) aims to simultaneously cluster data and identify the most discriminative features. We propose the Robust Autoencoder-based Unsupervised Feature Selection (RAEUFS) model, which leverages a deep autoencoder to learn nonlinear feature representations. Our method outperforms state-of-the-art UFS approaches in both clean and outlier-contaminated data settings.
arXiv Detail & Related papers (2025-12-21T12:42:37Z)
- Long Grounded Thoughts: Distilling Compositional Visual Reasoning Chains at Scale [70.23466957404891]
We introduce a new reasoning data generation framework spanning diverse skills and levels of complexity with over 1M high-quality synthetic vision-centric questions. We show that finetuning Qwen2.5-VL-7B on our data outperforms all open-data baselines across all evaluated vision-centric benchmarks.
arXiv Detail & Related papers (2025-11-07T20:50:54Z)
- Particle swarm optimization for online sparse streaming feature selection under uncertainty [2.03725086642376]
In real-world applications involving high-dimensional streaming data, online streaming feature selection (OSFS) is widely adopted. This work proposes POS2FS, an uncertainty-aware online sparse streaming feature selection framework enhanced by particle swarm optimization (PSO). The approach introduces: 1) PSO-driven supervision to reduce uncertainty in feature-label relationships; 2) three-way decision theory to manage feature fuzziness in supervised learning.
arXiv Detail & Related papers (2025-08-24T07:56:41Z)
- Online Decision-Focused Learning [74.3205104323777]
Decision-focused learning (DFL) is an increasingly popular paradigm for training models whose predictive outputs are used in decision-making tasks. In this paper, we regularize the objective function to make it different and investigate how to overcome nonoptimality function. We also showcase the effectiveness of our algorithms on a knapsack experiment, where they outperform two standard benchmarks.
arXiv Detail & Related papers (2025-05-19T10:40:30Z)
- Improving Sampling Methods for Fine-tuning SentenceBERT in Text Streams [49.3179290313959]
This study explores the efficacy of seven text sampling methods designed to selectively fine-tune language models.
We precisely assess the impact of these methods on fine-tuning the SBERT model using four different loss functions.
Our findings indicate that Softmax loss and Batch All Triplets loss are particularly effective for text stream classification.
arXiv Detail & Related papers (2024-03-18T23:41:52Z)
- Robust Learning with Progressive Data Expansion Against Spurious Correlation [65.83104529677234]
We study the learning process of a two-layer nonlinear convolutional neural network in the presence of spurious features.
Our analysis suggests that imbalanced data groups and easily learnable spurious features can lead to the dominance of spurious features during the learning process.
We propose a new training algorithm called PDE that efficiently enhances the model's robustness for a better worst-group performance.
arXiv Detail & Related papers (2023-06-08T05:44:06Z)
- Online Sparse Streaming Feature Selection Using Adapted Classification [5.587715545506331]
Existing methods classify features as relevant or irrelevant under the assumption that no data are missing.
We propose online sparse streaming feature selection based on adapted classification (OS2FS-AC).
Experimental results on ten real-world data sets demonstrate that OS2FS-AC performs better than state-of-the-art algorithms.
arXiv Detail & Related papers (2023-02-25T03:03:53Z)
- Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z)
- An Online Sparse Streaming Feature Selection Algorithm [14.414813893419506]
We propose an online sparse streaming feature selection algorithm with uncertainty (OS2FSU).
OS2FSU consists of two main parts: 1) latent factor analysis is used to pre-estimate the missing data in sparse streaming features before conducting feature selection, and 2) fuzzy logic and neighborhood rough sets are employed to alleviate the uncertainty between estimated streaming features and labels during feature selection.
Results demonstrate that OS2FSU outperforms its competitors when missing data are encountered in OSFS.
arXiv Detail & Related papers (2022-08-02T16:08:22Z)
- Quasi-Global Momentum: Accelerating Decentralized Deep Learning on Heterogeneous Data [77.88594632644347]
Decentralized training of deep learning models is a key element for enabling data privacy and on-device learning over networks.
In realistic learning scenarios, the presence of heterogeneity across different clients' local datasets poses an optimization challenge.
We propose a novel momentum-based method to mitigate this decentralized training difficulty.
arXiv Detail & Related papers (2021-02-09T11:27:14Z)
- Feature Selection Based on Sparse Neural Network Layer with Normalizing Constraints [0.0]
We propose a new neural-network-based feature selection approach that introduces two constraints, the satisfaction of which leads to a sparse FS layer.
The results confirm that the proposed Feature Selection Based on Sparse Neural Network Layer with Normalizing Constraints (SNEL-FS) is able to select the important features and yields superior performance compared to other conventional FS methods.
arXiv Detail & Related papers (2020-12-11T14:14:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.