Online Feature Screening for Data Streams with Concept Drift
- URL: http://arxiv.org/abs/2104.02883v1
- Date: Wed, 7 Apr 2021 03:16:15 GMT
- Title: Online Feature Screening for Data Streams with Concept Drift
- Authors: Mingyuan Wang, Adrian Barbu
- Abstract summary: This study focuses on classification datasets.
Our experiments show that the proposed methods can generate the same feature importance as their offline versions with faster speed and less storage consumption.
The results show that online screening methods with integrated model adaptation have a higher true-feature detection rate than those without model adaptation on data streams with concept drift.
- Score: 8.807587076209566
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Screening feature selection methods are often used as a preprocessing step
for reducing the number of variables before the training step. Traditional
screening methods focus only on complete, high-dimensional datasets. Modern
datasets not only have higher dimensions and larger sample sizes, but also
exhibit properties such as streaming input, sparsity and concept drift. A
considerable number of online feature selection methods have therefore been
introduced in recent years to handle these kinds of problems. Online screening
methods are one category of online feature selection methods. The methods
proposed in this research can handle all three situations mentioned above. Our
study focuses on classification datasets. Our experiments show that the
proposed methods can generate the same feature importance as their offline
versions with faster speed and less storage consumption. Furthermore, the
results show that online screening methods with integrated model adaptation
have a higher true-feature detection rate than those without model adaptation
on data streams with concept drift. On the two large real datasets that
potentially exhibit concept drift, online screening methods with model
adaptation show advantages in saving computing time and space, reducing model
complexity, or improving prediction accuracy.
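To make the "same feature importance as the offline version, with less storage" claim concrete, here is a minimal sketch of one common screening statistic computed in streaming fashion. This is an illustrative example, not the paper's actual algorithm: it maintains running sums so that per-feature absolute Pearson correlations with the label match an offline pass exactly, while keeping only O(p) statistics instead of the full data matrix. The class name and interface are hypothetical.

```python
import numpy as np

class OnlinePearsonScreener:
    """Streaming per-feature Pearson correlation screener (illustrative).

    Keeps running sums per feature so that the final scores equal an
    offline Pearson screening pass, without storing the data matrix.
    """

    def __init__(self, n_features):
        self.n = 0
        self.sum_x = np.zeros(n_features)
        self.sum_x2 = np.zeros(n_features)
        self.sum_xy = np.zeros(n_features)
        self.sum_y = 0.0
        self.sum_y2 = 0.0

    def update(self, x, y):
        """Consume one sample: feature vector x, scalar label y."""
        self.n += 1
        self.sum_x += x
        self.sum_x2 += x * x
        self.sum_xy += x * y
        self.sum_y += y
        self.sum_y2 += y * y

    def scores(self):
        """Absolute Pearson correlation of each feature with the label."""
        n = self.n
        cov = self.sum_xy - self.sum_x * self.sum_y / n
        var_x = self.sum_x2 - self.sum_x ** 2 / n
        var_y = self.sum_y2 - self.sum_y ** 2 / n
        return np.abs(cov) / np.sqrt(np.maximum(var_x * var_y, 1e-12))

    def top_k(self, k):
        """Indices of the k highest-scoring features."""
        return np.argsort(self.scores())[::-1][:k]
```

Because each update only adds to fixed-size accumulators, the screener processes a stream in one pass; a drift-adaptive variant (as the abstract describes) would additionally discount or reset these statistics when the distribution changes.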
Related papers
- RPS: A Generic Reservoir Patterns Sampler [1.09784964592609]
We introduce an approach that harnesses a weighted reservoir to facilitate direct pattern sampling from streaming batch data.
We present a generic algorithm capable of addressing temporal biases and handling various pattern types, including sequential, weighted, and unweighted itemsets.
arXiv Detail & Related papers (2024-10-31T16:25:21Z) - Towards An Online Incremental Approach to Predict Students Performance [0.8287206589886879]
We propose a memory-based online incremental learning approach for updating an online classifier.
Our approach improves model accuracy by nearly 10% compared to the current state of the art.
arXiv Detail & Related papers (2024-05-03T17:13:26Z) - Towards Free Data Selection with General-Purpose Models [71.92151210413374]
A desirable data selection algorithm can efficiently choose the most informative samples to maximize the utility of limited annotation budgets.
Current approaches, represented by active learning methods, typically follow a cumbersome pipeline that iterates the time-consuming model training and batch data selection repeatedly.
FreeSel bypasses the heavy batch selection process, achieving a significant improvement in efficiency and being 530x faster than existing active learning methods.
arXiv Detail & Related papers (2023-09-29T15:50:14Z) - Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching.
Our simple yet effective method outperforms most previous optimization-oriented methods with much fewer computational resources.
arXiv Detail & Related papers (2023-07-19T04:07:33Z) - TempNet: Temporal Attention Towards the Detection of Animal Behaviour in Videos [63.85815474157357]
We propose an efficient computer vision- and deep learning-based method for the detection of biological behaviours in videos.
TempNet uses an encoder bridge and residual blocks to maintain model performance with a two-staged, spatial, then temporal, encoder.
We demonstrate its application to the detection of sablefish (Anoplopoma fimbria) startle events.
arXiv Detail & Related papers (2022-11-17T23:55:12Z) - Parameter-free Online Test-time Adaptation [19.279048049267388]
We show how test-time adaptation methods fare for a number of pre-trained models on a variety of real-world scenarios.
We propose a particularly "conservative" approach, which addresses the problem with Laplacian Adjusted Maximum-likelihood Estimation (LAME).
Our approach exhibits a much higher average accuracy across scenarios than existing methods, while being notably faster and having a much lower memory footprint.
arXiv Detail & Related papers (2022-01-15T00:29:16Z) - Online Feature Selection for Efficient Learning in Networked Systems [3.13468877208035]
Current AI/ML methods for data-driven engineering use models that are mostly trained offline.
We present an online algorithm called Online Stable Feature Set Algorithm (OSFS), which selects a small feature set from a large number of available data sources.
OSFS reduces the size of the feature set by 1-3 orders of magnitude on all investigated datasets.
arXiv Detail & Related papers (2021-12-15T16:31:59Z) - Online Coreset Selection for Rehearsal-based Continual Learning [65.85595842458882]
In continual learning, we store a subset of training examples (coreset) to be replayed later to alleviate catastrophic forgetting.
We propose Online Coreset Selection (OCS), a simple yet effective method that selects the most representative and informative coreset at each iteration.
Our proposed method maximizes the model's adaptation to a target dataset while selecting high-affinity samples to past tasks, which directly inhibits catastrophic forgetting.
arXiv Detail & Related papers (2021-06-02T11:39:25Z) - Adaptive Deep Forest for Online Learning from Drifting Data Streams [15.49323098362628]
Learning from data streams is among the most vital fields of contemporary data mining.
We propose Adaptive Deep Forest (ADF) - a natural combination of the successful tree-based streaming classifiers with deep forest.
The conducted experiments show that the deep forest approach can be effectively transformed into an online algorithm.
arXiv Detail & Related papers (2020-10-14T18:24:17Z) - Dynamic Scale Training for Object Detection [111.33112051962514]
We propose a Dynamic Scale Training paradigm (abbreviated as DST) to mitigate scale variation challenge in object detection.
Experimental results demonstrate the efficacy of our proposed DST towards scale variation handling.
It does not introduce inference overhead and could serve as a free lunch for general detection configurations.
arXiv Detail & Related papers (2020-04-26T16:48:17Z) - Tracking Performance of Online Stochastic Learners [57.14673504239551]
Online algorithms are popular in large-scale learning settings due to their ability to compute updates on the fly, without the need to store and process data in large batches.
When a constant step-size is used, these algorithms also have the ability to adapt to drifts in problem parameters, such as data or model properties, and track the optimal solution with reasonable accuracy.
We establish a link between steady-state performance derived under stationarity assumptions and the tracking performance of online learners under random walk models.
arXiv Detail & Related papers (2020-04-04T14:16:27Z)
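The tracking behavior described in the entry above can be sketched with a toy experiment (an illustration of the general idea, not the paper's analysis): a constant step-size LMS learner keeps adapting to a random-walk drift in the true model, while a 1/t-decaying step-size learner stops adapting and accumulates error. All constants here are arbitrary choices for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)
d, T, mu = 4, 5000, 0.05

w_true = rng.normal(size=d)   # drifting ground-truth weights
w_const = np.zeros(d)         # constant step-size learner
w_decay = np.zeros(d)         # decaying step-size learner

err_const = err_decay = 0.0
for t in range(1, T + 1):
    w_true += 0.01 * rng.normal(size=d)   # random-walk drift in the target
    x = rng.normal(size=d)
    y = w_true @ x + 0.01 * rng.normal()
    # constant step size: keeps tracking the drifting optimum
    e_c = y - w_const @ x
    w_const += mu * e_c * x
    # decaying step size 1/t: adaptation effectively freezes over time
    e_d = y - w_decay @ x
    w_decay += (1.0 / t) * e_d * x
    if t > T // 2:                        # accumulate steady-state error
        err_const += e_c ** 2
        err_decay += e_d ** 2
```

After the run, `err_const` is far smaller than `err_decay`, which is the qualitative point of the entry above: a constant step size trades a small steady-state error floor for the ability to follow random-walk drift.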
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.