Online Feature Selection for Efficient Learning in Networked Systems
- URL: http://arxiv.org/abs/2112.08253v1
- Date: Wed, 15 Dec 2021 16:31:59 GMT
- Title: Online Feature Selection for Efficient Learning in Networked Systems
- Authors: Xiaoxuan Wang, Rolf Stadler
- Abstract summary: Current AI/ML methods for data-driven engineering use models that are mostly trained offline.
We present an online algorithm called Online Stable Feature Set Algorithm (OSFS), which selects a small feature set from a large number of available data sources.
OSFS reduces the size of the feature set by one to three orders of magnitude on all investigated datasets.
- Score: 3.13468877208035
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Current AI/ML methods for data-driven engineering use models that are mostly
trained offline. Such models can be expensive to build in terms of
communication and computing cost, and they rely on data that is collected over
extended periods of time. Further, they become out-of-date when changes in the
system occur. To address these challenges, we investigate online learning
techniques that automatically reduce the number of available data sources for
model training. We present an online algorithm called Online Stable Feature Set
Algorithm (OSFS), which selects a small feature set from a large number of
available data sources after receiving a small number of measurements. The
algorithm is initialized with a feature ranking algorithm, a feature set
stability metric, and a search policy. We perform an extensive experimental
evaluation of this algorithm using traces from an in-house testbed and from a
data center in operation. We find that OSFS reduces the size of the feature
set by one to three orders of magnitude on all investigated
datasets. Most importantly, we find that the accuracy of a predictor trained on
an OSFS-produced feature set is somewhat better than when the predictor is
trained on a feature set obtained through offline feature selection. OSFS is
thus shown to be effective as an online feature selection algorithm and robust
with respect to the sample interval used for feature selection. We also find that,
when concept drift in the data underlying the model occurs, its effect can be
mitigated by recomputing the feature set and retraining the prediction model.
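The abstract specifies OSFS's three pluggable components: a feature ranking algorithm, a feature set stability metric, and a search policy. Below is a minimal sketch of how these could fit together, assuming a random-forest importance ranking, Jaccard similarity as the stability metric, and a sample-doubling search policy; all three concrete choices are illustrative assumptions, not necessarily those evaluated in the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def rank_features(X, y, seed=0):
    """One possible feature ranking algorithm: random-forest importances."""
    model = RandomForestRegressor(n_estimators=50, random_state=seed)
    model.fit(X, y)
    return np.argsort(model.feature_importances_)[::-1]

def jaccard(a, b):
    """One possible stability metric: Jaccard similarity of two feature sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

def osfs_sketch(stream, k=10, stability=0.8, initial_batch=50):
    """Read (x, y) measurements from `stream` until the top-k feature set
    stabilizes between consecutive rankings (sample-doubling search policy)."""
    X, y, n_next, prev = [], [], initial_batch, None
    for x_t, y_t in stream:
        X.append(x_t)
        y.append(y_t)
        if len(X) < n_next:
            continue
        current = rank_features(np.array(X), np.array(y))[:k]
        if prev is not None and jaccard(current, prev) >= stability:
            return list(current), len(X)    # stable feature set found
        prev, n_next = current, n_next * 2  # search policy: double the sample
    return (list(prev) if prev is not None else []), len(X)
```

The returned sample count is the number of measurements consumed before the feature set stabilized, which is the quantity the robustness claims above refer to.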
Related papers
- Attribute-to-Delete: Machine Unlearning via Datamodel Matching [65.13151619119782]
Machine unlearning -- efficiently removing a small "forget set" of training data from a pre-trained machine learning model -- has recently attracted interest.
Recent research shows that existing machine unlearning techniques do not hold up in more challenging evaluation settings.
arXiv Detail & Related papers (2024-10-30T17:20:10Z) - Unveiling the Power of Sparse Neural Networks for Feature Selection [60.50319755984697]
Sparse Neural Networks (SNNs) have emerged as powerful tools for efficient feature selection.
We show that SNNs trained with dynamic sparse training (DST) algorithms can achieve, on average, more than 50% memory and 55% FLOPs reduction.
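This entry reports the result but not the mechanism. As a rough illustration only, the sketch below applies a SET-style prune-and-regrow cycle (one simple form of dynamic sparse training) to a single sparse linear layer and ranks input features by the magnitude of their surviving weights; it is a hypothetical stand-in, not the paper's algorithm.

```python
import numpy as np

def dst_feature_ranking(X, y, sparsity=0.8, epochs=200, lr=0.01, prune_frac=0.3, seed=0):
    """Train a sparse linear model with a SET-style prune/regrow cycle,
    then rank input features by the magnitude of surviving weights."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = rng.normal(scale=0.1, size=d)
    mask = rng.random(d) > sparsity               # start from a sparse connectivity mask
    for epoch in range(epochs):
        grad = 2 * X.T @ (X @ (w * mask) - y) / n
        w -= lr * grad * mask
        if epoch % 20 == 19:                      # prune weakest active weights, regrow randomly
            active = np.flatnonzero(mask)
            k = int(prune_frac * len(active))
            if k:
                drop = active[np.argsort(np.abs(w[active]))[:k]]
                mask[drop] = False
                grow = rng.choice(np.flatnonzero(~mask), size=k, replace=False)
                mask[grow] = True
                w[grow] = rng.normal(scale=0.1, size=k)
    score = np.abs(w) * mask                      # only features kept by DST score nonzero
    return np.argsort(score)[::-1]
```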
arXiv Detail & Related papers (2024-08-08T16:48:33Z) - LESS: Selecting Influential Data for Targeted Instruction Tuning [64.78894228923619]
We propose LESS, an efficient algorithm to estimate data influences and perform Low-rank gradiEnt Similarity Search for instruction data selection.
We show that training on a LESS-selected 5% of the data can often outperform training on the full dataset across diverse downstream tasks.
Our method goes beyond surface form cues to identify data that embodies the necessary reasoning skills for the intended downstream application.
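A hedged reading of the mechanism named here: score each training example by how well its loss gradient aligns with gradients from the target task, after a low-rank random projection keeps the gradient features small. The sketch below uses squared loss on a linear model purely for brevity; LESS itself works with LoRA gradients of language models.

```python
import numpy as np

def select_by_gradient_similarity(X_train, y_train, X_val, y_val, k=100, proj_dim=64, seed=0):
    """Toy influence-style selection: score each training point by the cosine
    similarity between its (randomly projected) loss gradient and the mean
    validation gradient, then keep the top-k."""
    rng = np.random.default_rng(seed)
    d = X_train.shape[1]
    w = np.linalg.lstsq(X_train, y_train, rcond=None)[0]     # warmup model
    P = rng.normal(size=(d, proj_dim)) / np.sqrt(proj_dim)   # low-rank projection

    def projected_grads(X, y):
        residual = X @ w - y               # d/dw of squared loss is residual_i * x_i
        return (X * residual[:, None]) @ P

    g_train = projected_grads(X_train, y_train)
    g_val = projected_grads(X_val, y_val).mean(axis=0)
    cos = g_train @ g_val / (np.linalg.norm(g_train, axis=1) * np.linalg.norm(g_val) + 1e-12)
    return np.argsort(cos)[::-1][:k]       # indices of the most helpful points
```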
arXiv Detail & Related papers (2024-02-06T19:18:04Z) - Towards Free Data Selection with General-Purpose Models [71.92151210413374]
A desirable data selection algorithm can efficiently choose the most informative samples to maximize the utility of limited annotation budgets.
Current approaches, represented by active learning methods, typically follow a cumbersome pipeline that repeatedly alternates between time-consuming model training and batch data selection.
The proposed method, FreeSel, bypasses this heavy batch selection process, achieving a significant efficiency improvement and running 530x faster than existing active learning methods.
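One way to read "bypassing batch selection" is: embed the pool once with a frozen general-purpose model, then pick a diverse subset in a single pass with no further training. The sketch below uses k-center greedy over precomputed features as an illustrative selection rule; FreeSel's actual semantic-pattern criterion differs.

```python
import numpy as np

def k_center_greedy(features, budget, seed=0):
    """Pick `budget` mutually distant points in embedding space -- a simple
    training-free selection rule over frozen pretrained features."""
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(len(features)))]
    dist = np.linalg.norm(features - features[selected[0]], axis=1)
    for _ in range(budget - 1):
        nxt = int(np.argmax(dist))        # farthest point from the current set
        selected.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(features - features[nxt], axis=1))
    return selected
```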
arXiv Detail & Related papers (2023-09-29T15:50:14Z) - Compactness Score: A Fast Filter Method for Unsupervised Feature
Selection [66.84571085643928]
We propose a fast unsupervised feature selection method, named Compactness Score (CSUFS), to select desired features.
The proposed algorithm seems to be more accurate and efficient than existing algorithms.
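The paper's exact CSUFS formula is not reproduced in this summary, so the sketch below shows the general shape of a fast unsupervised filter score instead, essentially a simplified Laplacian-Score-style criterion: a feature is compact if neighboring samples stay close on that feature relative to its overall variance.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def filter_scores(X, n_neighbors=5):
    """Unsupervised filter scoring in the spirit of compactness-based methods
    (not the paper's CSUFS formula): low local variation relative to global
    variance marks a desirable feature."""
    nn = NearestNeighbors(n_neighbors=n_neighbors + 1).fit(X)
    _, idx = nn.kneighbors(X)                                 # idx[:, 0] is the point itself
    neighbor_diff = (X[:, None, :] - X[idx[:, 1:], :]) ** 2   # shape (n, k, d)
    local_var = neighbor_diff.mean(axis=(0, 1))
    global_var = X.var(axis=0) + 1e-12
    return local_var / global_var          # lower score = more desirable feature

# Illustrative usage: keep the 10 most compact features.
# selected = np.argsort(filter_scores(X))[:10]
```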
arXiv Detail & Related papers (2022-01-31T13:01:37Z) - Federated Feature Selection for Cyber-Physical Systems of Systems [0.3609538870261841]
Our results show that a fleet of autonomous vehicles finds a consensus on the optimal set of features, which they exploit to reduce data transmission by up to 99% with negligible information loss.
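A minimal sketch of how a fleet could reach such a consensus, assuming each node ranks features locally (here by absolute correlation with the target, an illustrative choice) and a simple vote aggregates the per-node sets; the paper's actual federated protocol is more elaborate.

```python
import numpy as np

def local_top_k(X, y, k):
    """Each node ranks features locally; absolute correlation is a
    deliberately simple stand-in for the per-vehicle selector."""
    corr = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    return set(np.argsort(corr)[::-1][:k])

def federated_consensus(local_sets, min_votes):
    """Aggregate per-node feature sets by voting; only the agreed-upon
    features need be transmitted afterwards, which is the bandwidth saving."""
    votes = {}
    for s in local_sets:
        for f in s:
            votes[f] = votes.get(f, 0) + 1
    return sorted(f for f, v in votes.items() if v >= min_votes)

# Hypothetical usage over per-vehicle datasets `fleet_data`:
# consensus = federated_consensus([local_top_k(Xi, yi, 20) for Xi, yi in fleet_data], min_votes=3)
```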
arXiv Detail & Related papers (2021-09-23T12:16:50Z) - Online Feature Screening for Data Streams with Concept Drift [8.807587076209566]
This study focuses on classification datasets.
Our experiments show that the proposed methods can generate the same feature importance as their offline versions, at higher speed and with less storage consumption.
The results show that online screening methods with integrated model adaptation achieve a higher true-feature detection rate than those without model adaptation on data streams with concept drift.
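Screening statistics of this kind can be maintained with running moments, so the online score matches its offline counterpart exactly; a decay step sketches one simple form of the model adaptation used under drift. A minimal version, not the paper's specific screening statistics:

```python
import numpy as np

class OnlineScreener:
    """Streaming Pearson-correlation screening with O(d) memory."""
    def __init__(self, d):
        self.n = 0
        self.sx = np.zeros(d); self.sxx = np.zeros(d)
        self.sy = 0.0; self.syy = 0.0
        self.sxy = np.zeros(d)

    def update(self, x, y):
        self.n += 1
        self.sx += x; self.sxx += x * x
        self.sy += y; self.syy += y * y
        self.sxy += x * y

    def adapt(self, decay=0.99):
        """Exponentially forget old data, e.g. when drift is detected."""
        for attr in ("n", "sx", "sxx", "sy", "syy", "sxy"):
            setattr(self, attr, getattr(self, attr) * decay)

    def scores(self):
        """Absolute correlation per feature, computed from running moments."""
        cov = self.sxy / self.n - (self.sx / self.n) * (self.sy / self.n)
        vx = self.sxx / self.n - (self.sx / self.n) ** 2
        vy = self.syy / self.n - (self.sy / self.n) ** 2
        return np.abs(cov) / np.sqrt(vx * vy + 1e-12)
```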
arXiv Detail & Related papers (2021-04-07T03:16:15Z) - Feature Selection Using Reinforcement Learning [0.0]
The space of variables or features that can be used to characterize a particular predictor of interest continues to grow exponentially.
Identifying the most informative features, those that minimize the variance without jeopardizing the bias of our models, is critical to successfully training a machine learning model.
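The title names reinforcement learning but this summary gives no detail, so the sketch below is only a bandit-flavored toy: adding a feature is the action, the change in cross-validated score is the reward. It illustrates the search idea, not the paper's formulation.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

def bandit_feature_selection(X, y, budget=30, epsilon=0.2, seed=0):
    """Epsilon-greedy subset search: 'action' = add one feature,
    'reward' = change in cross-validated R^2 over the current best."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    q = np.zeros(d)                        # running value estimate per feature
    counts = np.zeros(d)
    chosen, best = [], 0.0                 # R^2 of a mean predictor is ~0 (baseline)
    for _ in range(budget):
        candidates = [c for c in range(d) if c not in chosen]
        if not candidates:
            break
        if rng.random() < epsilon:
            f = int(rng.choice(candidates))           # explore
        else:
            f = max(candidates, key=lambda c: q[c])   # exploit
        score = cross_val_score(Ridge(), X[:, chosen + [f]], y, cv=3).mean()
        counts[f] += 1
        q[f] += (score - best - q[f]) / counts[f]     # incremental reward mean
        if score > best:
            chosen.append(f)
            best = score
    return chosen
```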
arXiv Detail & Related papers (2021-01-23T09:24:37Z) - Learning summary features of time series for likelihood free inference [93.08098361687722]
We present a data-driven strategy for automatically learning summary features from time series data.
Our results indicate that summary features learned from data can compete with and even outperform LFI methods based on hand-crafted values.
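One minimal instance of learning summary features from data: regress the simulator parameters on raw traces and use the fitted map as the summary statistic. The paper trains deep networks for this; least squares is used below only to keep the sketch short.

```python
import numpy as np

def learn_summary_map(series, params):
    """Fit a linear map from raw traces, shape (n_sims, T), to generating
    parameters, shape (n_sims, p), and return it as a summary function."""
    W, *_ = np.linalg.lstsq(series, params, rcond=None)
    return lambda trace: trace @ W   # maps a trace to p summary values
```

Summaries of observed data can then be compared against simulated ones, e.g. inside an ABC loop, in place of hand-crafted statistics.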
arXiv Detail & Related papers (2020-12-04T19:21:37Z) - Online feature selection for rapid, low-overhead learning in networked
systems [0.0]
We present an online algorithm, called OSFS, that selects a small feature set from a large number of available data sources.
We find that OSFS requires several hundred measurements to reduce the number of data sources by two orders of magnitude.
arXiv Detail & Related papers (2020-10-28T12:00:42Z) - An Online Learning Algorithm for a Neuro-Fuzzy Classifier with
Mixed-Attribute Data [9.061408029414455]
The general fuzzy min-max neural network (GFMMNN) is an efficient neuro-fuzzy system for data classification.
This paper proposes an extended online learning algorithm for the GFMMNN.
The proposed method can handle datasets with both continuous and categorical features.
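For orientation, a bare-bones online fuzzy min-max classifier on continuous features in [0, 1]: each class owns hyperboxes that expand (up to size theta) to absorb new samples. The categorical-feature handling this paper proposes and the standard overlap contraction are omitted, so this is background, not the proposed method.

```python
import numpy as np

class FuzzyMinMaxSketch:
    """Minimal online fuzzy min-max classifier over continuous features."""
    def __init__(self, theta=0.3, gamma=4.0):
        self.theta, self.gamma = theta, gamma
        self.boxes = []                    # list of (v_min, w_max, label)

    def _membership(self, x, v, w):
        # 1 inside the box [v, w], decaying with distance outside it.
        out = np.maximum(0, x - w) + np.maximum(0, v - x)
        return float(np.mean(np.maximum(0, 1 - self.gamma * out)))

    def learn(self, x, label):
        for i, (v, w, lab) in enumerate(self.boxes):
            if lab != label:
                continue
            nv, nw = np.minimum(v, x), np.maximum(w, x)
            if np.all(nw - nv <= self.theta):     # expansion stays within theta
                self.boxes[i] = (nv, nw, lab)
                return
        self.boxes.append((x.copy(), x.copy(), label))   # create a new hyperbox

    def predict(self, x):
        scores = [(self._membership(x, v, w), lab) for v, w, lab in self.boxes]
        return max(scores)[1]              # class of the max-membership box
```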
arXiv Detail & Related papers (2020-09-30T13:45:36Z)