Online Feature Selection for Efficient Learning in Networked Systems
- URL: http://arxiv.org/abs/2112.08253v1
- Date: Wed, 15 Dec 2021 16:31:59 GMT
- Title: Online Feature Selection for Efficient Learning in Networked Systems
- Authors: Xiaoxuan Wang, Rolf Stadler
- Abstract summary: Current AI/ML methods for data-driven engineering use models that are mostly trained offline.
We present an online algorithm called Online Stable Feature Set Algorithm (OSFS), which selects a small feature set from a large number of available data sources.
OSFS reduces the size of the feature set by one to three orders of magnitude on all investigated datasets.
- Score: 3.13468877208035
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Current AI/ML methods for data-driven engineering use models that are mostly
trained offline. Such models can be expensive to build in terms of
communication and computing cost, and they rely on data that is collected over
extended periods of time. Further, they become out-of-date when changes in the
system occur. To address these challenges, we investigate online learning
techniques that automatically reduce the number of available data sources for
model training. We present an online algorithm called Online Stable Feature Set
Algorithm (OSFS), which selects a small feature set from a large number of
available data sources after receiving a small number of measurements. The
algorithm is initialized with a feature ranking algorithm, a feature set
stability metric, and a search policy. We perform an extensive experimental
evaluation of this algorithm using traces from an in-house testbed and from a
data center in operation. We find that OSFS reduces the size of the feature
set by one to three orders of magnitude on all investigated
datasets. Most importantly, we find that the accuracy of a predictor trained on
an OSFS-produced feature set is somewhat better than when the predictor is
trained on a feature set obtained through offline feature selection. OSFS is
thus shown to be effective as an online feature selection algorithm and robust
with respect to the sample interval used for feature selection. We also find that,
when concept drift in the data underlying the model occurs, its effect can be
mitigated by recomputing the feature set and retraining the prediction model.
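The abstract specifies OSFS's three pluggable components: a feature ranking algorithm, a feature set stability metric, and a search policy. Below is a minimal sketch of how these could fit together, assuming a random-forest importance ranking, Jaccard similarity as the stability metric, and a sample-doubling search policy; all three concrete choices are illustrative assumptions, not necessarily those evaluated in the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def rank_features(X, y, seed=0):
    """One possible feature ranking algorithm: random-forest importances."""
    model = RandomForestRegressor(n_estimators=50, random_state=seed)
    model.fit(X, y)
    return np.argsort(model.feature_importances_)[::-1]

def jaccard(a, b):
    """One possible stability metric: Jaccard similarity of two feature sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

def osfs_sketch(stream, k=10, stability=0.8, initial_batch=50):
    """Read (x, y) measurements from `stream` until the top-k feature set
    stabilizes between consecutive rankings (sample-doubling search policy)."""
    X, y, n_next, prev = [], [], initial_batch, None
    for x_t, y_t in stream:
        X.append(x_t)
        y.append(y_t)
        if len(X) < n_next:
            continue
        current = rank_features(np.array(X), np.array(y))[:k]
        if prev is not None and jaccard(current, prev) >= stability:
            return list(current), len(X)    # stable feature set found
        prev, n_next = current, n_next * 2  # search policy: double the sample
    return (list(prev) if prev is not None else []), len(X)
```

The returned sample count is the number of measurements consumed before the feature set stabilized, which is the quantity the robustness claims above refer to.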
Related papers
- Attribute-to-Delete: Machine Unlearning via Datamodel Matching [65.13151619119782]
Machine unlearning -- efficiently removing a small "forget set" of training data from a pre-trained machine learning model -- has recently attracted interest.
Recent research shows that existing machine unlearning techniques do not hold up in more challenging evaluation settings.
arXiv Detail & Related papers (2024-10-30T17:20:10Z) - Unveiling the Power of Sparse Neural Networks for Feature Selection [60.50319755984697]
Sparse Neural Networks (SNNs) have emerged as powerful tools for efficient feature selection.
We show that SNNs trained with dynamic sparse training (DST) algorithms can achieve, on average, more than 50% memory and 55% FLOPs reduction.
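This entry reports the result but not the mechanism. As a rough illustration only, the sketch below applies a SET-style prune-and-regrow cycle (one simple form of dynamic sparse training) to a single sparse linear layer and ranks input features by the magnitude of their surviving weights; it is a hypothetical stand-in, not the paper's algorithm.

```python
import numpy as np

def dst_feature_ranking(X, y, sparsity=0.8, epochs=200, lr=0.01, prune_frac=0.3, seed=0):
    """Train a sparse linear model with a SET-style prune/regrow cycle,
    then rank input features by the magnitude of surviving weights."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = rng.normal(scale=0.1, size=d)
    mask = rng.random(d) > sparsity               # start from a sparse connectivity mask
    for epoch in range(epochs):
        grad = 2 * X.T @ (X @ (w * mask) - y) / n
        w -= lr * grad * mask
        if epoch % 20 == 19:                      # prune weakest active weights, regrow randomly
            active = np.flatnonzero(mask)
            k = int(prune_frac * len(active))
            if k:
                drop = active[np.argsort(np.abs(w[active]))[:k]]
                mask[drop] = False
                grow = rng.choice(np.flatnonzero(~mask), size=k, replace=False)
                mask[grow] = True
                w[grow] = rng.normal(scale=0.1, size=k)
    score = np.abs(w) * mask                      # only features kept by DST score nonzero
    return np.argsort(score)[::-1]
```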
arXiv Detail & Related papers (2024-08-08T16:48:33Z) - LESS: Selecting Influential Data for Targeted Instruction Tuning [64.78894228923619]
We propose LESS, an efficient algorithm to estimate data influences and perform Low-rank gradiEnt Similarity Search for instruction data selection.
We show that training on a LESS-selected 5% of the data can often outperform training on the full dataset across diverse downstream tasks.
Our method goes beyond surface form cues to identify data that embodies the necessary reasoning skills for the intended downstream application.
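A hedged reading of the mechanism named here: score each training example by how well its loss gradient aligns with gradients from the target task, after a low-rank random projection keeps the gradient features small. The sketch below uses squared loss on a linear model purely for brevity; LESS itself works with LoRA gradients of language models.

```python
import numpy as np

def select_by_gradient_similarity(X_train, y_train, X_val, y_val, k=100, proj_dim=64, seed=0):
    """Toy influence-style selection: score each training point by the cosine
    similarity between its (randomly projected) loss gradient and the mean
    validation gradient, then keep the top-k."""
    rng = np.random.default_rng(seed)
    d = X_train.shape[1]
    w = np.linalg.lstsq(X_train, y_train, rcond=None)[0]     # warmup model
    P = rng.normal(size=(d, proj_dim)) / np.sqrt(proj_dim)   # low-rank projection

    def projected_grads(X, y):
        residual = X @ w - y               # d/dw of squared loss is residual_i * x_i
        return (X * residual[:, None]) @ P

    g_train = projected_grads(X_train, y_train)
    g_val = projected_grads(X_val, y_val).mean(axis=0)
    cos = g_train @ g_val / (np.linalg.norm(g_train, axis=1) * np.linalg.norm(g_val) + 1e-12)
    return np.argsort(cos)[::-1][:k]       # indices of the most helpful points
```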
arXiv Detail & Related papers (2024-02-06T19:18:04Z) - Towards Free Data Selection with General-Purpose Models [71.92151210413374]
A desirable data selection algorithm can efficiently choose the most informative samples to maximize the utility of limited annotation budgets.
Current approaches, represented by active learning methods, typically follow a cumbersome pipeline that repeatedly alternates between time-consuming model training and batch data selection.
The proposed method, FreeSel, bypasses this heavy batch selection process, achieving a significant efficiency improvement and running 530x faster than existing active learning methods.
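One way to read "bypassing batch selection" is: embed the pool once with a frozen general-purpose model, then pick a diverse subset in a single pass with no further training. The sketch below uses k-center greedy over precomputed features as an illustrative selection rule; FreeSel's actual semantic-pattern criterion differs.

```python
import numpy as np

def k_center_greedy(features, budget, seed=0):
    """Pick `budget` mutually distant points in embedding space -- a simple
    training-free selection rule over frozen pretrained features."""
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(len(features)))]
    dist = np.linalg.norm(features - features[selected[0]], axis=1)
    for _ in range(budget - 1):
        nxt = int(np.argmax(dist))        # farthest point from the current set
        selected.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(features - features[nxt], axis=1))
    return selected
```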
arXiv Detail & Related papers (2023-09-29T15:50:14Z) - Compactness Score: A Fast Filter Method for Unsupervised Feature
Selection [66.84571085643928]
We propose a fast unsupervised feature selection method, named Compactness Score (CSUFS), to select desired features.
The proposed algorithm seems to be more accurate and efficient than existing algorithms.
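The paper's exact CSUFS formula is not reproduced in this summary, so the sketch below shows the general shape of a fast unsupervised filter score instead, essentially a simplified Laplacian-Score-style criterion: a feature is compact if neighboring samples stay close on that feature relative to its overall variance.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def filter_scores(X, n_neighbors=5):
    """Unsupervised filter scoring in the spirit of compactness-based methods
    (not the paper's CSUFS formula): low local variation relative to global
    variance marks a desirable feature."""
    nn = NearestNeighbors(n_neighbors=n_neighbors + 1).fit(X)
    _, idx = nn.kneighbors(X)                                 # idx[:, 0] is the point itself
    neighbor_diff = (X[:, None, :] - X[idx[:, 1:], :]) ** 2   # shape (n, k, d)
    local_var = neighbor_diff.mean(axis=(0, 1))
    global_var = X.var(axis=0) + 1e-12
    return local_var / global_var          # lower score = more desirable feature

# Illustrative usage: keep the 10 most compact features.
# selected = np.argsort(filter_scores(X))[:10]
```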
arXiv Detail & Related papers (2022-01-31T13:01:37Z) - Federated Feature Selection for Cyber-Physical Systems of Systems [0.3609538870261841]
Our results show that a fleet of autonomous vehicles finds a consensus on the optimal set of features, which they exploit to reduce data transmission by up to 99% with negligible information loss.
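A minimal sketch of how a fleet could reach such a consensus, assuming each node ranks features locally (here by absolute correlation with the target, an illustrative choice) and a simple vote aggregates the per-node sets; the paper's actual federated protocol is more elaborate.

```python
import numpy as np

def local_top_k(X, y, k):
    """Each node ranks features locally; absolute correlation is a
    deliberately simple stand-in for the per-vehicle selector."""
    corr = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    return set(np.argsort(corr)[::-1][:k])

def federated_consensus(local_sets, min_votes):
    """Aggregate per-node feature sets by voting; only the agreed-upon
    features need be transmitted afterwards, which is the bandwidth saving."""
    votes = {}
    for s in local_sets:
        for f in s:
            votes[f] = votes.get(f, 0) + 1
    return sorted(f for f, v in votes.items() if v >= min_votes)

# Hypothetical usage over per-vehicle datasets `fleet_data`:
# consensus = federated_consensus([local_top_k(Xi, yi, 20) for Xi, yi in fleet_data], min_votes=3)
```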
arXiv Detail & Related papers (2021-09-23T12:16:50Z) - Online Feature Screening for Data Streams with Concept Drift [8.807587076209566]
This study focuses on classification datasets.
Our experiments show that the proposed methods can generate the same feature importance as their offline versions, at higher speed and with less storage consumption.
The results show that online screening methods with integrated model adaptation achieve a higher true-feature detection rate than those without model adaptation on data streams with concept drift.
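Screening statistics of this kind can be maintained with running moments, so the online score matches its offline counterpart exactly; a decay step sketches one simple form of the model adaptation used under drift. A minimal version, not the paper's specific screening statistics:

```python
import numpy as np

class OnlineScreener:
    """Streaming Pearson-correlation screening with O(d) memory."""
    def __init__(self, d):
        self.n = 0
        self.sx = np.zeros(d); self.sxx = np.zeros(d)
        self.sy = 0.0; self.syy = 0.0
        self.sxy = np.zeros(d)

    def update(self, x, y):
        self.n += 1
        self.sx += x; self.sxx += x * x
        self.sy += y; self.syy += y * y
        self.sxy += x * y

    def adapt(self, decay=0.99):
        """Exponentially forget old data, e.g. when drift is detected."""
        for attr in ("n", "sx", "sxx", "sy", "syy", "sxy"):
            setattr(self, attr, getattr(self, attr) * decay)

    def scores(self):
        """Absolute correlation per feature, computed from running moments."""
        cov = self.sxy / self.n - (self.sx / self.n) * (self.sy / self.n)
        vx = self.sxx / self.n - (self.sx / self.n) ** 2
        vy = self.syy / self.n - (self.sy / self.n) ** 2
        return np.abs(cov) / np.sqrt(vx * vy + 1e-12)
```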
arXiv Detail & Related papers (2021-04-07T03:16:15Z) - Feature Selection Using Reinforcement Learning [0.0]
The space of variables or features that can be used to characterize a particular predictor of interest continues to grow exponentially.
Identifying the most informative features, those that minimize the variance without jeopardizing the bias of our models, is critical to successfully training a machine learning model.
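The title names reinforcement learning but this summary gives no detail, so the sketch below is only a bandit-flavored toy: adding a feature is the action, the change in cross-validated score is the reward. It illustrates the search idea, not the paper's formulation.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

def bandit_feature_selection(X, y, budget=30, epsilon=0.2, seed=0):
    """Epsilon-greedy subset search: 'action' = add one feature,
    'reward' = change in cross-validated R^2 over the current best."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    q = np.zeros(d)                        # running value estimate per feature
    counts = np.zeros(d)
    chosen, best = [], 0.0                 # R^2 of a mean predictor is ~0 (baseline)
    for _ in range(budget):
        candidates = [c for c in range(d) if c not in chosen]
        if not candidates:
            break
        if rng.random() < epsilon:
            f = int(rng.choice(candidates))           # explore
        else:
            f = max(candidates, key=lambda c: q[c])   # exploit
        score = cross_val_score(Ridge(), X[:, chosen + [f]], y, cv=3).mean()
        counts[f] += 1
        q[f] += (score - best - q[f]) / counts[f]     # incremental reward mean
        if score > best:
            chosen.append(f)
            best = score
    return chosen
```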
arXiv Detail & Related papers (2021-01-23T09:24:37Z) - Learning summary features of time series for likelihood free inference [93.08098361687722]
We present a data-driven strategy for automatically learning summary features from time series data.
Our results indicate that summary features learned from data can compete with and even outperform LFI methods based on hand-crafted values.
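One minimal instance of learning summary features from data: regress the simulator parameters on raw traces and use the fitted map as the summary statistic. The paper trains deep networks for this; least squares is used below only to keep the sketch short.

```python
import numpy as np

def learn_summary_map(series, params):
    """Fit a linear map from raw traces, shape (n_sims, T), to generating
    parameters, shape (n_sims, p), and return it as a summary function."""
    W, *_ = np.linalg.lstsq(series, params, rcond=None)
    return lambda trace: trace @ W   # maps a trace to p summary values
```

Summaries of observed data can then be compared against simulated ones, e.g. inside an ABC loop, in place of hand-crafted statistics.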
arXiv Detail & Related papers (2020-12-04T19:21:37Z) - Online feature selection for rapid, low-overhead learning in networked
systems [0.0]
We present an online algorithm, called OSFS, that selects a small feature set from a large number of available data sources.
We find that OSFS requires several hundred measurements to reduce the number of data sources by two orders of magnitude.
arXiv Detail & Related papers (2020-10-28T12:00:42Z) - An Online Learning Algorithm for a Neuro-Fuzzy Classifier with
Mixed-Attribute Data [9.061408029414455]
The general fuzzy min-max neural network (GFMMNN) is an efficient neuro-fuzzy system for data classification.
This paper proposes an extended online learning algorithm for the GFMMNN.
The proposed method can handle datasets with both continuous and categorical features.
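For orientation, a bare-bones online fuzzy min-max classifier on continuous features in [0, 1]: each class owns hyperboxes that expand (up to size theta) to absorb new samples. The categorical-feature handling this paper proposes and the standard overlap contraction are omitted, so this is background, not the proposed method.

```python
import numpy as np

class FuzzyMinMaxSketch:
    """Minimal online fuzzy min-max classifier over continuous features."""
    def __init__(self, theta=0.3, gamma=4.0):
        self.theta, self.gamma = theta, gamma
        self.boxes = []                    # list of (v_min, w_max, label)

    def _membership(self, x, v, w):
        # 1 inside the box [v, w], decaying with distance outside it.
        out = np.maximum(0, x - w) + np.maximum(0, v - x)
        return float(np.mean(np.maximum(0, 1 - self.gamma * out)))

    def learn(self, x, label):
        for i, (v, w, lab) in enumerate(self.boxes):
            if lab != label:
                continue
            nv, nw = np.minimum(v, x), np.maximum(w, x)
            if np.all(nw - nv <= self.theta):     # expansion stays within theta
                self.boxes[i] = (nv, nw, lab)
                return
        self.boxes.append((x.copy(), x.copy(), label))   # create a new hyperbox

    def predict(self, x):
        scores = [(self._membership(x, v, w), lab) for v, w, lab in self.boxes]
        return max(scores)[1]              # class of the max-membership box
```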
arXiv Detail & Related papers (2020-09-30T13:45:36Z)