Online feature selection for rapid, low-overhead learning in networked systems
- URL: http://arxiv.org/abs/2010.14907v1
- Date: Wed, 28 Oct 2020 12:00:42 GMT
- Title: Online feature selection for rapid, low-overhead learning in networked systems
- Authors: Xiaoxuan Wang (1), Forough Shahab Samani (1 and 2), Rolf Stadler (1 and 2) ((1) KTH Royal Institute of Technology, Sweden; (2) RISE Research Institutes of Sweden)
- Abstract summary: We present an online algorithm, called OSFS, that selects a small feature set from a large number of available data sources.
We find that OSFS requires several hundred measurements to reduce the number of data sources by two orders of magnitude.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data-driven functions for operation and management often require measurements
collected through monitoring for model training and prediction. The number of
data sources can be very large, which incurs significant communication and
computing overhead to continuously extract and collect this data, as well as to
train and update the machine-learning models. We present an online algorithm,
called OSFS, that selects a small feature set from a large number of available
data sources, which allows for rapid, low-overhead, and effective learning and
prediction. OSFS is instantiated with a feature ranking algorithm and applies
the concept of a stable feature set, which we introduce in the paper. We
perform an extensive experimental evaluation of our method on data from an
in-house testbed. We find that OSFS requires several hundred measurements to
reduce the number of data sources by two orders of magnitude, from which models
are trained with acceptable prediction accuracy. While our method is heuristic
and can be improved in many ways, the results clearly suggest that many
learning tasks do not require a lengthy monitoring phase and expensive offline
training.
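To make the selection loop concrete, here is a minimal sketch of an OSFS-style algorithm. It is not the authors' exact method: the ranking function (absolute Pearson correlation with the target) and the stability test (an unchanged top-k set over a fixed number of consecutive batches) are illustrative assumptions standing in for the paper's feature ranking algorithm and stable-feature-set criterion.

```python
# Minimal sketch of an OSFS-style loop; not the authors' exact algorithm.
# Assumptions: features are ranked by |Pearson correlation| with the target,
# and the top-k set is "stable" once it is unchanged for `patience`
# consecutive batches.
import numpy as np

def rank_features(X, y):
    """Return feature indices ordered best-first by |correlation| with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12
    scores = np.abs(Xc.T @ yc) / denom
    return np.argsort(scores)[::-1]

def osfs_like(stream, k=10, patience=5):
    """Consume (X_batch, y_batch) pairs until the top-k feature set is stable."""
    X_parts, y_parts, history = [], [], []
    for X_batch, y_batch in stream:
        X_parts.append(X_batch)
        y_parts.append(y_batch)
        X, y = np.vstack(X_parts), np.concatenate(y_parts)
        history.append(frozenset(rank_features(X, y)[:k]))
        # Stable feature set: identical top-k over the last `patience` batches.
        if len(history) >= patience and len(set(history[-patience:])) == 1:
            return sorted(history[-1]), len(y)  # selected features, samples used
    return sorted(history[-1]), len(np.concatenate(y_parts))
```

Fed small batches of measurements, the loop halts as soon as the ranking settles, which is consistent with the paper's finding that several hundred measurements can suffice.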
Related papers
- Bad Students Make Great Teachers: Active Learning Accelerates Large-Scale Visual Understanding [9.112203072394648]
Power-law scaling indicates that large-scale training with uniform sampling is prohibitively slow.
Active learning methods aim to increase data efficiency by prioritizing learning on the most relevant examples.
arXiv Detail & Related papers (2023-12-08T19:26:13Z)
- Zero-shot Retrieval: Augmenting Pre-trained Models with Search Engines [83.65380507372483]
Large pre-trained models can dramatically reduce the amount of task-specific data required to solve a problem, but they often fail to capture domain-specific nuances out of the box.
This paper shows how to leverage recent advances in NLP and multi-modal learning to augment a pre-trained model with search engine retrieval.
arXiv Detail & Related papers (2023-11-29T05:33:28Z)
- A Novel Neural Network-Based Federated Learning System for Imbalanced and Non-IID Data [2.9642661320713555]
Most machine learning algorithms rely heavily on large amounts of data, which may be collected from various sources.
To combat this issue, researchers have introduced federated learning, where a prediction model is learnt while ensuring the privacy of clients' data.
In this research, we propose a centralized, neural network-based federated learning system (a FedAvg-style sketch follows this list).
arXiv Detail & Related papers (2023-11-16T17:14:07Z)
- Optimal transfer protocol by incremental layer defrosting [66.76153955485584]
Transfer learning is a powerful tool enabling model training with limited amounts of data.
The simplest transfer learning protocol is based on "freezing" the feature-extractor layers of a network pre-trained on a data-rich source task.
We show that this protocol is often sub-optimal, and that the largest performance gain may be achieved when smaller portions of the pre-trained network are kept frozen (a partial-freezing sketch follows this list).
arXiv Detail & Related papers (2023-03-02T17:32:11Z)
- Online Evolutionary Neural Architecture Search for Multivariate Non-Stationary Time Series Forecasting [72.89994745876086]
This work presents the Online Neuro-Evolution-based Neural Architecture Search (ONE-NAS) algorithm.
ONE-NAS is a novel neural architecture search method capable of automatically designing and dynamically training recurrent neural networks (RNNs) for online forecasting tasks.
Results demonstrate that ONE-NAS outperforms traditional statistical time series forecasting methods.
arXiv Detail & Related papers (2023-02-20T22:25:47Z)
- Evaluating and Crafting Datasets Effective for Deep Learning With Data Maps [0.0]
Training on large datasets often requires excessive system resources and an infeasible amount of time.
For supervised learning, large datasets require more time for manually labeling samples.
We propose a method of curating smaller datasets with comparable out-of-distribution model accuracy after an initial training session (a data-map curation sketch follows this list).
arXiv Detail & Related papers (2022-08-22T03:30:18Z)
- Beyond Just Vision: A Review on Self-Supervised Representation Learning on Multimodal and Temporal Data [10.006890915441987]
The popularity of self-supervised learning is driven by the fact that traditional models typically require a huge amount of well-annotated data for training.
Self-supervised methods have been introduced to improve the efficiency of training data through discriminative pre-training of models.
We aim to provide the first comprehensive review of multimodal self-supervised learning methods for temporal data.
arXiv Detail & Related papers (2022-06-06T04:59:44Z)
- Knowledge Distillation as Efficient Pre-training: Faster Convergence, Higher Data-efficiency, and Better Transferability [53.27240222619834]
Knowledge Distillation as Efficient Pre-training aims to efficiently transfer the learned feature representation from pre-trained models to new student models for future downstream tasks.
Our method performs comparably with supervised pre-training counterparts on 3 downstream tasks and 9 downstream datasets while requiring 10x less data and 5x less pre-training time (a distillation-loss sketch follows this list).
arXiv Detail & Related papers (2022-03-10T06:23:41Z)
- Deep Reinforcement Learning Assisted Federated Learning Algorithm for Data Management of IIoT [82.33080550378068]
The continuously expanding scale of the industrial Internet of Things (IIoT) leads to IIoT equipment generating massive amounts of user data every moment.
How to manage these time series data in an efficient and safe way in the field of IIoT is still an open issue.
This paper studies applications of FL technology for managing IIoT equipment data in wireless network environments.
arXiv Detail & Related papers (2022-02-03T07:12:36Z)
- Online Feature Selection for Efficient Learning in Networked Systems [3.13468877208035]
Current AI/ML methods for data-driven engineering use models that are mostly trained offline.
We present an online algorithm called Online Stable Feature Set Algorithm (OSFS), which selects a small feature set from a large number of available data sources.
OSFS achieves a massive reduction in the size of the feature set by 1-3 orders of magnitude on all investigated datasets.
arXiv Detail & Related papers (2021-12-15T16:31:59Z)
- Online Continual Learning with Natural Distribution Shifts: An Empirical Study with Visual Data [101.6195176510611]
"Online" continual learning enables evaluating both information retention and online learning efficacy.
In online continual learning, each incoming small batch of data is first used for testing and then added to the training set, making the problem truly online.
We introduce a new benchmark for online continual visual learning that exhibits large scale and natural distribution shifts (a test-then-train sketch follows this list).
arXiv Detail & Related papers (2021-08-20T06:17:20Z)
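As referenced in the federated learning entry above, the sketch below shows a generic FedAvg-style round in numpy: each client runs local SGD on its private data and the server averages the updates weighted by dataset size. The linear least-squares local model and the aggregation rule are illustrative assumptions; the paper's system is neural-network-based and may aggregate differently.

```python
# Generic FedAvg-style round (illustrative; not the paper's exact system).
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=1):
    """One client's local SGD on a linear model, on its private data."""
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)   # least-squares gradient
        w -= lr * grad
    return w

def fedavg_round(global_w, clients):
    """Server averages client updates, weighted by local dataset size."""
    updates, sizes = [], []
    for X, y in clients:                     # raw data never leaves the client
        updates.append(local_update(global_w, X, y))
        sizes.append(len(y))
    return np.average(updates, axis=0, weights=np.asarray(sizes, dtype=float))
```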
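For the layer-defrosting entry, the following PyTorch sketch shows the partial-freezing protocol: keep only part of the pre-trained backbone frozen rather than the whole feature extractor. The ResNet-18 backbone, the stage-level freezing granularity, and the hyperparameters are placeholders, not the paper's setup.

```python
# Partial freezing for transfer learning, per the "layer defrosting" entry.
# ResNet-18 and the stage granularity here are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1")  # pre-trained source model
model.fc = nn.Linear(model.fc.in_features, 10)    # fresh head for target task

def freeze_stages(model, n_stages):
    """Freeze the stem plus the first n_stages residual stages."""
    frozen = [model.conv1, model.bn1] + \
             [model.layer1, model.layer2, model.layer3, model.layer4][:n_stages]
    for module in frozen:
        for p in module.parameters():
            p.requires_grad = False

freeze_stages(model, n_stages=2)  # keep only part of the backbone frozen
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3, momentum=0.9)
```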
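For the Data Maps entry, a sketch of training-dynamics-based curation: record each example's correct-class probability across the epochs of an initial run, then drop the consistently easy examples. The specific keep rule and threshold are assumptions for illustration.

```python
# Data-map-style curation: keep hard/ambiguous examples, drop easy ones.
# The keep rule and the 0.9 confidence threshold are assumptions.
import numpy as np

def curate(prob_history, max_conf=0.9):
    """prob_history: (n_epochs, n_examples) correct-class probabilities."""
    confidence = prob_history.mean(axis=0)   # mean probability per example
    variability = prob_history.std(axis=0)   # spread across epochs
    keep = (confidence < max_conf) | (variability > variability.mean())
    return np.where(keep)[0]                 # indices of the curated subset
```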
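For the knowledge-distillation entry, a sketch of a distillation objective. The paper transfers learned feature representations; the standard soft-logit matching loss below is shown as a widely used stand-in, not the paper's exact objective.

```python
# Soft-target distillation loss (standard form; the paper's objective,
# which distills feature representations, may differ).
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, T=4.0):
    """KL divergence between softened teacher and student distributions."""
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    # T^2 rescaling keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T
```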
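Finally, for the online continual learning entry, a minimal test-then-train loop: each incoming batch is evaluated before the model trains on it, so the accuracy measures genuine online performance. The model, optimizer, loss function, and stream are placeholders.

```python
# Test-then-train loop for online continual learning: evaluate each
# incoming batch first, then add it to training, as the entry describes.
import torch

def online_eval_then_train(model, optimizer, loss_fn, stream):
    correct, seen = 0, 0
    for X, y in stream:
        model.eval()
        with torch.no_grad():                 # test on the unseen batch first
            correct += (model(X).argmax(dim=1) == y).sum().item()
            seen += len(y)
        model.train()                         # ...then learn from it
        optimizer.zero_grad()
        loss_fn(model(X), y).backward()
        optimizer.step()
    return correct / max(seen, 1)             # online accuracy
```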
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.