Sampling Streaming Data with Parallel Vector Quantization -- PVQ
- URL: http://arxiv.org/abs/2210.01792v1
- Date: Tue, 4 Oct 2022 17:59:44 GMT
- Title: Sampling Streaming Data with Parallel Vector Quantization -- PVQ
- Authors: Mujahid Sultan
- Abstract summary: We present a vector quantization-based sampling method, which substantially reduces the class imbalance in data streams.
We built models using parallel processing, batch processing, and randomly selecting samples.
We show that the accuracy of classification models improves when the data streams are pre-processed with our method.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accumulation of corporate data in the cloud has attracted more enterprise
applications to the cloud creating data gravity. As a consequence, network
traffic has become more cloud centric. This increase in cloud centric traffic
poses new challenges in designing learning systems for streaming data due to
class imbalance. The number of classes plays a vital role in the accuracy of
the classifiers built from the data streams. In this paper, we present a vector
quantization-based sampling method, which substantially reduces the class
imbalance in data streams. We demonstrate its effectiveness by conducting
experiments on network traffic and anomaly dataset with commonly used ML model
building methods; Multilayered Perceptron on TensorFlow backend, Support Vector
Machines, K-Nearest Neighbour, and Random Forests. We built models using
parallel processing, batch processing, and randomly selecting samples. We show
that the accuracy of classification models improves when the data streams are
pre-processed with our method. We used out of the box hyper-parameters of these
classifiers and auto sklearn for hyperparameter optimization.
Related papers
- Test-Time Adaptation for Point Cloud Upsampling Using Meta-Learning [17.980649681325406]
We propose a test-time adaption approach to enhance model generality of point cloud upsampling.
The proposed approach leverages meta-learning to explicitly learn network parameters for test-time adaption.
Our framework is generic and can be applied in a plug-and-play manner with existing backbone networks in point cloud upsampling.
arXiv Detail & Related papers (2023-08-31T06:44:59Z) - Continual Learning with Optimal Transport based Mixture Model [17.398605698033656]
We propose an online mixture model learning approach based on nice properties of the mature optimal transport theory (OT-MM)
Our proposed method can significantly outperform the current state-of-the-art baselines.
arXiv Detail & Related papers (2022-11-30T06:40:29Z) - CloudAttention: Efficient Multi-Scale Attention Scheme For 3D Point
Cloud Learning [81.85951026033787]
We set transformers in this work and incorporate them into a hierarchical framework for shape classification and part and scene segmentation.
We also compute efficient and dynamic global cross attentions by leveraging sampling and grouping at each iteration.
The proposed hierarchical model achieves state-of-the-art shape classification in mean accuracy and yields results on par with the previous segmentation methods.
arXiv Detail & Related papers (2022-07-31T21:39:15Z) - ClusterQ: Semantic Feature Distribution Alignment for Data-Free
Quantization [111.12063632743013]
We propose a new and effective data-free quantization method termed ClusterQ.
To obtain high inter-class separability of semantic features, we cluster and align the feature distribution statistics.
We also incorporate the intra-class variance to solve class-wise mode collapse.
arXiv Detail & Related papers (2022-04-30T06:58:56Z) - How Well Do Sparse Imagenet Models Transfer? [75.98123173154605]
Transfer learning is a classic paradigm by which models pretrained on large "upstream" datasets are adapted to yield good results on "downstream" datasets.
In this work, we perform an in-depth investigation of this phenomenon in the context of convolutional neural networks (CNNs) trained on the ImageNet dataset.
We show that sparse models can match or even outperform the transfer performance of dense models, even at high sparsities.
arXiv Detail & Related papers (2021-11-26T11:58:51Z) - Point Cloud Pre-training by Mixing and Disentangling [35.18101910728478]
Mixing and Disentangling (MD) is a self-supervised learning approach for point cloud pre-training.
We show that the encoder + ours (MD) significantly surpasses that of the encoder trained from scratch and converges quickly.
We hope this self-supervised learning attempt on point clouds can pave the way for reducing the deeply-learned model dependence on large-scale labeled data.
arXiv Detail & Related papers (2021-09-01T15:52:18Z) - Automated Machine Learning Techniques for Data Streams [91.3755431537592]
This paper surveys the state-of-the-art open-source AutoML tools, applies them to data collected from streams, and measures how their performance changes over time.
The results show that off-the-shelf AutoML tools can provide satisfactory results but in the presence of concept drift, detection or adaptation techniques have to be applied to maintain the predictive accuracy over time.
arXiv Detail & Related papers (2021-06-14T11:42:46Z) - AutoFlow: Learning a Better Training Set for Optical Flow [62.40293188964933]
AutoFlow is a method to render training data for optical flow.
AutoFlow achieves state-of-the-art accuracy in pre-training both PWC-Net and RAFT.
arXiv Detail & Related papers (2021-04-29T17:55:23Z) - Lambda Learner: Fast Incremental Learning on Data Streams [5.543723668681475]
We propose a new framework for training models by incremental updates in response to mini-batches from data streams.
We show that the resulting model of our framework closely estimates a periodically updated model trained on offline data and outperforms it when model updates are time-sensitive.
We present a large-scale deployment on the sponsored content platform for a large social network.
arXiv Detail & Related papers (2020-10-11T04:00:34Z) - Pre-Trained Models for Heterogeneous Information Networks [57.78194356302626]
We propose a self-supervised pre-training and fine-tuning framework, PF-HIN, to capture the features of a heterogeneous information network.
PF-HIN consistently and significantly outperforms state-of-the-art alternatives on each of these tasks, on four datasets.
arXiv Detail & Related papers (2020-07-07T03:36:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.