Related papers: Learning-Based Heavy Hitters and Flow Frequency Estimation in Streams

URL: http://arxiv.org/abs/2406.16270v1
Date: Mon, 24 Jun 2024 02:31:00 GMT
Title: Learning-Based Heavy Hitters and Flow Frequency Estimation in Streams
Authors: Rana Shahout, Michael Mitzenmacher
Abstract summary: We present the first learned competing-counter-based algorithm, called LSS, for identifying heavy hitters, top k, and flow frequency estimation. Our evaluation demonstrates that LSS can enhance the accuracy and efficiency of Space Saving in identifying heavy hitters, top k, and estimating flow frequencies.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Identifying heavy hitters and estimating the frequencies of flows are fundamental tasks in various network domains. Existing approaches to this challenge can broadly be categorized into two groups, hashing-based and competing-counter-based. The Count-Min sketch is a standard example of a hashing-based algorithm, and the Space Saving algorithm is an example of a competing-counter algorithm. Recent works have explored the use of machine learning to enhance algorithms for frequency estimation problems, under the algorithms with prediction framework. However, these works have focused solely on the hashing-based approach, which may not be best for identifying heavy hitters. In this paper, we present the first learned competing-counter-based algorithm, called LSS, for identifying heavy hitters, top k, and flow frequency estimation that utilizes the well-known Space Saving algorithm. We provide theoretical insights into how and to what extent our approach can improve upon Space Saving, backed by experimental results on both synthetic and real-world datasets. Our evaluation demonstrates that LSS can enhance the accuracy and efficiency of Space Saving in identifying heavy hitters, top k, and estimating flow frequencies.
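For orientation, the competing-counter idea behind the Space Saving algorithm named in the abstract can be sketched as follows. This is a minimal illustration of classic Space Saving only, not the learned LSS variant, whose details are not given here. The class name `SpaceSaving`, its method names, and the linear scan for the minimum counter are assumptions made for brevity; practical implementations typically use a dedicated structure to find the minimum counter quickly.

```python
from typing import Dict, Hashable, List, Tuple


class SpaceSaving:
    """Illustrative Space Saving summary with a fixed number of counters."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.counts: Dict[Hashable, int] = {}  # monitored item -> counter value
        self.errors: Dict[Hashable, int] = {}  # monitored item -> max overestimate

    def update(self, item: Hashable) -> None:
        """Process one stream element."""
        if item in self.counts:
            # Monitored item: just increment its counter.
            self.counts[item] += 1
        elif len(self.counts) < self.capacity:
            # Free counter available: start monitoring the item exactly.
            self.counts[item] = 1
            self.errors[item] = 0
        else:
            # All counters in use: evict the item with the smallest counter
            # and let the newcomer inherit that count plus one.
            victim = min(self.counts, key=self.counts.get)  # O(capacity) scan
            floor = self.counts.pop(victim)
            del self.errors[victim]
            self.counts[item] = floor + 1
            self.errors[item] = floor

    def estimate(self, item: Hashable) -> int:
        """Return an upper bound on the frequency of item."""
        if item in self.counts:
            return self.counts[item]
        if len(self.counts) < self.capacity:
            return 0  # every item seen so far is tracked exactly
        return min(self.counts.values())

    def top(self, k: int) -> List[Tuple[Hashable, int]]:
        """Return up to k monitored items with the largest counters."""
        return sorted(self.counts.items(), key=lambda kv: -kv[1])[:k]


if __name__ == "__main__":
    stream = ["a"] * 5 + ["b"] * 3 + ["c", "d", "a"]
    summary = SpaceSaving(capacity=2)
    for element in stream:
        summary.update(element)
    print(summary.top(1), summary.estimate("d"))
```

In this sketch the counters always sum to the stream length, and each estimate is an upper bound on the true frequency, so items that enter late through an eviction can be overestimated by up to the evicted count recorded in `errors`.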

Related papers

Learning-Augmented Frequent Directions [13.827632579682795]
A fundamental problem in the streaming literature is to approximate the number of occurrences of items appearing in a long stream of data using only a small amount of memory. We develop a framework to combine the worst-case guarantees of popular solutions such as CountMin and CountSketch with learned predictions of high frequency elements. We analyze a learning-augmented variant of the Frequent Directions algorithm, extending the theoretical and empirical understanding of learned predictions to matrix streaming.
arXiv Detail & Related papers (2025-03-02T15:20:56Z)
Score-matching-based Structure Learning for Temporal Data on Networks [17.166362605356074]
Causal discovery is a crucial initial step in establishing causality from empirical data and background knowledge. Current score-matching-based algorithms are primarily designed to analyze independent and identically distributed (i.i.d.) data. We have developed a new parent-finding subroutine for leaf nodes in DAGs, significantly accelerating the most time-consuming part of the process: the pruning step.
arXiv Detail & Related papers (2024-12-10T12:36:35Z)
Learning-based Sketches for Frequency Estimation in Data Streams without Ground Truth [8.643366221221351]
We propose a more practical learning-based estimation framework, namely UCL-sketch. It features online training via equivalent learning without ground truth and a highly scalable architecture with logical estimation buckets. Results demonstrate that our method greatly outperforms existing state-of-the-art sketches regarding per-key accuracy and distribution.
arXiv Detail & Related papers (2024-12-04T14:00:50Z)
Improved Frequency Estimation Algorithms with and without Predictions [22.382900492405938]
Estimating frequencies of elements appearing in a data stream is a key task in large-scale data analysis. We give a novel algorithm, which theoretically outperforms the learning based algorithm of Hsu et al. without the use of any predictions.
arXiv Detail & Related papers (2023-12-12T18:59:06Z)
The Cascaded Forward Algorithm for Neural Network Training [61.06444586991505]
We propose a new learning framework for neural networks, namely the Cascaded Forward (CaFo) algorithm, which, as in FF, does not rely on BP optimization. Unlike FF, our framework directly outputs label distributions at each cascaded block and does not require the generation of additional negative samples. In our framework each block can be trained independently, so it can be easily deployed into parallel acceleration systems.
arXiv Detail & Related papers (2023-03-17T02:01:11Z)
On the Effective Usage of Priors in RSS-based Localization [56.68864078417909]
We propose a Received Signal Strength (RSS) fingerprint and convolutional neural network-based algorithm, LocUNet. In this paper, we study the localization problem in dense urban settings. We first recognize LocUNet's ability to learn the underlying prior distribution of the Rx position or Rx and transmitter (Tx) association preferences from the training data, and attribute its high performance to these.
arXiv Detail & Related papers (2022-11-28T00:31:02Z)
Large-Scale Sequential Learning for Recommender and Engineering Systems [91.3755431537592]
In this thesis, we focus on the design of automatic algorithms that provide personalized ranking by adapting to the current conditions. For the former, we propose a novel algorithm called SAROS that takes into account both kinds of feedback for learning over the sequence of interactions. The proposed idea of taking into account the neighbour lines shows statistically significant results in comparison with the initial approach for fault detection in power grids.
arXiv Detail & Related papers (2022-05-13T21:09:41Z)
Early Time-Series Classification Algorithms: An Empirical Comparison [59.82930053437851]
Early Time-Series Classification (ETSC) is the task of predicting the class of incoming time-series by observing as few measurements as possible. We evaluate six existing ETSC algorithms on publicly available data, as well as on two newly introduced datasets.
arXiv Detail & Related papers (2022-03-03T10:43:56Z)
Fast Density Estimation for Density-based Clustering Methods [3.8972699157287702]
Density-based clustering algorithms are widely used for discovering clusters in pattern recognition and machine learning. The robustness of density-based algorithms is heavily dominated by finding neighbors and calculating the density of each point which is time-consuming. This paper proposes a density-based clustering framework by using the fast principal component analysis, which can be applied to density based methods to prune unnecessary distance calculations.
arXiv Detail & Related papers (2021-09-23T13:59:42Z)
A quantum algorithm for gravitational wave matched filtering [0.0]
We propose the application of a quantum algorithm for the detection of unknown signals in noisy data. In comparison to the classical method, this provides a speed-up proportional to the square-root of the number of templates. We demonstrate both a proof-of-principle quantum circuit implementation, and a simulation of the algorithm's application to the detection of the first gravitational wave signal GW150914.
arXiv Detail & Related papers (2021-09-03T13:58:58Z)
A Sparse Structure Learning Algorithm for Bayesian Network Identification from Discrete High-Dimensional Data [0.40611352512781856]
This paper addresses the problem of learning a sparse structure Bayesian network from high-dimensional discrete data. We propose a score function that satisfies the sparsity and the DAG property simultaneously. Specifically, we use a variance reducing method in our optimization algorithm to make the algorithm work efficiently in high-dimensional data.
arXiv Detail & Related papers (2021-08-21T12:21:01Z)
Learning to Hash Robustly, with Guarantees [79.68057056103014]
In this paper, we design an NNS algorithm for the Hamming space that has worst-case guarantees essentially matching that of theoretical algorithms. We evaluate the algorithm's ability to optimize for a given dataset both theoretically and practically. Our algorithm has a 1.8x and 2.1x better recall on the worst-performing queries to the MNIST and ImageNet datasets.
arXiv Detail & Related papers (2021-08-11T20:21:30Z)
Bayesian Optimization with Machine Learning Algorithms Towards Anomaly Detection [66.05992706105224]
In this paper, an effective anomaly detection framework is proposed utilizing the Bayesian Optimization technique. The performance of the considered algorithms is evaluated using the ISCX 2012 dataset. Experimental results show the effectiveness of the proposed framework in terms of accuracy rate, precision, low false-alarm rate, and recall.
arXiv Detail & Related papers (2020-08-05T19:29:35Z)
Frequency Estimation in Data Streams: Learning the Optimal Hashing Scheme [3.7565501074323224]
We present a novel approach for the problem of frequency estimation in data streams that is based on optimization and machine learning. The proposed approach exploits an observed stream prefix to near-optimally hash elements and compress the target frequency distribution. We show that the proposed approach outperforms existing approaches by one to two orders of magnitude in terms of its average (per element) estimation error and by 45-90% in terms of its expected magnitude of estimation error.
arXiv Detail & Related papers (2020-07-17T22:15:22Z)

This list is automatically generated from the titles and abstracts of the papers on this site.