Related papers: Improved Frequency Estimation Algorithms with and without Predictions

URL: http://arxiv.org/abs/2312.07535v1
Date: Tue, 12 Dec 2023 18:59:06 GMT
Title: Improved Frequency Estimation Algorithms with and without Predictions
Authors: Anders Aamand, Justin Y. Chen, Huy Lê Nguyen, Sandeep Silwal, Ali Vakilian
Abstract summary: Estimating frequencies of elements appearing in a data stream is a key task in large-scale data analysis. We give a novel algorithm that, in some parameter regimes, theoretically outperforms the learning-based algorithm of Hsu et al. without using any predictions.
Score: 22.382900492405938
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Estimating frequencies of elements appearing in a data stream is a key task in large-scale data analysis. Popular sketching approaches to this problem (e.g., CountMin and CountSketch) come with worst-case guarantees that probabilistically bound the error of the estimated frequencies for any possible input. The work of Hsu et al. (2019) introduced the idea of using machine learning to tailor sketching algorithms to the specific data distribution they are being run on. In particular, their learning-augmented frequency estimation algorithm uses a learned heavy-hitter oracle that predicts which elements will appear many times in the stream. We give a novel algorithm which, in some parameter regimes, already theoretically outperforms the learning-based algorithm of Hsu et al. without the use of any predictions. Augmenting our algorithm with heavy-hitter predictions further reduces the error and improves upon the state of the art. Empirically, our algorithms achieve superior performance in all experiments compared to prior approaches.
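To make the setting in the abstract concrete, here is a minimal Python sketch of the general idea it attributes to Hsu et al.: a plain CountMin sketch, plus a wrapper that routes items flagged by a heavy-hitter oracle to exact counters. This is an illustration under stated assumptions, not the new algorithm of this paper and not the authors' code. The class names `CountMin` and `LearnedCountMin`, the `is_heavy` callable (a stand-in for a learned oracle), and the choice of BLAKE2b as the hash family are all hypothetical. A real deployment would also cap the number of exact counters to respect the memory budget.

```python
import hashlib
from collections import Counter


class CountMin:
    """Plain CountMin sketch: `depth` rows of `width` counters each."""

    def __init__(self, width, depth):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _bucket(self, item, row):
        # Salt the hash with the row index to emulate one hash function per row.
        data = f"{row}:{item}".encode()
        digest = hashlib.blake2b(data, digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def update(self, item, count=1):
        for r in range(self.depth):
            self.rows[r][self._bucket(item, r)] += count

    def estimate(self, item):
        # Collisions only add mass, so the minimum over rows never underestimates.
        return min(self.rows[r][self._bucket(item, r)] for r in range(self.depth))


class LearnedCountMin:
    """Hypothetical wrapper: predicted heavy hitters get exact counters,
    every other item falls through to the shared CountMin sketch."""

    def __init__(self, is_heavy, width, depth):
        self.is_heavy = is_heavy      # assumed oracle: item -> bool
        self.exact = Counter()        # unbounded here; bounded in practice
        self.sketch = CountMin(width, depth)

    def update(self, item, count=1):
        if self.is_heavy(item):
            self.exact[item] += count
        else:
            self.sketch.update(item, count)

    def estimate(self, item):
        if self.is_heavy(item):
            return self.exact[item]
        return self.sketch.estimate(item)


# Toy stream: two frequent items and 200 items that each appear once.
stream = ["a"] * 1000 + ["b"] * 500 + [f"x{i}" for i in range(200)]
heavy_items = {"a", "b"}  # stand-in for the oracle's predictions
lcm = LearnedCountMin(lambda item: item in heavy_items, width=64, depth=3)
for item in stream:
    lcm.update(item)

print(lcm.estimate("a"), lcm.estimate("b"), lcm.estimate("x0"))
```

Because the predicted heavy items never enter the sketch, they cannot inflate the counters of rare items through hash collisions, which is the intuition behind using predictions here. Items handled by the sketch can still be overestimated, but never underestimated.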

Related papers

Learning-Augmented Frequent Directions [13.827632579682795]
A fundamental problem in the streaming literature is to approximate the number of occurrences of items appearing in a long stream of data using only a small amount of memory. We develop a framework to combine the worst-case guarantees of popular solutions such as CountMin and CountSketch with learned predictions of high frequency elements. We analyze a learning-augmented variant of the Frequent Directions algorithm, extending the theoretical and empirical understanding of learned predictions to matrix streaming.
arXiv Detail & Related papers (2025-03-02T15:20:56Z)
Learning-Based Heavy Hitters and Flow Frequency Estimation in Streams [9.22255012731159]
We present the first learned competing-counter-based algorithm, called LSS, for identifying heavy hitters, top k, and flow frequency estimation. Our evaluation demonstrates that LSS can enhance the accuracy and efficiency of Space Saving in identifying heavy hitters, top k, and estimating flow frequencies.
arXiv Detail & Related papers (2024-06-24T02:31:00Z)
Learning the Positions in CountSketch [49.57951567374372]
We consider sketching algorithms which first compress data by multiplication with a random sketch matrix, and then apply the sketch to quickly solve an optimization problem. In this work, we propose the first learning-based algorithms that also optimize the locations of the non-zero entries.
arXiv Detail & Related papers (2023-06-11T07:28:35Z)
Algorithms with Prediction Portfolios [23.703372221079306]
We study the use of multiple predictors for a number of fundamental problems, including matching, load balancing, and non-clairvoyant scheduling. For each of these problems we introduce new algorithms that take advantage of multiple predictors, and prove bounds on the resulting performance.
arXiv Detail & Related papers (2022-10-22T12:58:07Z)
Generalization Bounds for Data-Driven Numerical Linear Algebra [24.961270871124217]
Data-driven algorithms can adapt their internal structure or parameters to inputs from unknown application-specific distributions, by learning from a training sample of inputs. Several recent works have applied this approach to problems in numerical linear algebra, obtaining significant empirical gains in performance. In this work we prove generalization bounds for those algorithms, within the PAC-learning framework for data-driven algorithm selection proposed by Gupta and Roughgarden.
arXiv Detail & Related papers (2022-06-16T02:23:45Z)
Triangle and Four Cycle Counting with Predictions in Graph Streams [59.05440236993604]
We propose data-driven one-pass streaming algorithms for estimating the number of triangles and four cycles. We use a trained oracle that can predict certain properties of the stream elements to improve on prior "classical" algorithms. Our methodology expands upon prior work on "classical" streaming algorithms, as previous multi-pass and random order streaming algorithms can be seen as special cases.
arXiv Detail & Related papers (2022-03-17T19:26:00Z)
Non-Clairvoyant Scheduling with Predictions Revisited [77.86290991564829]
In non-clairvoyant scheduling, the task is to find an online strategy for scheduling jobs with a priori unknown processing requirements. We revisit this well-studied problem in a recently popular learning-augmented setting that integrates (untrusted) predictions in algorithm design. We show that these predictions have desired properties, admit a natural error measure as well as algorithms with strong performance guarantees.
arXiv Detail & Related papers (2022-02-21T13:18:11Z)
Robustification of Online Graph Exploration Methods [59.50307752165016]
We study a learning-augmented variant of the classical, notoriously hard online graph exploration problem. We propose an algorithm that naturally integrates predictions into the well-known Nearest Neighbor (NN) algorithm.
arXiv Detail & Related papers (2021-12-10T10:02:31Z)
Machine Learning for Online Algorithm Selection under Censored Feedback [71.6879432974126]
In online algorithm selection (OAS), instances of an algorithmic problem class are presented to an agent one after another, and the agent has to quickly select a presumably best algorithm from a fixed set of candidate algorithms. For decision problems such as satisfiability (SAT), quality typically refers to the algorithm's runtime. In this work, we revisit multi-armed bandit algorithms for OAS and discuss their capability of dealing with the problem. We adapt them towards runtime-oriented losses, allowing for partially censored data while keeping a space- and time-complexity independent of the time horizon.
arXiv Detail & Related papers (2021-09-13T18:10:52Z)
Double Coverage with Machine-Learned Advice [100.23487145400833]
We study the fundamental online $k$-server problem in a learning-augmented setting. We show that our algorithm achieves an almost optimal consistency-robustness tradeoff for any $k$.
arXiv Detail & Related papers (2021-03-02T11:04:33Z)
Frequency Estimation in Data Streams: Learning the Optimal Hashing Scheme [3.7565501074323224]
We present a novel approach for the problem of frequency estimation in data streams that is based on optimization and machine learning. The proposed approach exploits an observed stream prefix to near-optimally hash elements and compress the target frequency distribution. We show that the proposed approach outperforms existing approaches by one to two orders of magnitude in terms of its average (per element) estimation error and by 45-90% in terms of its expected magnitude of estimation error.
arXiv Detail & Related papers (2020-07-17T22:15:22Z)

This list is automatically generated from the titles and abstracts of the papers on this site.