Related papers: Triangle and Four Cycle Counting with Predictions in Graph Streams

Triangle and Four Cycle Counting with Predictions in Graph Streams

URL: http://arxiv.org/abs/2203.09572v1
Date: Thu, 17 Mar 2022 19:26:00 GMT
Title: Triangle and Four Cycle Counting with Predictions in Graph Streams
Authors: Justin Y. Chen, Talya Eden, Piotr Indyk, Honghao Lin, Shyam Narayanan, Ronitt Rubinfeld, Sandeep Silwal, Tal Wagner, David P. Woodruff, Michael Zhang
Abstract summary: We propose data-driven one-pass streaming algorithms for estimating the number of triangles and four cycles. We use a trained oracle that can predict certain properties of the stream elements to improve on prior "classical" algorithms. Our methodology expands upon prior work on "classical" streaming algorithms, as previous multi-pass and random order streaming algorithms can be seen as special cases.
Score: 59.05440236993604
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We propose data-driven one-pass streaming algorithms for estimating the number of triangles and four cycles, two fundamental problems in graph analytics that are widely studied in the graph data stream literature. Recently, (Hsu 2018) and (Jiang 2020) applied machine learning techniques in other data stream problems, using a trained oracle that can predict certain properties of the stream elements to improve on prior "classical" algorithms that did not use oracles. In this paper, we explore the power of a "heavy edge" oracle in multiple graph edge streaming models. In the adjacency list model, we present a one-pass triangle counting algorithm improving upon the previous space upper bounds without such an oracle. In the arbitrary order model, we present algorithms for both triangle and four cycle estimation with fewer passes and the same space complexity as in previous algorithms, and we show several of these bounds are optimal. We analyze our algorithms under several noise models, showing that the algorithms perform well even when the oracle errs. Our methodology expands upon prior work on "classical" streaming algorithms, as previous multi-pass and random order streaming algorithms can be seen as special cases of our algorithms, where the first pass or random order was used to implement the heavy edge oracle. Lastly, our experiments demonstrate advantages of the proposed method compared to state-of-the-art streaming algorithms.

Related papers

Fast and Accurate Triangle Counting in Graph Streams Using Predictions [4.000869978312742]
We present the first efficient and practical algorithm for estimating the number of triangles in a graph stream using predictions. Our algorithm combines waiting room sampling and reservoir sampling with a predictor for the heaviness of edges, that is, the number of triangles in which an edge is involved.
arXiv Detail & Related papers (2024-09-23T16:52:11Z)
Recommendation of data-free class-incremental learning algorithms by simulating future data [10.309079388745753]
Class-incremental learning deals with sequential data streams composed of batches of classes. We introduce an algorithm recommendation method that simulates the future data stream. We evaluate recent algorithms on the simulated stream and recommend the one which performs best in the user-defined incremental setting.
arXiv Detail & Related papers (2024-03-26T22:26:39Z)
Improved Frequency Estimation Algorithms with and without Predictions [22.382900492405938]
Estimating frequencies of elements appearing in a data stream is a key task in large-scale data analysis. We give a novel algorithm, which theoretically outperforms the learning based algorithm of Hsu et al. without the use of any predictions.
arXiv Detail & Related papers (2023-12-12T18:59:06Z)
Compressed online Sinkhorn [3.2534959204741085]
We revisit the recently introduced online Sinkhorn algorithm of [Mensch and Peyr'e, 2020]. We improve the convergence analysis for the online Sinkhorn algorithm, the new rate that we obtain is faster than the previous rate under certain parameter choices. Secondly, we propose the compressed online Sinkhorn algorithm which combines measure compression techniques with the online Sinkhorn algorithm.
arXiv Detail & Related papers (2023-10-08T05:33:32Z)
Machine Learning for Online Algorithm Selection under Censored Feedback [71.6879432974126]
In online algorithm selection (OAS), instances of an algorithmic problem class are presented to an agent one after another, and the agent has to quickly select a presumably best algorithm from a fixed set of candidate algorithms. For decision problems such as satisfiability (SAT), quality typically refers to the algorithm's runtime. In this work, we revisit multi-armed bandit algorithms for OAS and discuss their capability of dealing with the problem. We adapt them towards runtime-oriented losses, allowing for partially censored data while keeping a space- and time-complexity independent of the time horizon.
arXiv Detail & Related papers (2021-09-13T18:10:52Z)
Biclustering and Boolean Matrix Factorization in Data Streams [12.005731086591139]
We study the clustering of bipartite graphs and Boolean matrix factorization in data streams. We provide an algorithm that, after one pass over the stream, recovers the set of clusters on the right side of the graph using sublinear space. We evaluate an implementation of the algorithm on synthetic data and on real-world data.
arXiv Detail & Related papers (2020-12-05T23:02:43Z)
A Systematic Characterization of Sampling Algorithms for Open-ended Language Generation [71.31905141672529]
We study the widely adopted ancestral sampling algorithms for auto-regressive language models. We identify three key properties that are shared among them: entropy reduction, order preservation, and slope preservation. We find that the set of sampling algorithms that satisfies these properties performs on par with the existing sampling algorithms.
arXiv Detail & Related papers (2020-09-15T17:28:42Z)
New Oracle-Efficient Algorithms for Private Synthetic Data Release [52.33506193761153]
We present three new algorithms for constructing differentially private synthetic data. The algorithms satisfy differential privacy even in the worst case. Compared to the state-of-the-art method High-Dimensional Matrix Mechanism citeMcKennaMHM18, our algorithms provide better accuracy in the large workload.
arXiv Detail & Related papers (2020-07-10T15:46:05Z)
Run2Survive: A Decision-theoretic Approach to Algorithm Selection based on Survival Analysis [75.64261155172856]
survival analysis (SA) naturally supports censored data and offers appropriate ways to use such data for learning distributional models of algorithm runtime. We leverage such models as a basis of a sophisticated decision-theoretic approach to algorithm selection, which we dub Run2Survive. In an extensive experimental study with the standard benchmark ASlib, our approach is shown to be highly competitive and in many cases even superior to state-of-the-art AS approaches.
arXiv Detail & Related papers (2020-07-06T15:20:17Z)
Lagrangian Decomposition for Neural Network Verification [148.0448557991349]
A fundamental component of neural network verification is the computation of bounds on the values their outputs can take. We propose a novel approach based on Lagrangian Decomposition. We show that we obtain bounds comparable with off-the-shelf solvers in a fraction of their running time.
arXiv Detail & Related papers (2020-02-24T17:55:10Z)

This list is automatically generated from the titles and abstracts of the papers in this site.