Triangle and Four Cycle Counting with Predictions in Graph Streams
- URL: http://arxiv.org/abs/2203.09572v1
- Date: Thu, 17 Mar 2022 19:26:00 GMT
- Title: Triangle and Four Cycle Counting with Predictions in Graph Streams
- Authors: Justin Y. Chen, Talya Eden, Piotr Indyk, Honghao Lin, Shyam Narayanan,
Ronitt Rubinfeld, Sandeep Silwal, Tal Wagner, David P. Woodruff, Michael
Zhang
- Abstract summary: We propose data-driven one-pass streaming algorithms for estimating the number of triangles and four cycles.
We use a trained oracle that can predict certain properties of the stream elements to improve on prior "classical" algorithms.
Our methodology expands upon prior work on "classical" streaming algorithms, as previous multi-pass and random order streaming algorithms can be seen as special cases.
- Score: 59.05440236993604
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose data-driven one-pass streaming algorithms for estimating the
number of triangles and four cycles, two fundamental problems in graph
analytics that are widely studied in the graph data stream literature.
Recently, (Hsu 2018) and (Jiang 2020) applied machine learning techniques in
other data stream problems, using a trained oracle that can predict certain
properties of the stream elements to improve on prior "classical" algorithms
that did not use oracles. In this paper, we explore the power of a "heavy edge"
oracle in multiple graph edge streaming models. In the adjacency list model, we
present a one-pass triangle counting algorithm improving upon the previous
space upper bounds without such an oracle. In the arbitrary order model, we
present algorithms for both triangle and four cycle estimation with fewer
passes and the same space complexity as in previous algorithms, and we show
several of these bounds are optimal. We analyze our algorithms under several
noise models, showing that the algorithms perform well even when the oracle
errs. Our methodology expands upon prior work on "classical" streaming
algorithms, as previous multi-pass and random order streaming algorithms can be
seen as special cases of our algorithms, where the first pass or random order
was used to implement the heavy edge oracle. Lastly, our experiments
demonstrate advantages of the proposed method compared to state-of-the-art
streaming algorithms.
Related papers
- Fast and Accurate Triangle Counting in Graph Streams Using Predictions [4.000869978312742]
We present the first efficient and practical algorithm for estimating the number of triangles in a graph stream using predictions.
Our algorithm combines waiting room sampling and reservoir sampling with a predictor for the heaviness of edges, that is, the number of triangles in which an edge is involved.
arXiv Detail & Related papers (2024-09-23T16:52:11Z) - Recommendation of data-free class-incremental learning algorithms by simulating future data [10.309079388745753]
Class-incremental learning deals with sequential data streams composed of batches of classes.
We introduce an algorithm recommendation method that simulates the future data stream.
We evaluate recent algorithms on the simulated stream and recommend the one which performs best in the user-defined incremental setting.
arXiv Detail & Related papers (2024-03-26T22:26:39Z) - Improved Frequency Estimation Algorithms with and without Predictions [22.382900492405938]
Estimating frequencies of elements appearing in a data stream is a key task in large-scale data analysis.
We give a novel algorithm, which theoretically outperforms the learning based algorithm of Hsu et al. without the use of any predictions.
arXiv Detail & Related papers (2023-12-12T18:59:06Z) - Compressed online Sinkhorn [3.2534959204741085]
We revisit the recently introduced online Sinkhorn algorithm of [Mensch and Peyr'e, 2020].
We improve the convergence analysis for the online Sinkhorn algorithm, the new rate that we obtain is faster than the previous rate under certain parameter choices.
Secondly, we propose the compressed online Sinkhorn algorithm which combines measure compression techniques with the online Sinkhorn algorithm.
arXiv Detail & Related papers (2023-10-08T05:33:32Z) - Machine Learning for Online Algorithm Selection under Censored Feedback [71.6879432974126]
In online algorithm selection (OAS), instances of an algorithmic problem class are presented to an agent one after another, and the agent has to quickly select a presumably best algorithm from a fixed set of candidate algorithms.
For decision problems such as satisfiability (SAT), quality typically refers to the algorithm's runtime.
In this work, we revisit multi-armed bandit algorithms for OAS and discuss their capability of dealing with the problem.
We adapt them towards runtime-oriented losses, allowing for partially censored data while keeping a space- and time-complexity independent of the time horizon.
arXiv Detail & Related papers (2021-09-13T18:10:52Z) - Biclustering and Boolean Matrix Factorization in Data Streams [12.005731086591139]
We study the clustering of bipartite graphs and Boolean matrix factorization in data streams.
We provide an algorithm that, after one pass over the stream, recovers the set of clusters on the right side of the graph using sublinear space.
We evaluate an implementation of the algorithm on synthetic data and on real-world data.
arXiv Detail & Related papers (2020-12-05T23:02:43Z) - A Systematic Characterization of Sampling Algorithms for Open-ended
Language Generation [71.31905141672529]
We study the widely adopted ancestral sampling algorithms for auto-regressive language models.
We identify three key properties that are shared among them: entropy reduction, order preservation, and slope preservation.
We find that the set of sampling algorithms that satisfies these properties performs on par with the existing sampling algorithms.
arXiv Detail & Related papers (2020-09-15T17:28:42Z) - New Oracle-Efficient Algorithms for Private Synthetic Data Release [52.33506193761153]
We present three new algorithms for constructing differentially private synthetic data.
The algorithms satisfy differential privacy even in the worst case.
Compared to the state-of-the-art method High-Dimensional Matrix Mechanism citeMcKennaMHM18, our algorithms provide better accuracy in the large workload.
arXiv Detail & Related papers (2020-07-10T15:46:05Z) - Run2Survive: A Decision-theoretic Approach to Algorithm Selection based
on Survival Analysis [75.64261155172856]
survival analysis (SA) naturally supports censored data and offers appropriate ways to use such data for learning distributional models of algorithm runtime.
We leverage such models as a basis of a sophisticated decision-theoretic approach to algorithm selection, which we dub Run2Survive.
In an extensive experimental study with the standard benchmark ASlib, our approach is shown to be highly competitive and in many cases even superior to state-of-the-art AS approaches.
arXiv Detail & Related papers (2020-07-06T15:20:17Z) - Lagrangian Decomposition for Neural Network Verification [148.0448557991349]
A fundamental component of neural network verification is the computation of bounds on the values their outputs can take.
We propose a novel approach based on Lagrangian Decomposition.
We show that we obtain bounds comparable with off-the-shelf solvers in a fraction of their running time.
arXiv Detail & Related papers (2020-02-24T17:55:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.