Fast and Accurate Triangle Counting in Graph Streams Using Predictions
- URL: http://arxiv.org/abs/2409.15205v1
- Date: Mon, 23 Sep 2024 16:52:11 GMT
- Title: Fast and Accurate Triangle Counting in Graph Streams Using Predictions
- Authors: Cristian Boldrin, Fabio Vandin,
- Abstract summary: We present the first efficient and practical algorithm for estimating the number of triangles in a graph stream using predictions.
Our algorithm combines waiting room sampling and reservoir sampling with a predictor for the heaviness of edges, that is, the number of triangles in which an edge is involved.
- Score: 4.000869978312742
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we present the first efficient and practical algorithm for estimating the number of triangles in a graph stream using predictions. Our algorithm combines waiting room sampling and reservoir sampling with a predictor for the heaviness of edges, that is, the number of triangles in which an edge is involved. As a result, our algorithm is fast, provides guarantees on the amount of memory used, and exploits the additional information provided by the predictor to produce highly accurate estimates. We also propose a simple and domain-independent predictor, based on the degree of nodes, that can be easily computed with one pass on a stream of edges when the stream is available beforehand. Our analytical results show that, when the predictor provides useful information on the heaviness of edges, it leads to estimates with reduced variance compared to the state-of-the-art, even when the predictions are far from perfect. Our experimental results show that, when analyzing a single graph stream, our algorithm is faster than the state-of-the-art for a given memory budget, while providing significantly more accurate estimates. Even more interestingly, when sequences of hundreds of graph streams are analyzed, our algorithm significantly outperforms the state-of-the-art using our simple degree-based predictor built by analyzing only the first graph of the sequence.
Related papers
- Less is More: One-shot Subgraph Reasoning on Large-scale Knowledge Graphs [49.547988001231424]
We propose the one-shot-subgraph link prediction to achieve efficient and adaptive prediction.
Design principle is that, instead of directly acting on the whole KG, the prediction procedure is decoupled into two steps.
We achieve promoted efficiency and leading performances on five large-scale benchmarks.
arXiv Detail & Related papers (2024-03-15T12:00:12Z) - Improved Frequency Estimation Algorithms with and without Predictions [22.382900492405938]
Estimating frequencies of elements appearing in a data stream is a key task in large-scale data analysis.
We give a novel algorithm, which theoretically outperforms the learning based algorithm of Hsu et al. without the use of any predictions.
arXiv Detail & Related papers (2023-12-12T18:59:06Z) - Reinforcement Learning Enhanced Weighted Sampling for Accurate Subgraph
Counting on Fully Dynamic Graph Streams [35.943447765433774]
We propose a weighted sampling algorithm called WSD for estimating the subgraph count in a fully dynamic graph stream.
We determine the weights of edges in a data-driven fashion, using a novel method based on reinforcement learning.
arXiv Detail & Related papers (2022-11-13T03:01:34Z) - Sampling Enclosing Subgraphs for Link Prediction [2.1270496914042996]
Link prediction is a fundamental problem for graph-structured data computation.
This paper presents a scalable solution, that we call ScaLed, which utilizes sparse enclosing subgraphs to make predictions.
By leveraging the smaller sampled subgraph, ScaLed can scale to larger graphs with much less overhead while maintaining high accuracy.
arXiv Detail & Related papers (2022-06-23T22:48:44Z) - Triangle and Four Cycle Counting with Predictions in Graph Streams [59.05440236993604]
We propose data-driven one-pass streaming algorithms for estimating the number of triangles and four cycles.
We use a trained oracle that can predict certain properties of the stream elements to improve on prior "classical" algorithms.
Our methodology expands upon prior work on "classical" streaming algorithms, as previous multi-pass and random order streaming algorithms can be seen as special cases.
arXiv Detail & Related papers (2022-03-17T19:26:00Z) - Robustification of Online Graph Exploration Methods [59.50307752165016]
We study a learning-augmented variant of the classical, notoriously hard online graph exploration problem.
We propose an algorithm that naturally integrates predictions into the well-known Nearest Neighbor (NN) algorithm.
arXiv Detail & Related papers (2021-12-10T10:02:31Z) - Computing Graph Descriptors on Edge Streams [4.129847064263056]
We present streaming algorithms to approximately compute three different graph descriptors.
operating on edge streams allows us to avoid storing the entire graph in memory.
Our scalable algorithms compute descriptors of graphs with millions of edges within minutes.
arXiv Detail & Related papers (2021-09-02T06:21:47Z) - Sketch-Based Anomaly Detection in Streaming Graphs [89.52200264469364]
Given a stream of graph edges from a dynamic graph, how can we assign anomaly scores to edges and subgraphs in an online manner?
Our method is the first streaming approach that incorporates dense subgraph search to detect graph anomalies in constant memory and time.
arXiv Detail & Related papers (2021-06-08T16:10:36Z) - Estimating leverage scores via rank revealing methods and randomization [50.591267188664666]
We study algorithms for estimating the statistical leverage scores of rectangular dense or sparse matrices of arbitrary rank.
Our approach is based on combining rank revealing methods with compositions of dense and sparse randomized dimensionality reduction transforms.
arXiv Detail & Related papers (2021-05-23T19:21:55Z) - Time series forecasting based on complex network in weighted node
similarity [12.246860992135783]
In time series analysis, visibility graph theory transforms time series data into a network model.
The node similarity index is used as the weight coefficient to optimize the prediction algorithm.
The method has more accurate forecasting ability and can provide more accurate forecasts in the field of time series and actual scenes.
arXiv Detail & Related papers (2021-03-14T01:01:41Z) - Online Dense Subgraph Discovery via Blurred-Graph Feedback [87.9850024070244]
We introduce a novel learning problem for dense subgraph discovery.
We first propose a edge-time algorithm that obtains a nearly-optimal solution with high probability.
We then design a more scalable algorithm with a theoretical guarantee.
arXiv Detail & Related papers (2020-06-24T11:37:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.