Covering a Graph with Dense Subgraph Families, via Triangle-Rich Sets
- URL: http://arxiv.org/abs/2407.16850v1
- Date: Tue, 23 Jul 2024 21:26:37 GMT
- Title: Covering a Graph with Dense Subgraph Families, via Triangle-Rich Sets
- Authors: Sabyasachi Basu, Daniel Paul-Pena, Kun Qian, C. Seshadhri, Edward W Huang, Karthik Subbian,
- Abstract summary: Real-world graphs are globally sparse, yet contain numerous "pockets" of high edge density.
A fundamental task in graph mining is to discover these dense subgraphs.
We give a mathematical formulation of this problem, using a new definition of regularly triangle-rich (RTR) families.
We design a provable algorithm, RTRExtractor, that can discover RTR families that approximately cover any RTR set.
- Score: 15.078359519557207
- License:
- Abstract: Graphs are a fundamental data structure used to represent relationships in domains as diverse as the social sciences, bioinformatics, cybersecurity, the Internet, and more. One of the central observations in network science is that real-world graphs are globally sparse, yet contains numerous "pockets" of high edge density. A fundamental task in graph mining is to discover these dense subgraphs. Most common formulations of the problem involve finding a single (or a few) "optimally" dense subsets. But in most real applications, one does not care for the optimality. Instead, we want to find a large collection of dense subsets that covers a significant fraction of the input graph. We give a mathematical formulation of this problem, using a new definition of regularly triangle-rich (RTR) families. These families capture the notion of dense subgraphs that contain many triangles and have degrees comparable to the subgraph size. We design a provable algorithm, RTRExtractor, that can discover RTR families that approximately cover any RTR set. The algorithm is efficient and is inspired by recent results that use triangle counts for community testing and clustering. We show that RTRExtractor has excellent behavior on a large variety of real-world datasets. It is able to process graphs with hundreds of millions of edges within minutes. Across many datasets, RTRExtractor achieves high coverage using high edge density datasets. For example, the output covers a quarter of the vertices with subgraphs of edge density more than (say) $0.5$, for datasets with 10M+ edges. We show an example of how the output of RTRExtractor correlates with meaningful sets of similar vertices in a citation network, demonstrating the utility of RTRExtractor for unsupervised graph discovery tasks.
Related papers
- Spectral Greedy Coresets for Graph Neural Networks [61.24300262316091]
The ubiquity of large-scale graphs in node-classification tasks hinders the real-world applications of Graph Neural Networks (GNNs)
This paper studies graph coresets for GNNs and avoids the interdependence issue by selecting ego-graphs based on their spectral embeddings.
Our spectral greedy graph coreset (SGGC) scales to graphs with millions of nodes, obviates the need for model pre-training, and applies to low-homophily graphs.
arXiv Detail & Related papers (2024-05-27T17:52:12Z) - MGNet: Learning Correspondences via Multiple Graphs [78.0117352211091]
Learning correspondences aims to find correct correspondences from the initial correspondence set with an uneven correspondence distribution and a low inlier rate.
Recent advances usually use graph neural networks (GNNs) to build a single type of graph or stack local graphs into the global one to complete the task.
We propose MGNet to effectively combine multiple complementary graphs.
arXiv Detail & Related papers (2024-01-10T07:58:44Z) - Jaccard-constrained dense subgraph discovery [2.4511852052357628]
We search for dense subgraphs that have large pairwise Jaccard similarity coefficients.
We show experimentally that our algorithms are efficient, they can find ground truth in synthetic datasets and provide interpretable results from real-world datasets.
arXiv Detail & Related papers (2023-08-30T10:33:02Z) - A Robust Stacking Framework for Training Deep Graph Models with
Multifaceted Node Features [61.92791503017341]
Graph Neural Networks (GNNs) with numerical node features and graph structure as inputs have demonstrated superior performance on various supervised learning tasks with graph data.
The best models for such data types in most standard supervised learning settings with IID (non-graph) data are not easily incorporated into a GNN.
Here we propose a robust stacking framework that fuses graph-aware propagation with arbitrary models intended for IID data.
arXiv Detail & Related papers (2022-06-16T22:46:33Z) - A Robust and Generalized Framework for Adversarial Graph Embedding [73.37228022428663]
We propose a robust framework for adversarial graph embedding, named AGE.
AGE generates the fake neighbor nodes as the enhanced negative samples from the implicit distribution.
Based on this framework, we propose three models to handle three types of graph data.
arXiv Detail & Related papers (2021-05-22T07:05:48Z) - COLOGNE: Coordinated Local Graph Neighborhood Sampling [1.6498361958317633]
replacing discrete unordered objects such as graph nodes by real-valued vectors is at the heart of many approaches to learning from graph data.
We address the problem of learning discrete node embeddings such that the coordinates of the node vector representations are graph nodes.
This opens the door to designing interpretable machine learning algorithms for graphs as all attributes originally present in the nodes are preserved.
arXiv Detail & Related papers (2021-02-09T11:39:06Z) - Pseudoinverse Graph Convolutional Networks: Fast Filters Tailored for
Large Eigengaps of Dense Graphs and Hypergraphs [0.0]
Graph Convolutional Networks (GCNs) have proven to be successful tools for semi-supervised classification on graph-based datasets.
We propose a new GCN variant whose three-part filter space is targeted at dense graphs.
arXiv Detail & Related papers (2020-08-03T08:48:41Z) - Grale: Designing Networks for Graph Learning [68.23038997141381]
We present Grale, a scalable method we have developed to address the problem of graph design for graphs with billions of nodes.
Grale operates by fusing together different measures of(potentially weak) similarity to create a graph which exhibits high task-specific homophily between its nodes.
We have deployed Grale in more than 20 different industrial settings at Google, including datasets which have tens of billions of nodes, and hundreds of trillions of potential edges to score.
arXiv Detail & Related papers (2020-07-23T13:25:36Z) - Online Dense Subgraph Discovery via Blurred-Graph Feedback [87.9850024070244]
We introduce a novel learning problem for dense subgraph discovery.
We first propose a edge-time algorithm that obtains a nearly-optimal solution with high probability.
We then design a more scalable algorithm with a theoretical guarantee.
arXiv Detail & Related papers (2020-06-24T11:37:33Z) - Learning Deep Graph Representations via Convolutional Neural Networks [7.1945109570193795]
We propose a framework called DeepMap to learn deep representations for graph feature maps.
The learned deep representation for a graph is a dense and low-dimensional vector that captures complex high-order interactions.
We empirically validate DeepMap on various graph classification benchmarks and demonstrate that it achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-04-05T08:49:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.