Related papers: Mining Large Quasi-cliques with Quality Guarantees from Vertex Neighborhoods

Mining Large Quasi-cliques with Quality Guarantees from Vertex Neighborhoods

URL: http://arxiv.org/abs/2008.07996v1
Date: Tue, 18 Aug 2020 15:50:25 GMT
Title: Mining Large Quasi-cliques with Quality Guarantees from Vertex Neighborhoods
Authors: Aritra Konar, and Nicholas D. Sidiropoulos
Abstract summary: Mining dense subgraphs is an important primitive across a spectrum of graph-mining tasks. We show that mining cliques and quasi-cliques of non-trivial sizes from real-world graphs is often not a difficult problem.
Score: 44.007522460918565
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Mining dense subgraphs is an important primitive across a spectrum of graph-mining tasks. In this work, we formally establish that two recurring characteristics of real-world graphs, namely heavy-tailed degree distributions and large clustering coefficients, imply the existence of substantially large vertex neighborhoods with high edge-density. This observation suggests a very simple approach for extracting large quasi-cliques: simply scan the vertex neighborhoods, compute the clustering coefficient of each vertex, and output the best such subgraph. The implementation of such a method requires counting the triangles in a graph, which is a well-studied problem in graph mining. When empirically tested across a number of real-world graphs, this approach reveals a surprise: vertex neighborhoods include maximal cliques of non-trivial sizes, and the density of the best neighborhood often compares favorably to subgraphs produced by dedicated algorithms for maximizing subgraph density. For graphs with small clustering coefficients, we demonstrate that small vertex neighborhoods can be refined using a local-search method to ``grow'' larger cliques and near-cliques. Our results indicate that contrary to worst-case theoretical results, mining cliques and quasi-cliques of non-trivial sizes from real-world graphs is often not a difficult problem, and provides motivation for further work geared towards a better explanation of these empirical successes.

Related papers

Provably Extending PageRank-based Local Clustering Algorithm to Weighted Directed Graphs with Self-Loops and to Hypergraphs [40.215737469808026]
This work focuses on graph local clustering, which has broad applications beyond graphs because of the internal connectivities within various modalities. We extend the non-approximating Andersen-Chung-Lang ("ACL") algorithm beyond discrete graphs and generalize its quadratic optimality to a wider range of graphs. We theoretically prove that, under two mild conditions, both algorithms can identify a quadratically optimal local cluster in terms of conductance with at least 1/2 probability.
arXiv Detail & Related papers (2024-12-04T03:56:14Z)
Latent Random Steps as Relaxations of Max-Cut, Min-Cut, and More [30.919536115917726]
We present a probabilistic model based on non-negative matrix factorization which unifies clustering and simplification. By relaxing the hard clustering to a soft clustering, our algorithm relaxes potentially hard clustering problems to a tractable ones.
arXiv Detail & Related papers (2023-08-12T02:47:57Z)
Tight and fast generalization error bound of graph embedding in metric space [54.279425319381374]
We show that graph embedding in non-Euclidean metric space can outperform that in Euclidean space with much smaller training data than the existing bound has suggested. Our new upper bound is significantly tighter and faster than the existing one, which can be exponential to $R$ and $O(frac1S)$ at the fastest.
arXiv Detail & Related papers (2023-05-13T17:29:18Z)
Contrastive Graph Clustering in Curvature Spaces [74.03252813800334]
We present a novel end-to-end contrastive graph clustering model named CONGREGATE. To support geometric clustering, we construct a theoretically grounded Heterogeneous Curvature Space. We then train the graph clusters by an augmentation-free reweighted contrastive approach.
arXiv Detail & Related papers (2023-05-05T14:04:52Z)
Scalable and Effective Conductance-based Graph Clustering [9.938406925123722]
We develop a graph clustering framework textitPCon. We show that many existing solutions can be reduced to our framework. Based on our framework, we propose two novel algorithms textitPCon_core and emphPCon_de with linear time and space complexity.
arXiv Detail & Related papers (2022-11-22T12:45:27Z)
Unveiling the Sampling Density in Non-Uniform Geometric Graphs [69.93864101024639]
We consider graphs as geometric graphs: nodes are randomly sampled from an underlying metric space, and any pair of nodes is connected if their distance is less than a specified neighborhood radius. In a social network communities can be modeled as densely sampled areas, and hubs as nodes with larger neighborhood radius. We develop methods to estimate the unknown sampling density in a self-supervised fashion.
arXiv Detail & Related papers (2022-10-15T08:01:08Z)
A Note on Graph-Based Nearest Neighbor Search [4.38837720322254]
We show that high clustering coefficient makes most of the k nearest neighbors of q sit in a maximum strongly connected component ( SCC) in the graph. We prove that the commonly used graph-based search algorithm is guaranteed to traverse the maximum SCC once visiting any point in it.
arXiv Detail & Related papers (2020-12-21T02:18:05Z)
LCS Graph Kernel Based on Wasserstein Distance in Longest Common Subsequence Metric Space [22.215852332444904]
We propose a graph kernel to compute more comprehensive similarity between paths and walks. We also combine it with optimal transport theory to extract more in-depth features of graphs. Our proposed method outperforms many state-of-the-art graph kernel methods.
arXiv Detail & Related papers (2020-12-07T11:59:14Z)
Online Dense Subgraph Discovery via Blurred-Graph Feedback [87.9850024070244]
We introduce a novel learning problem for dense subgraph discovery. We first propose a edge-time algorithm that obtains a nearly-optimal solution with high probability. We then design a more scalable algorithm with a theoretical guarantee.
arXiv Detail & Related papers (2020-06-24T11:37:33Z)
Block-Approximated Exponential Random Graphs [77.4792558024487]
An important challenge in the field of exponential random graphs (ERGs) is the fitting of non-trivial ERGs on large graphs. We propose an approximative framework to such non-trivial ERGs that result in dyadic independence (i.e., edge independent) distributions. Our methods are scalable to sparse graphs consisting of millions of nodes.
arXiv Detail & Related papers (2020-02-14T11:42:16Z)

This list is automatically generated from the titles and abstracts of the papers in this site.