Mining Large Quasi-cliques with Quality Guarantees from Vertex
Neighborhoods
- URL: http://arxiv.org/abs/2008.07996v1
- Date: Tue, 18 Aug 2020 15:50:25 GMT
- Title: Mining Large Quasi-cliques with Quality Guarantees from Vertex
Neighborhoods
- Authors: Aritra Konar, and Nicholas D. Sidiropoulos
- Abstract summary: Mining dense subgraphs is an important primitive across a spectrum of graph-mining tasks.
We show that mining cliques and quasi-cliques of non-trivial sizes from real-world graphs is often not a difficult problem.
- Score: 44.007522460918565
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mining dense subgraphs is an important primitive across a spectrum of
graph-mining tasks. In this work, we formally establish that two recurring
characteristics of real-world graphs, namely heavy-tailed degree distributions
and large clustering coefficients, imply the existence of substantially large
vertex neighborhoods with high edge-density. This observation suggests a very
simple approach for extracting large quasi-cliques: simply scan the vertex
neighborhoods, compute the clustering coefficient of each vertex, and output
the best such subgraph. The implementation of such a method requires counting
the triangles in a graph, which is a well-studied problem in graph mining. When
empirically tested across a number of real-world graphs, this approach reveals
a surprise: vertex neighborhoods include maximal cliques of non-trivial sizes,
and the density of the best neighborhood often compares favorably to subgraphs
produced by dedicated algorithms for maximizing subgraph density. For graphs
with small clustering coefficients, we demonstrate that small vertex
neighborhoods can be refined using a local-search method to ``grow'' larger
cliques and near-cliques. Our results indicate that contrary to worst-case
theoretical results, mining cliques and quasi-cliques of non-trivial sizes from
real-world graphs is often not a difficult problem, and provides motivation for
further work geared towards a better explanation of these empirical successes.
Related papers
- Latent Random Steps as Relaxations of Max-Cut, Min-Cut, and More [30.919536115917726]
We present a probabilistic model based on non-negative matrix factorization which unifies clustering and simplification.
By relaxing the hard clustering to a soft clustering, our algorithm relaxes potentially hard clustering problems to a tractable ones.
arXiv Detail & Related papers (2023-08-12T02:47:57Z) - Tight and fast generalization error bound of graph embedding in metric
space [54.279425319381374]
We show that graph embedding in non-Euclidean metric space can outperform that in Euclidean space with much smaller training data than the existing bound has suggested.
Our new upper bound is significantly tighter and faster than the existing one, which can be exponential to $R$ and $O(frac1S)$ at the fastest.
arXiv Detail & Related papers (2023-05-13T17:29:18Z) - Contrastive Graph Clustering in Curvature Spaces [74.03252813800334]
We present a novel end-to-end contrastive graph clustering model named CONGREGATE.
To support geometric clustering, we construct a theoretically grounded Heterogeneous Curvature Space.
We then train the graph clusters by an augmentation-free reweighted contrastive approach.
arXiv Detail & Related papers (2023-05-05T14:04:52Z) - Scalable and Effective Conductance-based Graph Clustering [9.938406925123722]
We develop a graph clustering framework textitPCon.
We show that many existing solutions can be reduced to our framework.
Based on our framework, we propose two novel algorithms textitPCon_core and emphPCon_de with linear time and space complexity.
arXiv Detail & Related papers (2022-11-22T12:45:27Z) - Unveiling the Sampling Density in Non-Uniform Geometric Graphs [69.93864101024639]
We consider graphs as geometric graphs: nodes are randomly sampled from an underlying metric space, and any pair of nodes is connected if their distance is less than a specified neighborhood radius.
In a social network communities can be modeled as densely sampled areas, and hubs as nodes with larger neighborhood radius.
We develop methods to estimate the unknown sampling density in a self-supervised fashion.
arXiv Detail & Related papers (2022-10-15T08:01:08Z) - A Note on Graph-Based Nearest Neighbor Search [4.38837720322254]
We show that high clustering coefficient makes most of the k nearest neighbors of q sit in a maximum strongly connected component ( SCC) in the graph.
We prove that the commonly used graph-based search algorithm is guaranteed to traverse the maximum SCC once visiting any point in it.
arXiv Detail & Related papers (2020-12-21T02:18:05Z) - LCS Graph Kernel Based on Wasserstein Distance in Longest Common
Subsequence Metric Space [22.215852332444904]
We propose a graph kernel to compute more comprehensive similarity between paths and walks.
We also combine it with optimal transport theory to extract more in-depth features of graphs.
Our proposed method outperforms many state-of-the-art graph kernel methods.
arXiv Detail & Related papers (2020-12-07T11:59:14Z) - Online Dense Subgraph Discovery via Blurred-Graph Feedback [87.9850024070244]
We introduce a novel learning problem for dense subgraph discovery.
We first propose a edge-time algorithm that obtains a nearly-optimal solution with high probability.
We then design a more scalable algorithm with a theoretical guarantee.
arXiv Detail & Related papers (2020-06-24T11:37:33Z) - Block-Approximated Exponential Random Graphs [77.4792558024487]
An important challenge in the field of exponential random graphs (ERGs) is the fitting of non-trivial ERGs on large graphs.
We propose an approximative framework to such non-trivial ERGs that result in dyadic independence (i.e., edge independent) distributions.
Our methods are scalable to sparse graphs consisting of millions of nodes.
arXiv Detail & Related papers (2020-02-14T11:42:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.