Robust Correlation Clustering with Asymmetric Noise
- URL: http://arxiv.org/abs/2110.08385v1
- Date: Fri, 15 Oct 2021 21:47:27 GMT
- Title: Robust Correlation Clustering with Asymmetric Noise
- Authors: Jimit Majmudar, Stephen Vavasis
- Abstract summary: Correlation Clustering is a graph clustering formulation which: (1) takes as input a signed graph with edge weights representing a similarity/dissimilarity measure between the nodes, and (2) requires no prior estimate of the number of clusters in the input graph.
We propose a novel graph generative model, called the Node Factors Model (NFM), which is based on generating feature vectors/embeddings for the graph nodes.
- Score: 3.8073142980733
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Graph clustering problems typically aim to partition the graph nodes such
that two nodes belong to the same partition set if and only if they are
similar. Correlation Clustering is a graph clustering formulation which: (1)
takes as input a signed graph with edge weights representing a
similarity/dissimilarity measure between the nodes, and (2) requires no prior
estimate of the number of clusters in the input graph. However, the
combinatorial optimization problem underlying Correlation Clustering is
NP-hard. In this work, we propose a novel graph generative model, called the
Node Factors Model (NFM), which is based on generating feature
vectors/embeddings for the graph nodes. The graphs generated by the NFM contain
asymmetric noise in the sense that there may exist pairs of nodes in the same
cluster which are negatively correlated. We propose a novel Correlation
Clustering algorithm, called \anormd, using techniques from semidefinite
programming. Using a combination of theoretical and computational results, we
demonstrate that $\texttt{$\ell_2$-norm-diag}$ recovers nodes with sufficiently
strong cluster membership in graph instances generated by the NFM, thereby
making progress towards establishing the provable robustness of our proposed
algorithm.
Related papers
- HeNCler: Node Clustering in Heterophilous Graphs through Learned Asymmetric Similarity [55.27586970082595]
HeNCler is a novel approach for Heterophilous Node Clustering.
We show that HeNCler significantly enhances performance in node clustering tasks within heterophilous graph contexts.
arXiv Detail & Related papers (2024-05-27T11:04:05Z) - MeanCut: A Greedy-Optimized Graph Clustering via Path-based Similarity
and Degree Descent Criterion [0.6906005491572401]
spectral clustering is popular and attractive due to the remarkable performance, easy implementation, and strong adaptability.
We propose MeanCut as the objective function and greedily optimize it in degree descending order for a nondestructive graph partition.
The validity of our algorithm is demonstrated by testifying on real-world benchmarks and application of face recognition.
arXiv Detail & Related papers (2023-12-07T06:19:39Z) - Latent Random Steps as Relaxations of Max-Cut, Min-Cut, and More [30.919536115917726]
We present a probabilistic model based on non-negative matrix factorization which unifies clustering and simplification.
By relaxing the hard clustering to a soft clustering, our algorithm relaxes potentially hard clustering problems to a tractable ones.
arXiv Detail & Related papers (2023-08-12T02:47:57Z) - Graph Mixup with Soft Alignments [49.61520432554505]
We study graph data augmentation by mixup, which has been used successfully on images.
We propose S-Mixup, a simple yet effective mixup method for graph classification by soft alignments.
arXiv Detail & Related papers (2023-06-11T22:04:28Z) - Dink-Net: Neural Clustering on Large Graphs [59.10189693120368]
A deep graph clustering method (Dink-Net) is proposed with the idea of dilation and shrink.
By discriminating nodes, whether being corrupted by augmentations, representations are learned in a self-supervised manner.
The clustering distribution is optimized by minimizing the proposed cluster dilation loss and cluster shrink loss.
Compared to the runner-up, Dink-Net 9.62% achieves NMI improvement on the ogbn-papers100M dataset with 111 million nodes and 1.6 billion edges.
arXiv Detail & Related papers (2023-05-28T15:33:24Z) - One-step Bipartite Graph Cut: A Normalized Formulation and Its
Application to Scalable Subspace Clustering [56.81492360414741]
We show how to enforce a one-step normalized cut for bipartite graphs, especially with linear-time complexity.
In this paper, we first characterize a novel one-step bipartite graph cut criterion with normalized constraints, and theoretically prove its equivalence to a trace problem.
We extend this cut criterion to a scalable subspace clustering approach, where adaptive anchor learning, bipartite graph learning, and one-step normalized bipartite graph partitioning are simultaneously modeled.
arXiv Detail & Related papers (2023-05-12T11:27:20Z) - ClusterFuG: Clustering Fully connected Graphs by Multicut [20.254912065749956]
In dense multicut, the clustering objective is given in a factorized form as inner products of node feature vectors.
We show how to rewrite classical greedy algorithms for multicut in our dense setting and how to modify them for greater efficiency and solution quality.
arXiv Detail & Related papers (2023-01-28T11:10:50Z) - AnchorGAE: General Data Clustering via $O(n)$ Bipartite Graph
Convolution [79.44066256794187]
We show how to convert a non-graph dataset into a graph by introducing the generative graph model, which is used to build graph convolution networks (GCNs)
A bipartite graph constructed by anchors is updated dynamically to exploit the high-level information behind data.
We theoretically prove that the simple update will lead to degeneration and a specific strategy is accordingly designed.
arXiv Detail & Related papers (2021-11-12T07:08:13Z) - Explicit Pairwise Factorized Graph Neural Network for Semi-Supervised
Node Classification [59.06717774425588]
We propose the Explicit Pairwise Factorized Graph Neural Network (EPFGNN), which models the whole graph as a partially observed Markov Random Field.
It contains explicit pairwise factors to model output-output relations and uses a GNN backbone to model input-output relations.
We conduct experiments on various datasets, which shows that our model can effectively improve the performance for semi-supervised node classification on graphs.
arXiv Detail & Related papers (2021-07-27T19:47:53Z) - Seeing All From a Few: Nodes Selection Using Graph Pooling for Graph
Clustering [37.68977275752782]
noisy edges and nodes in the graph may make the clustering results worse.
We propose a novel dual graph embedding network(DGEN) to improve the robustness of the graph clustering to the noisy nodes and edges.
Experiments on three benchmark graph datasets demonstrate the superiority compared with several state-of-the-art algorithms.
arXiv Detail & Related papers (2021-04-30T06:51:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.