Sublinear Algorithms for Hierarchical Clustering
- URL: http://arxiv.org/abs/2206.07633v1
- Date: Wed, 15 Jun 2022 16:25:27 GMT
- Title: Sublinear Algorithms for Hierarchical Clustering
- Authors: Arpit Agarwal, Sanjeev Khanna, Huan Li, Prathamesh Patil
- Abstract summary: We study hierarchical clustering for massive graphs under three well-studied models of sublinear computation.
We design sublinear algorithms for hierarchical clustering in all three models.
- Score: 14.124026862687941
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hierarchical clustering over graphs is a fundamental task in data mining and
machine learning with applications in domains such as phylogenetics, social
network analysis, and information retrieval. Specifically, we consider the
recently popularized objective function for hierarchical clustering due to
Dasgupta. Previous algorithms for (approximately) minimizing this objective
function require linear time/space complexity. In many applications the
underlying graph can be massive in size making it computationally challenging
to process the graph even using a linear time/space algorithm. As a result,
there is a strong interest in designing algorithms that can perform global
computation using only sublinear resources. The focus of this work is to study
hierarchical clustering for massive graphs under three well-studied models of
sublinear computation which focus on space, time, and communication,
respectively, as the primary resources to optimize: (1) (dynamic) streaming
model where edges are presented as a stream, (2) query model where the graph is
queried using neighbor and degree queries, (3) MPC model where the graph edges
are partitioned over several machines connected via a communication channel.
We design sublinear algorithms for hierarchical clustering in all three
models above. At the heart of our algorithmic results is a view of the
objective in terms of cuts in the graph, which allows us to use a relaxed
notion of cut sparsifiers to do hierarchical clustering while introducing only
a small distortion in the objective function. Our main algorithmic
contributions are then to show how cut sparsifiers of the desired form can be
efficiently constructed in the query model and the MPC model. We complement our
algorithmic results by establishing nearly matching lower bounds that rule out
the possibility of designing better algorithms in each of these models.
Related papers
- Combinatorial Approximations for Cluster Deletion: Simpler, Faster, and Better [18.121514220195607]
Cluster deletion is an NP-hard graph clustering objective with applications in computational and social network analysis.
We first provide a tighter analysis of two previous approximation algorithms, improving their approximation guarantees from 4 to 3.
We show that both algorithms can be derandomized in a surprisingly simple way, by greedily taking a maximum degree in an auxiliary graph and forming a cluster around it.
arXiv Detail & Related papers (2024-04-24T18:39:18Z) - Ensemble Quadratic Assignment Network for Graph Matching [52.20001802006391]
Graph matching is a commonly used technique in computer vision and pattern recognition.
Recent data-driven approaches have improved the graph matching accuracy remarkably.
We propose a graph neural network (GNN) based approach to combine the advantages of data-driven and traditional methods.
arXiv Detail & Related papers (2024-03-11T06:34:05Z) - One-step Bipartite Graph Cut: A Normalized Formulation and Its
Application to Scalable Subspace Clustering [56.81492360414741]
We show how to enforce a one-step normalized cut for bipartite graphs, especially with linear-time complexity.
In this paper, we first characterize a novel one-step bipartite graph cut criterion with normalized constraints, and theoretically prove its equivalence to a trace problem.
We extend this cut criterion to a scalable subspace clustering approach, where adaptive anchor learning, bipartite graph learning, and one-step normalized bipartite graph partitioning are simultaneously modeled.
arXiv Detail & Related papers (2023-05-12T11:27:20Z) - Efficient Graph Field Integrators Meet Point Clouds [59.27295475120132]
We present two new classes of algorithms for efficient field integration on graphs encoding point clouds.
The first class, SeparatorFactorization(SF), leverages the bounded genus of point cloud mesh graphs, while the second class, RFDiffusion(RFD), uses popular epsilon-nearest-neighbor graph representations for point clouds.
arXiv Detail & Related papers (2023-02-02T08:33:36Z) - Learning Optimal Graph Filters for Clustering of Attributed Graphs [20.810096547938166]
Many real-world systems can be represented as graphs where the different entities in the system are presented by nodes and their interactions by edges.
An important task in studying large datasets with graphical structure is graph clustering.
We introduce a graph signal processing based approach, where we learn the parameters of Finite Impulse Response (FIR) and Autoregressive Moving Average (ARMA) graph filters optimized for clustering.
arXiv Detail & Related papers (2022-11-09T01:49:23Z) - Optimal Propagation for Graph Neural Networks [51.08426265813481]
We propose a bi-level optimization approach for learning the optimal graph structure.
We also explore a low-rank approximation model for further reducing the time complexity.
arXiv Detail & Related papers (2022-05-06T03:37:00Z) - Reinforcement Learning Based Query Vertex Ordering Model for Subgraph
Matching [58.39970828272366]
Subgraph matching algorithms enumerate all is embeddings of a query graph in a data graph G.
matching order plays a critical role in time efficiency of these backtracking based subgraph matching algorithms.
In this paper, for the first time we apply the Reinforcement Learning (RL) and Graph Neural Networks (GNNs) techniques to generate the high-quality matching order for subgraph matching algorithms.
arXiv Detail & Related papers (2022-01-25T00:10:03Z) - Hierarchical Clustering: $O(1)$-Approximation for Well-Clustered Graphs [3.2901541059183432]
We present two-time approximation algorithms for hierarchical clustering.
The significance of our work is demonstrated by the empirical analysis on both synthetic and real-world data sets.
arXiv Detail & Related papers (2021-12-16T17:52:04Z) - Effective and efficient structure learning with pruning and model
averaging strategies [9.023722579074734]
This paper describes an approximate BN structure learning algorithm that combines two novel strategies with hill-climbing search.
The algorithm starts by pruning the search space graphs, where the pruning strategy can be viewed as an aggressive version of the pruning strategies.
It then performs model averaging in the hill-climbing search process and moves to the neighbouring graph that maximises the objective function.
arXiv Detail & Related papers (2021-12-01T10:35:34Z) - Effective and Efficient Graph Learning for Multi-view Clustering [173.8313827799077]
We propose an effective and efficient graph learning model for multi-view clustering.
Our method exploits the view-similar between graphs of different views by the minimization of tensor Schatten p-norm.
Our proposed algorithm is time-economical and obtains the stable results and scales well with the data size.
arXiv Detail & Related papers (2021-08-15T13:14:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.