TANGO: Clustering with Typicality-Aware Nonlocal Mode-Seeking and Graph-Cut Optimization
- URL: http://arxiv.org/abs/2408.10084v1
- Date: Mon, 19 Aug 2024 15:26:25 GMT
- Title: TANGO: Clustering with Typicality-Aware Nonlocal Mode-Seeking and Graph-Cut Optimization
- Authors: Haowen Ma, Zhiguo Long, Hua Meng,
- Abstract summary: Density-based clustering methods by mode-seeking usually achieve clustering by using local density estimation to mine structural information.
We propose a new algorithm (TANGO) to establish local dependencies by exploiting a global-view emphtypicality of points.
It achieves final clustering by employing graph-cut on sub-clusters, thus avoiding the challenging selection of cluster centers.
- Score: 2.4783546111391215
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Density-based clustering methods by mode-seeking usually achieve clustering by using local density estimation to mine structural information, such as local dependencies from lower density points to higher neighbors. However, they often rely too heavily on \emph{local} structures and neglect \emph{global} characteristics, which can lead to significant errors in peak selection and dependency establishment. Although introducing more hyperparameters that revise dependencies can help mitigate this issue, tuning them is challenging and even impossible on real-world datasets. In this paper, we propose a new algorithm (TANGO) to establish local dependencies by exploiting a global-view \emph{typicality} of points, which is obtained by mining further the density distributions and initial dependencies. TANGO then obtains sub-clusters with the help of the adjusted dependencies, and characterizes the similarity between sub-clusters by incorporating path-based connectivity. It achieves final clustering by employing graph-cut on sub-clusters, thus avoiding the challenging selection of cluster centers. Moreover, this paper provides theoretical analysis and an efficient method for the calculation of typicality. Experimental results on several synthetic and $16$ real-world datasets demonstrate the effectiveness and superiority of TANGO.
Related papers
- An Enhanced Model-based Approach for Short Text Clustering [58.60681789677676]
Short text clustering has become increasingly important with the popularity of social media like Twitter, Google+, and Facebook.<n>Existing methods can be broadly categorized into two paradigms: topic model-based approaches and deep representation learning-based approaches.<n>We propose a collapsed Gibbs Sampling algorithm for the Dirichlet Multinomial Mixture model (GSDMM), which effectively handles the sparsity and high dimensionality of short texts.<n>Based on several aspects of GSDMM that warrant further refinement, we propose an improved approach, GSDMM+, designed to further optimize its performance.
arXiv Detail & Related papers (2025-07-18T10:07:42Z) - Graph-based Semi-supervised and Unsupervised Methods for Local Clustering [0.0]
Local clustering aims to identify specific substructures within a large graph without requiring full knowledge of the entire graph.
We first propose a method for identifying specific local clusters when very few labeled data is given, which we term semi-supervised local clustering.
We then extend this approach to the unsupervised setting when no prior information on labels is available.
arXiv Detail & Related papers (2025-04-28T02:10:18Z) - Towards Learnable Anchor for Deep Multi-View Clustering [49.767879678193005]
In this paper, we propose the Deep Multi-view Anchor Clustering (DMAC) model that performs clustering in linear time.
With the optimal anchors, the full sample graph is calculated to derive a discriminative embedding for clustering.
Experiments on several datasets demonstrate superior performance and efficiency of DMAC compared to state-of-the-art competitors.
arXiv Detail & Related papers (2025-03-16T09:38:11Z) - Graph Probability Aggregation Clustering [5.377020739388736]
We propose a graph-based fuzzy clustering algorithm that unifies the global clustering objective function with a local clustering constraint.
The entire GPAC framework is formulated as a multi-constrained optimization problem, which can be solved using the Lagrangian method.
Experiments conducted on synthetic, real-world, and deep learning datasets demonstrate that GPAC not only exceeds existing state-of-the-art methods in clustering performance but also excels in computational efficiency.
arXiv Detail & Related papers (2025-02-27T09:11:32Z) - ACTGNN: Assessment of Clustering Tendency with Synthetically-Trained Graph Neural Networks [4.668678950572517]
ACTGNN is a graph-based framework designed to assess clustering tendency by leveraging graph representations of data.
A Graph Neural Network (GNN) is trained exclusively on synthetic datasets, enabling robust learning of clustering structures.
Our results highlight the generalizability and effectiveness of the proposed approach, making it a promising tool for robust clustering tendency assessment.
arXiv Detail & Related papers (2025-01-30T03:31:26Z) - Self-Supervised Graph Embedding Clustering [70.36328717683297]
K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z) - Clustering by Mining Density Distributions and Splitting Manifold Structure [2.3759432635713895]
A top-down approach was recently proposed to improve the efficiency of spectral clustering.
This paper proposes to start from local structures to obtain micro-clusters.
A novel similarity measure between micro-clusters is then proposed for the final spectral clustering.
arXiv Detail & Related papers (2024-08-20T02:22:59Z) - Rethinking Clustered Federated Learning in NOMA Enhanced Wireless
Networks [60.09912912343705]
This study explores the benefits of integrating the novel clustered federated learning (CFL) approach with non-independent and identically distributed (non-IID) datasets.
A detailed theoretical analysis of the generalization gap that measures the degree of non-IID in the data distribution is presented.
Solutions to address the challenges posed by non-IID conditions are proposed with the analysis of the properties.
arXiv Detail & Related papers (2024-03-05T17:49:09Z) - Deep Embedding Clustering Driven by Sample Stability [16.53706617383543]
We propose a deep embedding clustering algorithm driven by sample stability (DECS)
Specifically, we start by constructing the initial feature space with an autoencoder and then learn the cluster-oriented embedding feature constrained by sample stability.
The experimental results on five datasets illustrate that the proposed method achieves superior performance compared to state-of-the-art clustering approaches.
arXiv Detail & Related papers (2024-01-29T09:19:49Z) - MeanCut: A Greedy-Optimized Graph Clustering via Path-based Similarity
and Degree Descent Criterion [0.6906005491572401]
spectral clustering is popular and attractive due to the remarkable performance, easy implementation, and strong adaptability.
We propose MeanCut as the objective function and greedily optimize it in degree descending order for a nondestructive graph partition.
The validity of our algorithm is demonstrated by testifying on real-world benchmarks and application of face recognition.
arXiv Detail & Related papers (2023-12-07T06:19:39Z) - Optimality of Message-Passing Architectures for Sparse Graphs [12.42591017155152]
We study the node classification problem on feature-decorated graphs in the sparse setting, i.e., when the expected degree of a node is $O(1)$ in the number of nodes.<n>We introduce a notion of Bayes optimality for node classification tasks, called local Bayes optimality.<n>We show that the optimal message-passing architecture interpolates between a standard in the regime of low graph signal and a typical in the regime of high graph signal.
arXiv Detail & Related papers (2023-05-17T17:31:20Z) - Graph-based Semi-supervised Local Clustering with Few Labeled Nodes [6.493238575291165]
Local clustering aims at extracting a local structure inside a graph without the necessity of knowing the entire graph structure.
We propose a new semi-supervised local clustering approach using only few labeled nodes.
arXiv Detail & Related papers (2022-11-20T22:55:07Z) - Okapi: Generalising Better by Making Statistical Matches Match [7.392460712829188]
Okapi is a simple, efficient, and general method for robust semi-supervised learning based on online statistical matching.
Our method uses a nearest-neighbours-based matching procedure to generate cross-domain views for a consistency loss.
We show that it is in fact possible to leverage additional unlabelled data to improve upon empirical risk minimisation.
arXiv Detail & Related papers (2022-11-07T12:41:17Z) - Local Sample-weighted Multiple Kernel Clustering with Consensus
Discriminative Graph [73.68184322526338]
Multiple kernel clustering (MKC) is committed to achieving optimal information fusion from a set of base kernels.
This paper proposes a novel local sample-weighted multiple kernel clustering model.
Experimental results demonstrate that our LSWMKC possesses better local manifold representation and outperforms existing kernel or graph-based clustering algo-rithms.
arXiv Detail & Related papers (2022-07-05T05:00:38Z) - Optimal Propagation for Graph Neural Networks [51.08426265813481]
We propose a bi-level optimization approach for learning the optimal graph structure.
We also explore a low-rank approximation model for further reducing the time complexity.
arXiv Detail & Related papers (2022-05-06T03:37:00Z) - Clustered Federated Learning via Generalized Total Variation
Minimization [83.26141667853057]
We study optimization methods to train local (or personalized) models for local datasets with a decentralized network structure.
Our main conceptual contribution is to formulate federated learning as total variation minimization (GTV)
Our main algorithmic contribution is a fully decentralized federated learning algorithm.
arXiv Detail & Related papers (2021-05-26T18:07:19Z) - Towards Uncovering the Intrinsic Data Structures for Unsupervised Domain
Adaptation using Structurally Regularized Deep Clustering [119.88565565454378]
Unsupervised domain adaptation (UDA) is to learn classification models that make predictions for unlabeled data on a target domain.
We propose a hybrid model of Structurally Regularized Deep Clustering, which integrates the regularized discriminative clustering of target data with a generative one.
Our proposed H-SRDC outperforms all the existing methods under both the inductive and transductive settings.
arXiv Detail & Related papers (2020-12-08T08:52:00Z) - Coping with Label Shift via Distributionally Robust Optimisation [72.80971421083937]
We propose a model that minimises an objective based on distributionally robust optimisation (DRO)
We then design and analyse a gradient descent-proximal mirror ascent algorithm tailored for large-scale problems to optimise the proposed objective.
arXiv Detail & Related papers (2020-10-23T08:33:04Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z) - Structured Graph Learning for Clustering and Semi-supervised
Classification [74.35376212789132]
We propose a graph learning framework to preserve both the local and global structure of data.
Our method uses the self-expressiveness of samples to capture the global structure and adaptive neighbor approach to respect the local structure.
Our model is equivalent to a combination of kernel k-means and k-means methods under certain condition.
arXiv Detail & Related papers (2020-08-31T08:41:20Z) - Locally induced Gaussian processes for large-scale simulation
experiments [0.0]
We show how placement of inducing points and their multitude can be thwarted by pathologies.
Our proposed methodology hybridizes global inducing point and data subset-based local GP approximation.
We show that local inducing points extend their global and data-subset component parts on the accuracy--computational efficiency frontier.
arXiv Detail & Related papers (2020-08-28T21:37:46Z) - Making Affine Correspondences Work in Camera Geometry Computation [62.7633180470428]
Local features provide region-to-region rather than point-to-point correspondences.
We propose guidelines for effective use of region-to-region matches in the course of a full model estimation pipeline.
Experiments show that affine solvers can achieve accuracy comparable to point-based solvers at faster run-times.
arXiv Detail & Related papers (2020-07-20T12:07:48Z) - FedPD: A Federated Learning Framework with Optimal Rates and Adaptivity
to Non-IID Data [59.50904660420082]
Federated Learning (FL) has become a popular paradigm for learning from distributed data.
To effectively utilize data at different devices without moving them to the cloud, algorithms such as the Federated Averaging (FedAvg) have adopted a "computation then aggregation" (CTA) model.
arXiv Detail & Related papers (2020-05-22T23:07:42Z) - PushNet: Efficient and Adaptive Neural Message Passing [1.9121961872220468]
Message passing neural networks have recently evolved into a state-of-the-art approach to representation learning on graphs.
Existing methods perform synchronous message passing along all edges in multiple subsequent rounds.
We consider a novel asynchronous message passing approach where information is pushed only along the most relevant edges until convergence.
arXiv Detail & Related papers (2020-03-04T18:15:30Z) - CycleCluster: Modernising Clustering Regularisation for Deep
Semi-Supervised Classification [0.0]
We propose a novel framework, CycleCluster, for deep semi-supervised classification.
Our core optimisation is driven by a new clustering based regularisation along with a graph based pseudo-labels and a shared deep network.
arXiv Detail & Related papers (2020-01-15T13:34:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.