Related papers: Randomized Greedy Algorithms and Composable Coreset for k-Center Clustering with Outliers

Randomized Greedy Algorithms and Composable Coreset for k-Center Clustering with Outliers

URL: http://arxiv.org/abs/2301.02814v1
Date: Sat, 7 Jan 2023 09:26:01 GMT
Title: Randomized Greedy Algorithms and Composable Coreset for k-Center Clustering with Outliers
Authors: Hu Ding, Ruomin Huang, Kai Liu, Haikuo Yu, Zixiu Wang
Abstract summary: The presence of outliers can significantly increase the computational complexity. Our idea is inspired by the greedy method, that was developed for solving the ordinary $k$-center clustering problem.
Score: 11.546734084378683
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this paper, we study the problem of {\em $k$-center clustering with outliers}. The problem has many important applications in real world, but the presence of outliers can significantly increase the computational complexity. Though a number of methods have been developed in the past decades, it is still quite challenging to design quality guaranteed algorithm with low complexity for this problem. Our idea is inspired by the greedy method, Gonzalez's algorithm, that was developed for solving the ordinary $k$-center clustering problem. Based on some novel observations, we show that a simple randomized version of this greedy strategy actually can handle outliers efficiently. We further show that this randomized greedy approach also yields small coreset for the problem in doubling metrics (even if the doubling dimension is not given), which can greatly reduce the computational complexity. Moreover, together with the partial clustering framework proposed in arXiv:1703.01539 , we prove that our coreset method can be applied to distributed data with a low communication complexity. The experimental results suggest that our algorithms can achieve near optimal solutions and yield lower complexities comparing with the existing methods.

Related papers

A Greedy Strategy for Graph Cut [95.2841574410968]
We propose a greedy strategy to solve the problem of Graph Cut, called GGC. It starts from the state where each data sample is regarded as a cluster and dynamically merges the two clusters. GGC has a nearly linear computational complexity with respect to the number of samples.
arXiv Detail & Related papers (2024-12-28T05:49:42Z)
From Large to Small Datasets: Size Generalization for Clustering Algorithm Selection [12.993073967843292]
We study a problem in a semi-supervised setting, with an unknown ground-truth clustering. We introduce a notion of size generalization for clustering algorithm accuracy. We use a subsample of as little as 5% of the data to identify which algorithm is best on the full dataset.
arXiv Detail & Related papers (2024-02-22T06:53:35Z)
A cutting plane algorithm for globally solving low dimensional k-means clustering problems [4.5594982923247995]
We consider the k-means problem for instances with low dimensional data and formulate it as a structured concave assignment problem. This allows us to exploit the low dimensional structure and solve the problem to global optimality within reasonable time. The paper combines methods from global optimization theory to accelerate the procedure, and we provide numerical results.
arXiv Detail & Related papers (2024-02-21T07:55:33Z)
A Novel Normalized-Cut Solver with Nearest Neighbor Hierarchical Initialization [107.07093621337084]
Normalized-Cut (N-Cut) is a famous model of spectral clustering. Traditional N-Cut solvers are two-stage: 1) calculating the continuous spectral embedding of normalized Laplacian matrix; 2) discretization via $K$-means or spectral rotation. We propose a novel N-Cut solver based on the famous coordinate descent method.
arXiv Detail & Related papers (2023-11-26T07:11:58Z)
Scalable Clustering: Large Scale Unsupervised Learning of Gaussian Mixture Models with Outliers [5.478764356647437]
This paper introduces a provably robust clustering algorithm based on loss minimization. It provides theoretical guarantees that the algorithm obtains high accuracy with high probability. Experiments on real-world large-scale datasets demonstrate the effectiveness of the algorithm.
arXiv Detail & Related papers (2023-02-28T14:39:18Z)
Differentially-Private Clustering of Easy Instances [67.04951703461657]
In differentially private clustering, the goal is to identify $k$ cluster centers without disclosing information on individual data points. We provide implementable differentially private clustering algorithms that provide utility when the data is "easy" We propose a framework that allows us to apply non-private clustering algorithms to the easy instances and privately combine the results.
arXiv Detail & Related papers (2021-12-29T08:13:56Z)
Lattice-Based Methods Surpass Sum-of-Squares in Clustering [98.46302040220395]
Clustering is a fundamental primitive in unsupervised learning. Recent work has established lower bounds against the class of low-degree methods. We show that, perhaps surprisingly, this particular clustering model textitdoes not exhibit a statistical-to-computational gap.
arXiv Detail & Related papers (2021-12-07T18:50:17Z)
Fuzzy Clustering with Similarity Queries [56.96625809888241]
The fuzzy or soft objective is a popular generalization of the well-known $k$-means problem. We show that by making few queries, the problem becomes easier to solve.
arXiv Detail & Related papers (2021-06-04T02:32:26Z)
Clustering of Big Data with Mixed Features [3.3504365823045044]
We develop a new clustering algorithm for large data of mixed type. The algorithm is capable of detecting outliers and clusters of relatively lower density values. We present experimental results to verify that our algorithm works well in practice.
arXiv Detail & Related papers (2020-11-11T19:54:38Z)
Differentially Private Clustering: Tight Approximation Ratios [57.89473217052714]
We give efficient differentially private algorithms for basic clustering problems. Our results imply an improved algorithm for the Sample and Aggregate privacy framework. One of the tools used in our 1-Cluster algorithm can be employed to get a faster quantum algorithm for ClosestPair in a moderate number of dimensions.
arXiv Detail & Related papers (2020-08-18T16:22:06Z)
Second-Order Guarantees in Centralized, Federated and Decentralized Nonconvex Optimization [64.26238893241322]
Simple algorithms have been shown to lead to good empirical results in many contexts. Several works have pursued rigorous analytical justification for studying non optimization problems. A key insight in these analyses is that perturbations play a critical role in allowing local descent algorithms.
arXiv Detail & Related papers (2020-03-31T16:54:22Z)
Simple and Scalable Sparse k-means Clustering via Feature Ranking [14.839931533868176]
We propose a novel framework for sparse k-means clustering that is intuitive, simple to implement, and competitive with state-of-the-art algorithms. Our core method readily generalizes to several task-specific algorithms such as clustering on subsets of attributes and in partially observed data settings.
arXiv Detail & Related papers (2020-02-20T02:41:02Z)

This list is automatically generated from the titles and abstracts of the papers in this site.