$k$-Median Clustering via Metric Embedding: Towards Better
Initialization with Differential Privacy
- URL: http://arxiv.org/abs/2206.12895v1
- Date: Sun, 26 Jun 2022 14:58:36 GMT
- Title: $k$-Median Clustering via Metric Embedding: Towards Better
Initialization with Differential Privacy
- Authors: Chenglin Fan, Ping Li, Xiaoyun Li
- Abstract summary: We develop a new HST scheme for the $k$-median problem in general metric spaces.
From the tree, we propose a novel and efficient search algorithm for good initial centers.
Our approach can also be extended to the $k$-means problem.
- Score: 31.963659189956367
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: When designing clustering algorithms, the choice of initial centers is
crucial for the quality of the learned clusters. In this paper, we develop a
new initialization scheme, called HST initialization, for the $k$-median
problem in general metric spaces (e.g., discrete spaces induced by graphs),
based on the construction of a metric embedding tree structure of the data.
From the tree, we propose a novel and efficient search algorithm for good
initial centers, which can subsequently be used by the local search algorithm.
Our proposed HST initialization produces initial centers with lower error than
those from the popular $k$-median++ initialization, at comparable cost. The
HST initialization can also be extended to the setting of differential privacy
(DP) to generate private initial centers. We show that the error of DP local
search initialized by our private HST scheme improves previous results on the
approximation error, and approaches the lower bound within a small factor.
Experiments justify the theory and demonstrate the effectiveness of our
proposed method. Our approach can also be extended to the $k$-means problem.
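As a rough illustration of the tree-based seeding idea, the sketch below uses a randomly shifted grid hierarchy on Euclidean data as a stand-in for a general-metric HST: it repeatedly splits the heaviest subtree (point count times cell diameter) until $k$ disjoint subtrees remain, then takes one representative point per subtree. The function `hst_like_init`, the grid construction, and the scoring/splitting rules are illustrative assumptions, not the authors' exact algorithm.

```python
# Illustrative sketch only: a randomly shifted grid hierarchy on Euclidean
# data stands in for a general-metric HST; not the authors' construction.
import heapq
import numpy as np

def hst_like_init(X, k, rng=0, max_depth=52):
    rng = np.random.default_rng(rng)
    n, d = X.shape
    side = float(X.max() - X.min()) + 1e-12
    Y = X - X.min() + rng.uniform(0.0, side, size=d)   # random grid shift

    def split(level, idx):
        # Descend until the cell's points fall into more than one child cell.
        while level < max_depth:
            size = side / 2.0 ** (level + 1)
            groups = {}
            for i in idx:
                cid = tuple(np.floor(Y[i] / size).astype(int))
                groups.setdefault(cid, []).append(i)
            level += 1
            if len(groups) > 1:
                return [(level, g) for g in groups.values()]
        return None                                    # duplicate points

    def entry(level, idx):
        # heapq is a min-heap, so negate: heavy = many points in a big cell.
        return (-len(idx) * side / 2.0 ** level, level, idx)

    heap, done = [entry(0, list(range(n)))], []
    while heap and len(done) + len(heap) < k:
        _, level, idx = heapq.heappop(heap)            # heaviest subtree
        kids = split(level, idx) if len(idx) > 1 else None
        if kids is None:
            done.append((level, idx))                  # cannot split further
        else:
            for child in kids:
                heapq.heappush(heap, entry(*child))
    subtrees = done + [(lvl, idx) for _, lvl, idx in heap]
    return X[[idx[0] for _, idx in subtrees[:k]]]      # one point per subtree
```

The returned points can then seed a (private or non-private) local search, in the role the abstract describes.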
Related papers
- Unsupervised Learning of Initialization in Deep Neural Networks via
Maximum Mean Discrepancy [74.34895342081407]
We propose an unsupervised algorithm to find a good initialization for input data.
We first notice that each parameter configuration in the parameter space corresponds to one particular downstream task of d-way classification.
We then conjecture that the success of learning is directly related to how diverse downstream tasks are in the vicinity of the initial parameters.
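The summary suggests scoring an initialization by how diverse the downstream tasks near it are. A kernel MMD estimate is a natural building block for such a diversity score; the snippet below is only a generic biased (V-statistic) RBF-kernel MMD between two sample sets, not the paper's full procedure.

```python
# Generic squared-MMD estimate with an RBF kernel; a possible building block
# for a task-diversity score, not the paper's actual method.
import numpy as np

def mmd2(X, Y, gamma=1.0):
    def gram(A, B):
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * sq)
    # Biased (V-statistic) estimate: E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)].
    return gram(X, X).mean() + gram(Y, Y).mean() - 2.0 * gram(X, Y).mean()
```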
arXiv Detail & Related papers (2023-02-08T23:23:28Z)
- Alternating minimization algorithm with initialization analysis for r-local and k-sparse unlabeled sensing [10.751269349279912]
The unlabeled sensing problem is to recover an unknown signal from permuted linear measurements.
We propose an alternating minimization algorithm with a suitable initialization for the widely considered k-sparse permutation model.
Our algorithm is computationally scalable and, compared to baseline methods, achieves superior performance on real and synthetic datasets.
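A bare-bones version of such an alternation for $y = P(Ax)$ with an unknown permutation $P$ is sketched below; it ignores the r-local/k-sparse structure, and the naive least-squares start is only a placeholder for the initialization the paper analyzes.

```python
# Hedged sketch of alternating minimization for y = P @ (A @ x); the paper's
# initialization and permutation model are more refined than this.
import numpy as np
from scipy.optimize import linear_sum_assignment

def alt_min_unlabeled(y, A, iters=20):
    x = np.linalg.lstsq(A, y, rcond=None)[0]       # naive placeholder init
    perm = np.arange(len(y))
    for _ in range(iters):
        z = A @ x
        # Permutation step: optimally match entries of y to entries of A @ x.
        _, perm = linear_sum_assignment((y[:, None] - z[None, :]) ** 2)
        # Signal step: least squares with the permutation held fixed.
        x = np.linalg.lstsq(A[perm], y, rcond=None)[0]
    return x, perm
```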
arXiv Detail & Related papers (2022-11-14T18:44:55Z)
- An enhanced method of initial cluster center selection for K-means algorithm [0.0]
We propose a novel approach to improve initial cluster selection for the K-means algorithm.
A convex hull algorithm is used to compute the first two centroids, and the remaining ones are selected according to their distance from the previously selected centers.
We obtained clustering errors of only 7.33%, 7.90%, and 0% on the Iris, Letter, and Ruspini datasets, respectively.
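Read literally, that recipe admits a short sketch: take the farthest pair of convex-hull vertices as the first two centroids, then add the rest by a farthest-point rule. The pairing and distance rules below are one plausible reading, not necessarily the paper's exact choices.

```python
# One plausible reading of the described seeding; details may differ from
# the paper's exact procedure.
import numpy as np
from scipy.spatial import ConvexHull

def hull_seed(X, k):
    hull = X[ConvexHull(X).vertices]               # hull vertices of the data
    # First two centroids: the most distant pair of hull vertices.
    D = np.linalg.norm(hull[:, None] - hull[None, :], axis=-1)
    i, j = np.unravel_index(np.argmax(D), D.shape)
    centers = [hull[i], hull[j]]
    # Remaining centroids: farthest point from the already chosen centers.
    while len(centers) < k:
        dist = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    return np.array(centers)
```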
arXiv Detail & Related papers (2022-10-18T00:58:50Z)
- Gradient Based Clustering [72.15857783681658]
We propose a general approach for distance-based clustering, using the gradient of the cost function that measures clustering quality.
The approach is an iterative two-step procedure (alternating between cluster assignment and cluster center updates) and is applicable to a wide range of cost functions.
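For concreteness, here is that two-step alternation instantiated for the plain k-means cost; the paper's framework is more general (other costs and step-size rules).

```python
# Generic assignment-then-gradient-step loop for the k-means cost; the paper
# covers a wider family of cost functions than this instantiation.
import numpy as np

def gradient_clustering(X, k, lr=0.2, iters=100, rng=0):
    rng = np.random.default_rng(rng)
    C = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Step 1: assign each point to its nearest current center.
        a = np.argmin(np.linalg.norm(X[:, None] - C[None], axis=-1), axis=1)
        # Step 2: gradient step on each center for the mean squared cost of
        # its cluster (lr = 0.5 would recover Lloyd's exact mean update).
        for j in range(k):
            pts = X[a == j]
            if len(pts):
                C[j] -= lr * 2.0 * (C[j] - pts.mean(axis=0))
    return C, a
```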
arXiv Detail & Related papers (2022-02-01T19:31:15Z)
- Differentially Private Clustering: Tight Approximation Ratios [57.89473217052714]
We give efficient differentially private algorithms for basic clustering problems.
Our results imply an improved algorithm for the Sample and Aggregate privacy framework.
One of the tools used in our 1-Cluster algorithm can be employed to get a faster quantum algorithm for ClosestPair in a moderate number of dimensions.
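The blurb gives no algorithmic detail, but the primitive such DP clustering algorithms build on is easy to show: a Laplace-mechanism noisy average. This is a textbook illustration, not the paper's algorithm; `bound` is an assumed per-coordinate clipping radius.

```python
# Textbook Laplace-mechanism average, shown only as the basic DP primitive;
# NOT the paper's algorithm. `bound` is an assumed per-coordinate clip.
import numpy as np

def dp_average(points, eps, bound, rng=0):
    rng = np.random.default_rng(rng)
    pts = np.clip(points, -bound, bound)
    d = pts.shape[1]
    # Split the budget: eps/2 for the sum (L1 sensitivity d * bound under
    # add/remove of one point) and eps/2 for the count (sensitivity 1).
    noisy_sum = pts.sum(axis=0) + rng.laplace(scale=2.0 * d * bound / eps, size=d)
    noisy_cnt = len(pts) + rng.laplace(scale=2.0 / eps)
    return noisy_sum / max(noisy_cnt, 1.0)
```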
arXiv Detail & Related papers (2020-08-18T16:22:06Z)
- Scalable Initialization Methods for Large-Scale Clustering [0.0]
Two new methods for K-means clustering are proposed.
The proposed methods are scalable and can be run in parallel.
Experiments show that the proposed methods compare favorably to the state-of-the-art.
arXiv Detail & Related papers (2020-07-23T11:29:53Z)
- Learnable Subspace Clustering [76.2352740039615]
We develop a learnable subspace clustering paradigm to efficiently solve the large-scale subspace clustering problem.
The key idea is to learn a parametric function to partition the high-dimensional subspaces into their underlying low-dimensional subspaces.
To the best of our knowledge, this paper is the first work to efficiently cluster millions of data points among the subspace clustering methods.
arXiv Detail & Related papers (2020-04-09T12:53:28Z)
- Optimal Randomized First-Order Methods for Least-Squares Problems [56.05635751529922]
This class of algorithms encompasses several randomized methods among the fastest solvers for least-squares problems.
We focus on two classical embeddings, namely, Gaussian projections and subsampled Hadamard transforms.
Our resulting algorithm yields the best complexity known for solving least-squares problems with no condition number dependence.
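As a minimal illustration of the Gaussian-embedding ingredient, a one-shot sketch-and-solve solver is shown below; the paper's methods wrap first-order iterations and preconditioning around such embeddings rather than using this one-shot form.

```python
# Minimal sketch-and-solve with a Gaussian embedding; shown only for the
# embedding itself, not the paper's first-order algorithms.
import numpy as np

def sketched_lstsq(A, b, m=None, rng=0):
    n, d = A.shape
    m = m or 4 * d                                 # assumed oversampling rule
    S = np.random.default_rng(rng).normal(size=(m, n)) / np.sqrt(m)
    x, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
    return x
```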
arXiv Detail & Related papers (2020-02-21T17:45:32Z)
- A novel initialisation based on hospital-resident assignment for the k-modes algorithm [0.0]
This paper presents a new way of selecting an initial solution for the k-modes algorithm.
It allows for a notion of mathematical fairness and leverages the data in a way that the common initialisations from the literature do not.
arXiv Detail & Related papers (2020-02-07T10:20:49Z)
- MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks.
The use of gradient descent combined with nonconvexity renders learning sensitive to initialization.
We propose fusing neighboring layers of deeper networks that are trained with random variables.
arXiv Detail & Related papers (2020-01-28T18:25:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.