Sparse Partitioning Around Medoids
- URL: http://arxiv.org/abs/2309.02557v1
- Date: Tue, 5 Sep 2023 19:52:24 GMT
- Title: Sparse Partitioning Around Medoids
- Authors: Lars Lenssen and Erich Schubert
- Abstract summary: Partitioning Around Medoids (PAM, k-Medoids) is a popular clustering technique to use with arbitrary distance functions or similarities.
FastPAM recently introduced a speedup for large k to make it applicable for larger problems, but the method still has a runtime quadratic in N.
We discuss a sparse and asymmetric variant of this problem, to be used for example on graph data such as road networks.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Partitioning Around Medoids (PAM, k-Medoids) is a popular clustering
technique to use with arbitrary distance functions or similarities, where each
cluster is represented by its most central object, called the medoid or the
discrete median. In operations research, this family of problems is also known
as facility location problem (FLP). FastPAM recently introduced a speedup for
large k to make it applicable for larger problems, but the method still has a
runtime quadratic in N. In this chapter, we discuss a sparse and asymmetric
variant of this problem, to be used for example on graph data such as road
networks. By exploiting sparsity, we can avoid the quadratic runtime and memory
requirements, and make this method scalable to even larger problems, as long as
we are able to build a small enough graph of sufficient connectivity to perform
local optimization. Furthermore, we consider asymmetric cases, where the set of
medoids is not identical to the set of points to be covered (or in the
interpretation of facility location, where the possible facility locations are
not identical to the consumer locations). Because of sparsity, it may be
impossible to cover all points with just k medoids when k is too small, which
would render the problem unsolvable, and this breaks common heuristics for
finding a good starting condition. We hence consider determining k as part of
the optimization problem and propose to first construct a greedy initial
solution with a larger k, then to optimize the problem by alternating between
PAM-style "swap" operations, where the result is improved by replacing medoids
with better alternatives, and "remove" operations, which reduce k, until
neither allows further improving the result quality. We demonstrate the usefulness of
this method on a problem from electrical engineering, with the input graph
derived from cartographic data.
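The alternating swap/remove procedure described in the abstract can be sketched as follows. This is a minimal Python illustration, not the authors' implementation: the per-medoid `penalty` term is an assumption introduced here so that "remove" steps can improve the objective (the paper's actual quality criterion may differ), and the names `sparse_pam`, `dist`, and `candidates` are hypothetical.

```python
import math

def sparse_pam(dist, candidates, k0, penalty):
    # dist: point -> {candidate: distance}, sparse and possibly asymmetric,
    #       so the medoid candidates need not equal the points to be covered
    # k0: size of the greedy initial solution (deliberately generous)
    # penalty: assumed per-medoid cost that makes "remove" steps profitable
    points = list(dist)

    def cost(medoids):
        total = 0.0
        for p in points:
            reach = [dist[p][m] for m in medoids if m in dist[p]]
            if not reach:
                return math.inf  # an uncovered point makes the solution infeasible
            total += min(reach)
        return total + penalty * len(medoids)

    # Greedy initialization: repeatedly add the candidate that lowers cost most.
    medoids = set()
    for _ in range(k0):
        best = min((c for c in candidates if c not in medoids),
                   key=lambda c: cost(medoids | {c}))
        medoids.add(best)

    def improving_move(medoids):
        base = cost(medoids)
        # "swap": replace one medoid with a non-medoid candidate
        for m in medoids:
            for c in candidates:
                if c not in medoids:
                    trial = (medoids - {m}) | {c}
                    if cost(trial) < base:
                        return trial
        # "remove": drop a medoid outright, shrinking k
        for m in medoids:
            trial = medoids - {m}
            if cost(trial) < base:
                return trial
        return None

    # Alternate swap and remove until neither improves the result quality.
    while (move := improving_move(medoids)) is not None:
        medoids = move
    return medoids, cost(medoids)
```

On a toy road-network-like instance with two well-separated pairs of points, starting from k0 = 3, the loop removes the redundant third medoid and settles on one medoid per pair; a real implementation would exploit sparsity further to avoid recomputing the full cost per move.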
Related papers
- A cutting plane algorithm for globally solving low dimensional k-means clustering problems [4.5594982923247995]
We consider the k-means problem for instances with low dimensional data and formulate it as a structured concave assignment problem.
This allows us to exploit the low dimensional structure and solve the problem to global optimality within reasonable time.
The paper combines methods from global optimization theory to accelerate the procedure, and we provide numerical results.
arXiv Detail & Related papers (2024-02-21T07:55:33Z)
- Globally solving the Gromov-Wasserstein problem for point clouds in low dimensional Euclidean spaces [5.534626267734822]
This paper presents a framework for computing the Gromov-Wasserstein problem between two sets of points in low dimensional spaces.
It can be used to quantify the similarity between two formations or shapes, a common problem in AI and machine learning.
arXiv Detail & Related papers (2023-07-18T08:20:56Z)
- Decentralized Riemannian natural gradient methods with Kronecker-product approximations [11.263837420265594]
We present an efficient decentralized natural gradient descent (DRNGD) method for solving decentralized manifold optimization problems.
By performing the communications over the Kronecker factors, a high-quality approximation of the RFIM can be obtained in a low cost.
arXiv Detail & Related papers (2023-03-16T19:36:31Z)
- Regularization and Optimization in Model-Based Clustering [4.096453902709292]
k-means algorithm variants essentially fit a mixture of identical spherical Gaussians to data that vastly deviates from such a distribution.
We develop more effective optimization algorithms for general GMMs, and we combine these algorithms with regularization strategies that avoid overfitting.
These results shed new light on the current status quo between GMM and k-means methods and suggest the more frequent use of general GMMs for data exploration.
arXiv Detail & Related papers (2023-02-05T18:22:29Z)
- Global Optimization for Cardinality-constrained Minimum Sum-of-Squares Clustering via Semidefinite Programming [1.3053649021965603]
The minimum sum-of-squares clustering (MSSC) has been recently extended to exploit prior knowledge on the cardinality of each cluster.
We propose a global optimization approach based on the branch-and-cut technique to solve the cardinality-constrained MSSC.
For the upper bound, instead, we present a local search procedure that exploits the solution of the SDP relaxation solved at each node.
arXiv Detail & Related papers (2022-09-19T10:19:06Z)
- STORM+: Fully Adaptive SGD with Momentum for Nonconvex Optimization [74.1615979057429]
We investigate non-batch optimization problems where the objective is an expectation over smooth loss functions.
Our work builds on the STORM algorithm, in conjunction with a novel approach to adaptively set the learning rate and momentum parameters.
arXiv Detail & Related papers (2021-11-01T15:43:36Z)
- Robust Training in High Dimensions via Block Coordinate Geometric Median Descent [69.47594803719333]
Geometric median (GM) is a classical method in statistics for achieving a robust estimation of the uncorrupted data.
In this paper, we show that by applying GM to only a judiciously chosen block of coordinates at a time, one can retain a breakdown point of 0.5 for smooth nonconvex problems.
arXiv Detail & Related papers (2021-06-16T15:55:50Z)
- Offline Model-Based Optimization via Normalized Maximum Likelihood Estimation [101.22379613810881]
We consider data-driven optimization problems where one must maximize a function given only queries at a fixed set of points.
This problem setting emerges in many domains where function evaluation is a complex and expensive process.
We propose a tractable approximation that allows us to scale our method to high-capacity neural network models.
arXiv Detail & Related papers (2021-02-16T06:04:27Z)
- Making Affine Correspondences Work in Camera Geometry Computation [62.7633180470428]
Local features provide region-to-region rather than point-to-point correspondences.
We propose guidelines for effective use of region-to-region matches in the course of a full model estimation pipeline.
Experiments show that affine solvers can achieve accuracy comparable to point-based solvers at faster run-times.
arXiv Detail & Related papers (2020-07-20T12:07:48Z)
- Aligning Partially Overlapping Point Sets: an Inner Approximation Algorithm [80.15123031136564]
We propose a robust method to align point sets that requires no prior information about the value of the transformation.
Because the algorithm needs no regularization on the transformation, it can handle situations where nothing is known about its values in advance.
Experimental results demonstrate the better robustness of the proposed method over state-of-the-art algorithms.
arXiv Detail & Related papers (2020-07-05T15:23:33Z)
- Second-Order Guarantees in Centralized, Federated and Decentralized Nonconvex Optimization [64.26238893241322]
Simple algorithms have been shown to lead to good empirical results in many contexts.
Several works have pursued rigorous analytical justification for studying nonconvex optimization problems.
A key insight in these analyses is that perturbations play a critical role in allowing local descent algorithms to escape saddle points.
arXiv Detail & Related papers (2020-03-31T16:54:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.