Sparse Partitioning Around Medoids
- URL: http://arxiv.org/abs/2309.02557v1
- Date: Tue, 5 Sep 2023 19:52:24 GMT
- Title: Sparse Partitioning Around Medoids
- Authors: Lars Lenssen and Erich Schubert
- Abstract summary: Partitioning Around Medoids (PAM, k-Medoids) is a popular clustering technique to use with arbitrary distance functions or similarities.
FastPAM recently introduced a speedup for large k to make it applicable for larger problems, but the method still has a runtime quadratic in N.
We discuss a sparse and asymmetric variant of this problem, to be used for example on graph data such as road networks.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Partitioning Around Medoids (PAM, k-Medoids) is a popular clustering
technique to use with arbitrary distance functions or similarities, where each
cluster is represented by its most central object, called the medoid or the
discrete median. In operations research, this family of problems is also known
as facility location problem (FLP). FastPAM recently introduced a speedup for
large k to make it applicable for larger problems, but the method still has a
runtime quadratic in N. In this chapter, we discuss a sparse and asymmetric
variant of this problem, to be used for example on graph data such as road
networks. By exploiting sparsity, we can avoid the quadratic runtime and memory
requirements, and make this method scalable to even larger problems, as long as
we are able to build a small enough graph of sufficient connectivity to perform
local optimization. Furthermore, we consider asymmetric cases, where the set of
medoids is not identical to the set of points to be covered (or in the
interpretation of facility location, where the possible facility locations are
not identical to the consumer locations). Because of sparsity, it may be
impossible to cover all points with just k medoids when k is too small, which
would render the problem unsolvable, and this breaks common heuristics for
finding a good starting condition. We hence consider determining k as part of
the optimization problem and propose to first construct a greedy initial
solution with a larger k, then to optimize the problem by alternating between
PAM-style "swap" operations, where the result is improved by replacing medoids
with better alternatives, and "remove" operations, which reduce k, until
neither allows further improving the result quality. We demonstrate the usefulness of
this method on a problem from electrical engineering, with the input graph
derived from cartographic data.
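The alternating swap/remove procedure described in the abstract can be sketched as follows. This is a minimal Python illustration, not the authors' implementation: the per-medoid `penalty` term is an assumption introduced here so that "remove" steps can improve the objective (the paper's actual quality criterion may differ), and the names `sparse_pam`, `dist`, and `candidates` are hypothetical.

```python
import math

def sparse_pam(dist, candidates, k0, penalty):
    # dist: point -> {candidate: distance}, sparse and possibly asymmetric,
    #       so the medoid candidates need not equal the points to be covered
    # k0: size of the greedy initial solution (deliberately generous)
    # penalty: assumed per-medoid cost that makes "remove" steps profitable
    points = list(dist)

    def cost(medoids):
        total = 0.0
        for p in points:
            reach = [dist[p][m] for m in medoids if m in dist[p]]
            if not reach:
                return math.inf  # an uncovered point makes the solution infeasible
            total += min(reach)
        return total + penalty * len(medoids)

    # Greedy initialization: repeatedly add the candidate that lowers cost most.
    medoids = set()
    for _ in range(k0):
        best = min((c for c in candidates if c not in medoids),
                   key=lambda c: cost(medoids | {c}))
        medoids.add(best)

    def improving_move(medoids):
        base = cost(medoids)
        # "swap": replace one medoid with a non-medoid candidate
        for m in medoids:
            for c in candidates:
                if c not in medoids:
                    trial = (medoids - {m}) | {c}
                    if cost(trial) < base:
                        return trial
        # "remove": drop a medoid outright, shrinking k
        for m in medoids:
            trial = medoids - {m}
            if cost(trial) < base:
                return trial
        return None

    # Alternate swap and remove until neither improves the result quality.
    while (move := improving_move(medoids)) is not None:
        medoids = move
    return medoids, cost(medoids)
```

On a toy road-network-like instance with two well-separated pairs of points, starting from k0 = 3, the loop removes the redundant third medoid and settles on one medoid per pair; a real implementation would exploit sparsity further to avoid recomputing the full cost per move.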
Related papers
- A cutting plane algorithm for globally solving low dimensional k-means clustering problems [4.5594982923247995]
We consider the k-means problem for instances with low dimensional data and formulate it as a structured concave assignment problem.
This allows us to exploit the low dimensional structure and solve the problem to global optimality within reasonable time.
The paper combines methods from global optimization theory to accelerate the procedure, and we provide numerical results.
arXiv Detail & Related papers (2024-02-21T07:55:33Z)
- Globally solving the Gromov-Wasserstein problem for point clouds in low dimensional Euclidean spaces [5.534626267734822]
This paper presents a framework for computing the Gromov-Wasserstein problem between two sets of points in low dimensional spaces.
It can be used to quantify the similarity between two formations or shapes, a common problem in AI and machine learning.
arXiv Detail & Related papers (2023-07-18T08:20:56Z)
- Decentralized Riemannian natural gradient methods with Kronecker-product approximations [11.263837420265594]
We present an efficient decentralized natural gradient descent (DRNGD) method for solving decentralized manifold optimization problems.
By performing the communications over the Kronecker factors, a high-quality approximation of the RFIM can be obtained in a low cost.
arXiv Detail & Related papers (2023-03-16T19:36:31Z)
- Regularization and Optimization in Model-Based Clustering [4.096453902709292]
k-means algorithm variants essentially fit a mixture of identical spherical Gaussians to data that vastly deviates from such a distribution.
We develop more effective optimization algorithms for general GMMs, and we combine these algorithms with regularization strategies that avoid overfitting.
These results shed new light on the current status quo between GMM and k-means methods and suggest the more frequent use of general GMMs for data exploration.
arXiv Detail & Related papers (2023-02-05T18:22:29Z)
- Global Optimization for Cardinality-constrained Minimum Sum-of-Squares Clustering via Semidefinite Programming [1.3053649021965603]
The minimum sum-of-squares clustering (MSSC) has been recently extended to exploit prior knowledge on the cardinality of each cluster.
We propose a global optimization approach based on the branch-and-cut technique to solve the cardinality-constrained MSSC.
For the upper bound, instead, we present a local search procedure that exploits the solution of the SDP relaxation solved at each node.
arXiv Detail & Related papers (2022-09-19T10:19:06Z)
- STORM+: Fully Adaptive SGD with Momentum for Nonconvex Optimization [74.1615979057429]
We investigate non-batch optimization problems where the objective is an expectation over smooth loss functions.
Our work builds on the STORM algorithm, in conjunction with a novel approach to adaptively set the learning rate and momentum parameters.
arXiv Detail & Related papers (2021-11-01T15:43:36Z)
- Robust Training in High Dimensions via Block Coordinate Geometric Median Descent [69.47594803719333]
Geometric median (GM) is a classical method in statistics for achieving a robust estimation of the uncorrupted data.
In this paper, we show that by applying GM to only a judiciously chosen block of coordinates at a time, one can retain a breakdown point of 0.5 for smooth nonconvex problems.
arXiv Detail & Related papers (2021-06-16T15:55:50Z)
- Offline Model-Based Optimization via Normalized Maximum Likelihood Estimation [101.22379613810881]
We consider data-driven optimization problems where one must maximize a function given only queries at a fixed set of points.
This problem setting emerges in many domains where function evaluation is a complex and expensive process.
We propose a tractable approximation that allows us to scale our method to high-capacity neural network models.
arXiv Detail & Related papers (2021-02-16T06:04:27Z)
- Making Affine Correspondences Work in Camera Geometry Computation [62.7633180470428]
Local features provide region-to-region rather than point-to-point correspondences.
We propose guidelines for effective use of region-to-region matches in the course of a full model estimation pipeline.
Experiments show that affine solvers can achieve accuracy comparable to point-based solvers at faster run-times.
arXiv Detail & Related papers (2020-07-20T12:07:48Z)
- Aligning Partially Overlapping Point Sets: an Inner Approximation Algorithm [80.15123031136564]
We propose a robust method to align point sets that requires no prior information about the value of the transformation.
Because the algorithm needs no regularization on the transformation, it can handle situations where nothing is known about its values in advance.
Experimental results demonstrate the better robustness of the proposed method over state-of-the-art algorithms.
arXiv Detail & Related papers (2020-07-05T15:23:33Z)
- Second-Order Guarantees in Centralized, Federated and Decentralized Nonconvex Optimization [64.26238893241322]
Simple algorithms have been shown to lead to good empirical results in many contexts.
Several works have pursued rigorous analytical justification for studying nonconvex optimization problems.
A key insight in these analyses is that perturbations play a critical role in allowing local descent algorithms to escape saddle points.
arXiv Detail & Related papers (2020-03-31T16:54:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.