Related papers: Dynamic Algorithm for Explainable k-medians Clustering under lp Norm

URL: http://arxiv.org/abs/2512.01150v1
Date: Mon, 01 Dec 2025 00:01:47 GMT
Title: Dynamic Algorithm for Explainable k-medians Clustering under lp Norm
Authors: Konstantin Makarychev, Ilias Papanikolaou, Liren Shan
Abstract summary: We present the first algorithm for explainable k-medians under lp norm for every finite p >= 1. Our algorithm achieves an O(p(log k)^{1 + 1/p - 1/p^2}) approximation to the optimal k-medians cost for any p >= 1. The algorithm maintains an explainable clustering under a sequence of insertions and deletions, with amortized update time O(d log^3 k) and O(log k) recourse.
Score: 11.05906005268085
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: We study the problem of explainable k-medians clustering introduced by Dasgupta, Frost, Moshkovitz, and Rashtchian (2020). In this problem, the goal is to construct a threshold decision tree that partitions data into k clusters while minimizing the k-medians objective. These trees are interpretable because each internal node makes a simple decision by thresholding a single feature, allowing users to trace and understand how each point is assigned to a cluster. We present the first algorithm for explainable k-medians under lp norm for every finite p >= 1. Our algorithm achieves an O(p(log k)^{1 + 1/p - 1/p^2}) approximation to the optimal k-medians cost for any p >= 1. Previously, algorithms were known only for p = 1 and p = 2. For p = 2, our algorithm improves upon the existing bound of O(log^{3/2}k), and for p = 1, it matches the tight bound of log k + O(1) up to a multiplicative O(log log k) factor. We show how to implement our algorithm in a dynamic setting. The dynamic algorithm maintains an explainable clustering under a sequence of insertions and deletions, with amortized update time O(d log^3 k) and O(log k) recourse, making it suitable for large-scale and evolving datasets.
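
To make the objects in the abstract concrete, here is a minimal Python sketch of a threshold decision tree and of the k-medians cost it induces under an lp norm. It is not the paper's algorithm: it only evaluates a hand-built tree. The names `Leaf`, `Node`, `assign`, `lp_dist`, `kmedians_cost`, and `approx_exponent` are hypothetical, and the cost is assumed to be the sum of lp distances from each point to the center stored at its leaf.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple, Union


@dataclass
class Leaf:
    # Cluster center stored at this leaf (hypothetical representation).
    center: Tuple[float, ...]


@dataclass
class Node:
    # Internal node: thresholds a single feature, as in a threshold tree.
    feature: int
    threshold: float
    left: Union["Node", Leaf]
    right: Union["Node", Leaf]


def assign(tree: Union[Node, Leaf], x: Sequence[float]) -> Leaf:
    """Follow single-feature threshold tests from the root to a leaf."""
    while isinstance(tree, Node):
        tree = tree.left if x[tree.feature] <= tree.threshold else tree.right
    return tree


def lp_dist(x: Sequence[float], c: Sequence[float], p: float) -> float:
    """lp distance between x and c for finite p >= 1."""
    return sum(abs(a - b) ** p for a, b in zip(x, c)) ** (1.0 / p)


def kmedians_cost(tree: Union[Node, Leaf], points, p: float) -> float:
    """Assumed objective: sum of lp distances to the assigned leaf's center."""
    return sum(lp_dist(x, assign(tree, x).center, p) for x in points)


def approx_exponent(p: float) -> float:
    """Exponent of log k in the stated bound O(p (log k)^{1 + 1/p - 1/p^2})."""
    return 1 + 1 / p - 1 / p ** 2


# A tree with k = 3 leaves over 2-dimensional points (illustrative data).
tree = Node(
    feature=0, threshold=5.0,
    left=Leaf((0.0, 0.0)),
    right=Node(feature=1, threshold=5.0,
               left=Leaf((10.0, 0.0)),
               right=Leaf((10.0, 10.0))),
)
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (10.0, 10.0)]

cost_l1 = kmedians_cost(tree, points, p=1)
cost_l2 = kmedians_cost(tree, points, p=2)
```

Each point's cluster can be explained by the sequence of threshold tests on the path to its leaf. For p = 2, `approx_exponent(2)` evaluates to 1.25, which is below the exponent 3/2 of the earlier bound mentioned in the abstract.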

Related papers

A Greedy Strategy for Graph Cut [95.2841574410968]
We propose a greedy strategy to solve the problem of Graph Cut, called GGC. It starts from the state where each data sample is regarded as a cluster and dynamically merges the two clusters. GGC has a nearly linear computational complexity with respect to the number of samples.
arXiv Detail & Related papers (2024-12-28T05:49:42Z)
Accelerating k-Means Clustering with Cover Trees [0.30693357740321775]
We propose a new k-means algorithm based on the cover tree index, that has relatively low overhead and performs well. We obtain a hybrid algorithm that combines the benefits of tree aggregation and bounds-based filtering.
arXiv Detail & Related papers (2024-10-19T14:02:42Z)
Differentially-Private Hierarchical Clustering with Provable Approximation Guarantees [79.59010418610625]
We study differentially private approximation algorithms for hierarchical clustering. We show strong lower bounds for the problem: that any $\epsilon$-DP algorithm must exhibit $O(|V|^2/\epsilon)$-additive error for an input dataset. We propose a private $1+o(1)$ approximation algorithm which also recovers the blocks exactly.
arXiv Detail & Related papers (2023-01-31T19:14:30Z)
Explicit Second-Order Min-Max Optimization: Practical Algorithms and Complexity Analysis [71.05708939639537]
We propose and analyze several inexact regularized Newton-type methods for finding a global saddle point of convex-concave unconstrained problems. Our method improves the existing line-search-based min-max optimization by shaving off an $O(\log\log(1/\epsilon))$ factor in the required number of Schur decompositions.
arXiv Detail & Related papers (2022-10-23T21:24:37Z)
Solving correlation clustering with QAOA and a Rydberg qudit system: a full-stack approach [94.37521840642141]
We study the correlation clustering problem using the quantum approximate optimization algorithm (QAOA) and qudits. Specifically, we consider a neutral atom quantum computer and propose a full stack approach for correlation clustering. We show the qudit implementation is superior to the qubit encoding as quantified by the gate count.
arXiv Detail & Related papers (2021-06-22T11:07:38Z)
Hierarchical Agglomerative Graph Clustering in Nearly-Linear Time [1.5644420658691407]
We study the widely used hierarchical agglomerative clustering (HAC) algorithm on edge-weighted graphs. We define an algorithmic framework for hierarchical agglomerative graph clustering. We show that our approach can speed up clustering of point datasets by a factor of 20.7--76.5x.
arXiv Detail & Related papers (2021-06-10T09:29:05Z)
Towards Optimally Efficient Tree Search with Deep Learning [76.64632985696237]
This paper investigates the classical integer least-squares problem, which estimates integer signals from linear models. The problem is NP-hard and often arises in diverse applications such as signal processing, bioinformatics, communications and machine learning. We propose a general hyper-accelerated tree search (HATS) algorithm by employing a deep neural network to estimate the optimal estimation for the underlying simplified memory-bounded A* algorithm.
arXiv Detail & Related papers (2021-01-07T08:00:02Z)
Efficient Permutation Discovery in Causal DAGs [9.22466799504763]
We introduce an efficient algorithm for finding sparse permutations of a directed acyclic graph. We show that our method with depth $w$ runs in $O(p^{w+3})$ time. We also compare our algorithm to provably consistent causal structure learning algorithms, such as the PC algorithm, GES, and GSP, and show that our method achieves comparable performance with a shorter runtime.
arXiv Detail & Related papers (2020-11-06T21:56:41Z)
Differentially Private Clustering: Tight Approximation Ratios [57.89473217052714]
We give efficient differentially private algorithms for basic clustering problems. Our results imply an improved algorithm for the Sample and Aggregate privacy framework. One of the tools used in our 1-Cluster algorithm can be employed to get a faster quantum algorithm for ClosestPair in a moderate number of dimensions.
arXiv Detail & Related papers (2020-08-18T16:22:06Z)
Relational Algorithms for k-means Clustering [17.552485682328772]
This paper gives a k-means approximation algorithm that is efficient in the relational algorithms model. The running time is potentially exponentially smaller than $N$, the number of data points to be clustered that the relational database represents.
arXiv Detail & Related papers (2020-08-01T23:21:40Z)
Second-order Conditional Gradient Sliding [70.88478428882871]
We present the Second-Order Conditional Gradient Sliding (SOCGS) algorithm. The SOCGS algorithm converges quadratically in primal gap after a finite number of linearly convergent iterations. It is useful when the feasible region can only be accessed efficiently through a linear optimization oracle.
arXiv Detail & Related papers (2020-02-20T17:52:18Z)
k-means++: few more steps yield constant approximation [3.7468898363447654]
The k-means++ algorithm of Arthur and Vassilvitskii (SODA 2007) is a state-of-the-art algorithm for solving the k-means clustering problem. Recently, Lattanzi and Sohler (ICML) proposed augmenting k-means++ with O(k log k) local search steps to yield a constant approximation (in expectation) to the k-means clustering problem.
arXiv Detail & Related papers (2020-02-18T18:28:25Z)

This list is automatically generated from the titles and abstracts of the papers on this site.