A Broader View on Clustering under Cluster-Aware Norm Objectives
- URL: http://arxiv.org/abs/2512.06211v2
- Date: Tue, 09 Dec 2025 10:18:53 GMT
- Title: A Broader View on Clustering under Cluster-Aware Norm Objectives
- Authors: Martin G. Herold, Evangelos Kipouridis, Joachim Spoerhase,
- Abstract summary: We revisit the $(f,g)$-clustering problem that we introduced in a recent work.<n>This problem assigns each of the $k$ clusters a cost determined by the monotone, symmetric norm $f$ applied to the vector distances in the cluster.<n>We provide an $O(k)$-approximation for the general $(f,g)$-clustering problem.
- Score: 0.8016305126057338
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We revisit the $(f,g)$-clustering problem that we introduced in a recent work [SODA'25], and which subsumes fundamental clustering problems such as $k$-Center, $k$-Median, Min-Sum of Radii, and Min-Load $k$-Clustering. This problem assigns each of the $k$ clusters a cost determined by the monotone, symmetric norm $f$ applied to the vector distances in the cluster, and aims at minimizing the norm $g$ applied to the vector of cluster costs. Previously, we focused on certain special cases for which we designed constant-factor approximation algorithms. Our bounds for more general settings left, however, large gaps to the known bounds for the basic problems they capture. In this work, we provide a clearer picture of the approximability of these more general settings. First, we design an $O(\log^2 n)$-approximation algorithm for $(f, L_{1})$-clustering for any $f$. This improves upon our previous $\widetilde{O}(\sqrt{n})$-approximation. Second, we provide an $O(k)$-approximation for the general $(f,g)$-clustering problem, which improves upon our previous $\widetilde{O}(\sqrt{kn})$-approximation algorithm and matches the best-known upper bound for Min-Load $k$-Clustering. We then design an approximation algorithm for $(f,g)$-clustering that interpolates, up to polylog factors, between the best known bounds for $k$-Center, $k$-Median, Min-Sum of Radii, Min-Load $k$-Clustering, (Top, $L_{1}$)-clustering, and $(L_{\infty},g)$-clustering based on a newly defined parameter of $f$ and $g$.
Related papers
- Near-Optimal Clustering in Mixture of Markov Chains [74.3828414695655]
We study the problem of clustering $T$ trajectories of length $H$, each generated by one of $K$ unknown ergodic Markov chains over a finite state space of size $S$.<n>We derive an instance-dependent, high-probability lower bound on the clustering error rate, governed by the weighted KL divergence between the transition kernels of the chains.<n>We then present a novel two-stage clustering algorithm.
arXiv Detail & Related papers (2025-06-02T05:10:40Z) - Generalization Performance of Ensemble Clustering: From Theory to Algorithm [57.176040163699554]
This paper focuses on generalization error, excess risk and consistency in ensemble clustering.<n>By assigning varying weights to finite clusterings, we minimize the error between the empirical average clusterings and their expectation.<n>We instantiate our theory to develop a new ensemble clustering algorithm.
arXiv Detail & Related papers (2025-06-01T09:34:52Z) - Clustering to Minimize Cluster-Aware Norm Objectives [0.3481985817302898]
We seek to partition a given set $P$ of data points into $k$ clusters by finding a set $X$ of $k$ centers.
The cost of a cluster, represented by a center $xin X$, is a monotone, symmetric norm $f$ (inner norm) of the vector of distances of points assigned to $x$.
The goal is to minimize a norm $g$ (outer norm) of the vector of cluster costs.
arXiv Detail & Related papers (2024-10-31T16:33:40Z) - A Scalable Algorithm for Individually Fair K-means Clustering [77.93955971520549]
We present a scalable algorithm for the individually fair ($p$, $k$)-clustering problem introduced by Jung et al. and Mahabadi et al.
A clustering is then called individually fair if it has centers within distance $delta(x)$ of $x$ for each $xin P$.
We show empirically that not only is our algorithm much faster than prior work, but it also produces lower-cost solutions.
arXiv Detail & Related papers (2024-02-09T19:01:48Z) - Sketching Algorithms for Sparse Dictionary Learning: PTAS and Turnstile
Streaming [48.18845814885398]
We develop new techniques to extend the applicability of sketching-based approaches to sparse dictionary learning and the Euclidean $k$-means clustering problems.
On the fast algorithms front, we obtain a new approach for designing PTAS's for the $k$-means clustering problem.
On the streaming algorithms front, we obtain new upper bounds and lower bounds for dictionary learning and $k$-means clustering.
arXiv Detail & Related papers (2023-10-29T16:46:26Z) - Simple, Scalable and Effective Clustering via One-Dimensional
Projections [10.807367640692021]
Clustering is a fundamental problem in unsupervised machine learning with many applications in data analysis.
We introduce a simple randomized clustering algorithm that provably runs in expected time $O(mathrmnnz(X) + nlog n)$ for arbitrary $k$.
We prove that our algorithm achieves approximation ratio $smashwidetildeO(k4)$ on any input dataset for the $k$-means objective.
arXiv Detail & Related papers (2023-10-25T16:37:45Z) - Do you know what q-means? [42.96240569413475]
We present a classical $varepsilon$-$k$-means algorithm that performs an approximate version of one iteration of Lloyd's algorithm with time complexity.<n>We also propose an improved $q$-means quantum algorithm with time complexity.
arXiv Detail & Related papers (2023-08-18T17:52:12Z) - Differentially Private Clustering in Data Streams [56.26040303056582]
We provide the first differentially private algorithms for $k$-means and $k$-median clustering of $d$-dimensional Euclidean data points over a stream with length at most $T$.<n>Our main technical contribution is a differentially private clustering framework for data streams which only requires an offline DP coreset or clustering algorithm as a blackbox.
arXiv Detail & Related papers (2023-07-14T16:11:22Z) - Near-Optimal Quantum Coreset Construction Algorithms for Clustering [15.513270929560088]
We give quantum algorithms that find coresets for $k$-clustering in $mathbbRd$ with $tildeO(sqrtnkd3/2)$ query complexity.
Our coreset reduces the input size from $n$ to $mathrmpoly(kepsilon-1d)$, so that existing $alpha$-approximation algorithms for clustering can run on top of it.
arXiv Detail & Related papers (2023-06-05T12:22:46Z) - Approximating Fair Clustering with Cascaded Norm Objectives [10.69111036810888]
We find a clustering which minimizes the $ell_q$-norm of the vector over $W$ of the $ell_p$-norms of the weighted distances of points in $P$ from the centers.
This generalizes various clustering problems, including Socially Fair $k$-Median and $k$-Means.
arXiv Detail & Related papers (2021-11-08T20:18:10Z) - Nearly-Tight and Oblivious Algorithms for Explainable Clustering [8.071379672971542]
We study the problem of explainable clustering in the setting first formalized by Moshkovitz, Dasgupta, Rashtchian, and Frost (ICML 2020)
A $k$-clustering is said to be explainable if it is given by a decision tree where each internal node data points with a threshold cut in a single dimension (feature)
We give an algorithm that outputs an explainable clustering that loses at most a factor of $O(log2 k)$ compared to an optimal (not necessarily explainable) clustering for the $k$-medians objective.
arXiv Detail & Related papers (2021-06-30T15:49:41Z) - Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model.
We show that for both problems, for dimensions of $frac1lambdaepsilon$, one can obtain streaming algorithms with spacely smaller than $frac1lambdaepsilon$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.