Related papers: Clustering to Minimize Cluster-Aware Norm Objectives

Clustering to Minimize Cluster-Aware Norm Objectives

URL: http://arxiv.org/abs/2410.24104v1
Date: Thu, 31 Oct 2024 16:33:40 GMT
Title: Clustering to Minimize Cluster-Aware Norm Objectives
Authors: Martin G. Herold, Evangelos Kipouridis, Joachim Spoerhase,
Abstract summary: We seek to partition a given set $P$ of data points into $k$ clusters by finding a set $X$ of $k$ centers. The cost of a cluster, represented by a center $xin X$, is a monotone, symmetric norm $f$ (inner norm) of the vector of distances of points assigned to $x$. The goal is to minimize a norm $g$ (outer norm) of the vector of cluster costs.
Score: 0.3481985817302898
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We initiate the study of the following general clustering problem. We seek to partition a given set $P$ of data points into $k$ clusters by finding a set $X$ of $k$ centers and assigning each data point to one of the centers. The cost of a cluster, represented by a center $x\in X$, is a monotone, symmetric norm $f$ (inner norm) of the vector of distances of points assigned to $x$. The goal is to minimize a norm $g$ (outer norm) of the vector of cluster costs. This problem, which we call $(f,g)$-Clustering, generalizes many fundamental clustering problems such as $k$-Center, $k$-Median , Min-Sum of Radii, and Min-Load $k$-Clustering . A recent line of research (Chakrabarty, Swamy [STOC'19]) studies norm objectives that are oblivious to the cluster structure such as $k$-Median and $k$-Center. In contrast, our problem models cluster-aware objectives including Min-Sum of Radii and Min-Load $k$-Clustering. Our main results are as follows. First, we design a constant-factor approximation algorithm for $(\textsf{top}_\ell,\mathcal{L}_1)$-Clustering where the inner norm ($\textsf{top}_\ell$) sums over the $\ell$ largest distances. Second, we design a constant-factor approximation\ for $(\mathcal{L}_\infty,\textsf{Ord})$-Clustering where the outer norm is a convex combination of $\textsf{top}_\ell$ norms (ordered weighted norm).

Related papers

Near-Optimal Clustering in Mixture of Markov Chains [74.3828414695655]
We study the problem of clustering $T$ trajectories of length $H$, each generated by one of $K$ unknown ergodic Markov chains over a finite state space of size $S$.<n>We derive an instance-dependent, high-probability lower bound on the clustering error rate, governed by the weighted KL divergence between the transition kernels of the chains.<n>We then present a novel two-stage clustering algorithm.
arXiv Detail & Related papers (2025-06-02T05:10:40Z)
Generalization Performance of Ensemble Clustering: From Theory to Algorithm [57.176040163699554]
This paper focuses on generalization error, excess risk and consistency in ensemble clustering.<n>By assigning varying weights to finite clusterings, we minimize the error between the empirical average clusterings and their expectation.<n>We instantiate our theory to develop a new ensemble clustering algorithm.
arXiv Detail & Related papers (2025-06-01T09:34:52Z)
A Scalable Algorithm for Individually Fair K-means Clustering [77.93955971520549]
We present a scalable algorithm for the individually fair ($p$, $k$)-clustering problem introduced by Jung et al. and Mahabadi et al. A clustering is then called individually fair if it has centers within distance $delta(x)$ of $x$ for each $xin P$. We show empirically that not only is our algorithm much faster than prior work, but it also produces lower-cost solutions.
arXiv Detail & Related papers (2024-02-09T19:01:48Z)
A Unified Framework for Center-based Clustering of Distributed Data [46.86543102499174]
We develop a family of distributed center-based clustering algorithms that work over networks of users. Our framework allows for a broad class of smooth convex loss functions, including popular clustering losses like $K$-means and Huber loss. For the special case of Bregman losses, we show that our fixed points converge to the set of Lloyd points.
arXiv Detail & Related papers (2024-02-02T10:44:42Z)
On Generalization Bounds for Projective Clustering [3.4490841399255823]
Given a set of points, clustering consists of finding a partition of a point set into $k$ clusters such that the center to which a point is assigned is as close as possible. For center-based objectives, we show a convergence rate of $tildeOleft(sqrtfrackj2nright)$. For subspace clustering with $j$-dimensional subspaces, we show a convergence rate of $tildeOleft(sqrtfrackj2nright)$.
arXiv Detail & Related papers (2023-10-13T14:15:54Z)
Do you know what q-means? [50.045011844765185]
Clustering is one of the most important tools for analysis of large datasets. We present an improved version of the "$q$-means" algorithm for clustering. We also present a "dequantized" algorithm for $varepsilon which runs in $Obig(frack2varepsilon2(sqrtkd + log(Nd))big.
arXiv Detail & Related papers (2023-08-18T17:52:12Z)
Approximation Algorithms for Fair Range Clustering [14.380145034918158]
This paper studies the fair range clustering problem in which the data points are from different demographic groups. The goal is to pick $k$ centers with the minimum clustering cost such that each group is at least minimally represented in the centers set. In particular, the fair range $ell_p$-clustering captures fair range $k$-center, $k$-median and $k$-means as its special cases.
arXiv Detail & Related papers (2023-06-11T21:18:40Z)
Approximating Fair Clustering with Cascaded Norm Objectives [10.69111036810888]
We find a clustering which minimizes the $ell_q$-norm of the vector over $W$ of the $ell_p$-norms of the weighted distances of points in $P$ from the centers. This generalizes various clustering problems, including Socially Fair $k$-Median and $k$-Means.
arXiv Detail & Related papers (2021-11-08T20:18:10Z)
Nearly-Tight and Oblivious Algorithms for Explainable Clustering [8.071379672971542]
We study the problem of explainable clustering in the setting first formalized by Moshkovitz, Dasgupta, Rashtchian, and Frost (ICML 2020) A $k$-clustering is said to be explainable if it is given by a decision tree where each internal node data points with a threshold cut in a single dimension (feature) We give an algorithm that outputs an explainable clustering that loses at most a factor of $O(log2 k)$ compared to an optimal (not necessarily explainable) clustering for the $k$-medians objective.
arXiv Detail & Related papers (2021-06-30T15:49:41Z)
Clustering Mixture Models in Almost-Linear Time via List-Decodable Mean Estimation [58.24280149662003]
We study the problem of list-decodable mean estimation, where an adversary can corrupt a majority of the dataset. We develop new algorithms for list-decodable mean estimation, achieving nearly-optimal statistical guarantees.
arXiv Detail & Related papers (2021-06-16T03:34:14Z)
Fuzzy Clustering with Similarity Queries [56.96625809888241]
The fuzzy or soft objective is a popular generalization of the well-known $k$-means problem. We show that by making few queries, the problem becomes easier to solve.
arXiv Detail & Related papers (2021-06-04T02:32:26Z)
Sum-of-norms clustering does not separate nearby balls [49.1574468325115]
We show a continuous version of sum-of-norms clustering, where the dataset is replaced by a general measure. We state and prove a local-global characterization of the clustering that seems to be new even in the case of discrete datapoints.
arXiv Detail & Related papers (2021-04-28T13:35:17Z)
K-expectiles clustering [0.0]
We propose a novel partitioning clustering algorithm based on expectiles. We suggest two schemes: fixed $tau$ clustering, and adaptive $tau$ clustering.
arXiv Detail & Related papers (2021-03-16T21:14:56Z)
Computationally efficient sparse clustering [67.95910835079825]
We provide a finite sample analysis of a new clustering algorithm based on PCA. We show that it achieves the minimax optimal misclustering rate in the regime $|theta infty$.
arXiv Detail & Related papers (2020-05-21T17:51:30Z)

This list is automatically generated from the titles and abstracts of the papers in this site.