Fugu-MT Paper Translation (Summary): Fast and effective algorithms for fair clustering at scale

Paper summary: Fast and effective algorithms for fair clustering at scale

arxiv url: http://arxiv.org/abs/2605.13759v1
Date: Wed, 13 May 2026 16:40:07 GMT
Status: Translation complete
Last updated in system: 2026-05-14 23:30:28.185155
Title: Fast and effective algorithms for fair clustering at scale
Authors: Claudio Mantuano, Manuel Kammermann, Philipp Baumann
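Abstract summary: We address a fair clustering problem in which objects belong to protected groups. The objective is to minimize the clustering cost, defined as the sum of squared Euclidean distances between the objects and the centers of their clusters. We propose a general framework for fair clustering that provides precise control over the cost-fairness trade-off, and we introduce three heuristics based on it.
Reference score (attention score computed in-house): 0.0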
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Clustering is an unsupervised machine learning task that consists of identifying groups of similar objects. It has numerous applications and is increasingly used in fairness-sensitive domains where objects represent individuals, such as customers, employees, or students. We address a fair clustering problem in which objects belong to protected groups. The problem consists of partitioning the objects into a predefined number of clusters while attaining a user-defined target level of fairness, meaning that each protected group is sufficiently represented in each cluster. The objective is to minimize the clustering cost, defined as the sum of squared Euclidean distances between the objects and the centers of their clusters. Since clustering cost and fairness are generally in conflict, managing the trade-off between them is essential in practical applications. Existing methods provide limited control over this trade-off and either fail to scale to large datasets or, when they scale, produce low-quality solutions. We propose a general framework for fair clustering that provides precise control over the cost-fairness trade-off and introduce three heuristics based on it. The first heuristic focuses on solution quality and the flexibility to incorporate additional constraints, the second improves scalability while retaining high solution quality, and the third is designed for maximum scalability, producing solutions for instances with millions of objects in seconds. The proposed heuristics outperform existing approaches in comprehensive numerical experiments on benchmark datasets. The source code of our heuristics and instructions for reproducing the experiments are publicly available on GitHub.
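The abstract defines the objective as the sum of squared Euclidean distances between objects and the centers of their clusters, and fairness as each protected group being sufficiently represented in every cluster. The minimal Python/NumPy sketch below makes both quantities concrete on a toy instance. It is not the authors' method: the function names are hypothetical, and the fairness measure (the smallest ratio, over clusters and groups, between a group's share in a cluster and its share in the whole dataset) is an assumed stand-in, since the abstract does not state the exact definition used in the paper.

```python
import numpy as np


def clustering_cost(X, labels, centers):
    """Sum of squared Euclidean distances from each object to its cluster center."""
    diffs = X - centers[labels]
    return float(np.sum(diffs ** 2))


def centroids(X, labels, k):
    """Mean of each cluster, i.e. the cost-minimizing centers for a fixed assignment.
    Assumes every cluster 0..k-1 is non-empty."""
    return np.array([X[labels == j].mean(axis=0) for j in range(k)])


def min_representation_ratio(labels, groups, k):
    """Hypothetical fairness measure (an assumption, not the paper's definition):
    min over clusters j and groups g of (share of g in cluster j) / (share of g overall).
    1.0 means every cluster mirrors the overall group proportions; 0.0 means some
    group is entirely absent from some cluster."""
    overall = {g: np.mean(groups == g) for g in np.unique(groups)}
    ratio = np.inf
    for j in range(k):
        in_j = groups[labels == j]
        if in_j.size == 0:  # skip empty clusters
            continue
        for g, p in overall.items():
            ratio = min(ratio, np.mean(in_j == g) / p)
    return float(ratio)


if __name__ == "__main__":
    # Toy data: two spatial blobs, and each blob contains only one protected group.
    X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
    groups = np.array([0, 0, 1, 1])
    geo = np.array([0, 0, 1, 1])   # partition that follows the geometry
    fair = np.array([0, 1, 0, 1])  # partition that balances the groups
    for name, labels in [("geometric", geo), ("fair", fair)]:
        cost = clustering_cost(X, labels, centroids(X, labels, 2))
        print(name, cost, min_representation_ratio(labels, groups, 2))
    # geometric 1.0 0.0
    # fair 50.0 1.0
```

Because the groups coincide with the two blobs here, the cost-minimizing partition has cost 1.0 but fairness 0.0, while the perfectly balanced partition reaches fairness 1.0 at cost 50.0. This is a small instance of the conflict between clustering cost and fairness that the abstract describes.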


Related papers