Fugu-MT Paper Translation (Abstract): Differentially Private Clustering in Data Streams

Paper summary: Differentially Private Clustering in Data Streams

arxiv url: http://arxiv.org/abs/2307.07449v2
Date: Mon, 8 Jan 2024 02:32:23 GMT
Status: Translation complete
Last updated in system: 2024-01-09 23:39:35.859284
Title: Differentially Private Clustering in Data Streams
Title (reference translation): Differentially Private Clustering in Data Streams
Authors: Alessandro Epasto, Tamalika Mukherjee, Peilin Zhong
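Abstract summary: We propose a differentially private streaming clustering framework that requires only an offline DP coreset or clustering algorithm as a black box. The framework is also differentially private under the continual release setting, i.e., the union of the algorithm's outputs at every timestamp is always differentially private.
Reference score (independently computed attention measure): 65.78882209673885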
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The streaming model is an abstraction of computing over massive data streams, which is a popular way of dealing with large-scale modern data analysis. In this model, there is a stream of data points, one after the other. A streaming algorithm is only allowed one pass over the data stream, and the goal is to perform some analysis during the stream while using as small space as possible. Clustering problems (such as $k$-means and $k$-median) are fundamental unsupervised machine learning primitives, and streaming clustering algorithms have been extensively studied in the past. However, since data privacy becomes a central concern in many real-world applications, non-private clustering algorithms are not applicable in many scenarios. In this work, we provide the first differentially private streaming algorithms for $k$-means and $k$-median clustering of $d$-dimensional Euclidean data points over a stream with length at most $T$ using $poly(k,d,\log(T))$ space to achieve a constant multiplicative error and a $poly(k,d,\log(T))$ additive error. In particular, we present a differentially private streaming clustering framework which only requires an offline DP coreset or clustering algorithm as a blackbox. By plugging in existing results from DP clustering Ghazi, Kumar, Manurangsi 2020 and Kaplan, Stemmer 2018, we achieve (1) a $(1+\gamma)$-multiplicative approximation with $\tilde{O}_\gamma(poly(k,d,\log(T)))$ space for any $\gamma>0$, and the additive error is $poly(k,d,\log(T))$ or (2) an $O(1)$-multiplicative approximation with $\tilde{O}(k^{1.5} \cdot poly(d,\log(T)))$ space and $poly(k,d,\log(T))$ additive error. In addition, our algorithmic framework is also differentially private under the continual release setting, i.e., the union of outputs of our algorithms at every timestamp is always differentially private.
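Abstract (reference translation): The streaming model is an abstraction of computation over massive data streams and is a popular way to handle large-scale data analysis. In this model, data points arrive one after another. A streaming algorithm is allowed only a single pass over the stream, and the goal is to perform some analysis during the stream while using as little space as possible. Clustering problems (such as $k$-means and $k$-median) are fundamental unsupervised machine learning primitives, and streaming clustering algorithms have been studied extensively. However, because data privacy has become a central concern in many real-world applications, non-private clustering algorithms are not applicable in many scenarios. In this work, we provide the first differentially private streaming algorithms for $k$-means and $k$-median clustering of $d$-dimensional Euclidean data points over a stream of length at most $T$, using $poly(k,d,\log(T))$ space to achieve a constant multiplicative error and a $poly(k,d,\log(T))$ additive error. In particular, we present a differentially private streaming clustering framework that requires only an offline DP coreset or clustering algorithm as a black box. By plugging in existing DP clustering results (Ghazi, Kumar, Manurangsi 2020; Kaplan, Stemmer 2018), we achieve either (1) a $(1+\gamma)$-multiplicative approximation with $\tilde{O}_\gamma(poly(k,d,\log(T)))$ space for any $\gamma>0$ and $poly(k,d,\log(T))$ additive error, or (2) an $O(1)$-multiplicative approximation with $\tilde{O}(k^{1.5} \cdot poly(d,\log(T)))$ space and $poly(k,d,\log(T))$ additive error. In addition, our algorithmic framework is differentially private under the continual release setting, i.e., the union of the outputs of our algorithms at every timestamp is always differentially private.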
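
To make the error terms in the abstract concrete, the following minimal sketch computes the $k$-means or $k$-median cost of a set of centers and checks a bound of the form cost <= alpha * OPT + beta, where alpha is the multiplicative error and beta the additive error. It does not implement the paper's streaming algorithm or any privacy mechanism. The function names, the symbols alpha and beta, and the toy data are assumptions chosen for illustration.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def clustering_cost(points, centers, power=2):
    """Sum over points of (distance to the nearest center) ** power.

    power=2 gives the k-means cost, power=1 the k-median cost.
    """
    return sum(min(dist(p, c) for c in centers) ** power for p in points)


def within_guarantee(cost, opt_cost, alpha, beta):
    """Check cost <= alpha * opt_cost + beta.

    alpha: multiplicative error, e.g. 1 + gamma or a constant (assumed notation).
    beta: additive error, poly(k, d, log T) in the abstract (assumed notation).
    """
    return cost <= alpha * opt_cost + beta


# Toy data: two well-separated pairs of points in the plane.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
reference_centers = [(0.0, 1.0), (10.0, 1.0)]  # means of the two pairs
candidate_centers = [(0.0, 0.0), (10.0, 0.0)]  # a worse, hypothetical output

opt_cost = clustering_cost(points, reference_centers)
candidate_cost = clustering_cost(points, candidate_centers)

print(opt_cost, candidate_cost)
print(within_guarantee(candidate_cost, opt_cost, alpha=1.5, beta=1.0))
print(within_guarantee(candidate_cost, opt_cost, alpha=1.5, beta=4.0))
```

With a fixed multiplicative factor, the same candidate solution can fail the bound for a small additive allowance and satisfy it for a larger one, which is why the abstract reports the two error terms separately.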

Related papers