Fugu-MT Paper Translation (Abstract): Sublinear Sketches for Approximate Nearest Neighbor and Kernel Density Estimation

Paper abstract: Sublinear Sketches for Approximate Nearest Neighbor and Kernel Density Estimation

arxiv url: http://arxiv.org/abs/2510.23039v1
Date: Mon, 27 Oct 2025 06:05:45 GMT
Status: Translation complete
Last updated in system: 2025-10-28 15:28:15.468645
Title: Sublinear Sketches for Approximate Nearest Neighbor and Kernel Density Estimation
Title (reference translation): Sublinear Sketches for Approximate Nearest Neighbor and Kernel Density Estimation
Authors: Ved Danait, Srijan Das, Sujoy Bhore
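Abstract summary: We develop new sketching algorithms that achieve sublinear space and query time guarantees for both ANN and A-KDE over dynamic data streams. The proposed method supports sublinear query time and batch queries, and extends to the more general Turnstile model. For A-KDE in the Sliding-Window model, we propose a sketch of size $\mathcal{O}\left(RW \cdot \frac{1}{\sqrt{1+\epsilon} - 1} \log^2 N\right)$.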
Reference score (independently computed attention score): 14.369905129159449
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Approximate Nearest Neighbor (ANN) search and Approximate Kernel Density Estimation (A-KDE) are fundamental problems at the core of modern machine learning, with broad applications in data analysis, information systems, and large-scale decision making. In massive and dynamic data streams, a central challenge is to design compact sketches that preserve essential structural properties of the data while enabling efficient queries. In this work, we develop new sketching algorithms that achieve sublinear space and query time guarantees for both ANN and A-KDE for a dynamic stream of data. For ANN in the streaming model, under natural assumptions, we design a sublinear sketch that requires only $\mathcal{O}(n^{1+\rho-\eta})$ memory by storing only a sublinear ($n^{-\eta}$) fraction of the total inputs, where $\rho$ is a parameter of the LSH family, and $0<\eta<1$. Our method supports sublinear query time, batch queries, and extends to the more general Turnstile model. While earlier works have focused on Exact NN, this is the first result on ANN that achieves near-optimal trade-offs between memory size and approximation error. Next, for A-KDE in the Sliding-Window model, we propose a sketch of size $\mathcal{O}\left(RW \cdot \frac{1}{\sqrt{1+\epsilon} - 1} \log^2 N\right)$, where $R$ is the number of sketch rows, $W$ is the LSH range, $N$ is the window size, and $\epsilon$ is the approximation error. This, to the best of our knowledge, is the first theoretical sublinear sketch guarantee for A-KDE in the Sliding-Window model. We complement our theoretical results with experiments on various real-world datasets, which show that the proposed sketches are lightweight and achieve consistently low error in practice.
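To give a feel for the two bounds quoted in the abstract, the following is a minimal sketch that evaluates them numerically. It is not the authors' code: the function names are hypothetical, the hidden constants of the $\mathcal{O}(\cdot)$ notation are assumed to be 1, and the logarithm is assumed to be base 2, so the outputs are only useful for comparing how the bounds scale.

```python
import math


def ann_sketch_memory(n: int, rho: float, eta: float) -> float:
    """Evaluate n^(1 + rho - eta), the ANN memory bound from the abstract.

    Assumption: the constant hidden by the O-notation is taken to be 1.
    """
    if not 0 < eta < 1:
        raise ValueError("eta must satisfy 0 < eta < 1")
    return n ** (1 + rho - eta)


def kde_sketch_size(rows: int, lsh_range: int, window: int, eps: float) -> float:
    """Evaluate R * W * log^2(N) / (sqrt(1 + eps) - 1), the A-KDE bound.

    Assumptions: hidden constant 1 and base-2 logarithm (hypothetical choices).
    rows = R (sketch rows), lsh_range = W (LSH range), window = N (window size).
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    return rows * lsh_range * math.log2(window) ** 2 / (math.sqrt(1 + eps) - 1)


if __name__ == "__main__":
    # Illustrative parameter values only; none of these come from the paper.
    print(ann_sketch_memory(n=10**6, rho=0.5, eta=0.5))
    print(kde_sketch_size(rows=10, lsh_range=100, window=1024, eps=0.1))
```

Two observations follow directly from the formulas. The ANN exponent $1+\rho-\eta$ drops below 1 exactly when $\eta > \rho$. For small $\epsilon$, $\sqrt{1+\epsilon} - 1 \approx \epsilon/2$, so the A-KDE bound grows roughly like $2/\epsilon$ as the approximation error shrinks.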
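Abstract (reference translation): Approximate Nearest Neighbor (ANN) search and Approximate Kernel Density Estimation (A-KDE) are fundamental problems at the core of modern machine learning, with applications in data analysis, information systems, and large-scale decision making. In massive and dynamic data streams, the central challenge is to design compact sketches that preserve the essential structural properties of the data while enabling efficient queries. In this work, we develop new sketching algorithms that achieve sublinear space and query time guarantees for both ANN and A-KDE over a dynamic stream of data. For ANN in the streaming model, under natural assumptions, we design a sublinear sketch that requires only $\mathcal{O}(n^{1+\rho-\eta})$ memory by storing only a sublinear ($n^{-\eta}$) fraction of the total inputs, where $\rho$ is a parameter of the LSH family and $0<\eta<1$. The proposed method supports sublinear query time and batch queries, and extends to the more general Turnstile model. While earlier works focused on Exact NN, this is the first result on ANN that achieves near-optimal trade-offs between memory size and approximation error. Next, for A-KDE in the Sliding-Window model, we propose a sketch of size $\mathcal{O}\left(RW \cdot \frac{1}{\sqrt{1+\epsilon} - 1} \log^2 N\right)$, where $R$ is the number of sketch rows, $W$ is the LSH range, $N$ is the window size, and $\epsilon$ is the approximation error. To the best of our knowledge, this is the first theoretical sublinear sketch guarantee for A-KDE in the Sliding-Window model. We complement our theoretical results with experiments on various real-world datasets, which show that the proposed sketches are lightweight and achieve consistently low error in practice.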

Related papers