Dynamic Similarity Graph Construction with Kernel Density Estimation
- URL: http://arxiv.org/abs/2507.01696v1
- Date: Wed, 02 Jul 2025 13:25:22 GMT
- Title: Dynamic Similarity Graph Construction with Kernel Density Estimation
- Authors: Steinar Laenen, Peter Macgregor, He Sun
- Abstract summary: In the kernel density estimation (KDE) problem, we are given a set $X$ of data points in $\mathbb{R}^d$, a kernel function $k: \mathbb{R}^d \times \mathbb{R}^d \rightarrow \mathbb{R}$, and a query point $\mathbf{q} \in \mathbb{R}^d$. The objective is to quickly output an estimate of $\sum_{\mathbf{x} \in X} k(\mathbf{q}, \mathbf{x})$.
- Score: 12.147773956173122
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the kernel density estimation (KDE) problem, we are given a set $X$ of data points in $\mathbb{R}^d$, a kernel function $k: \mathbb{R}^d \times \mathbb{R}^d \rightarrow \mathbb{R}$, and a query point $\mathbf{q} \in \mathbb{R}^d$, and the objective is to quickly output an estimate of $\sum_{\mathbf{x} \in X} k(\mathbf{q}, \mathbf{x})$. In this paper, we consider $\textsf{KDE}$ in the dynamic setting, and introduce a data structure that efficiently maintains the estimates for a set of query points as data points are added to $X$ over time. Based on this, we design a dynamic data structure that maintains a sparse approximation of the fully connected similarity graph on $X$, and develop a fast dynamic spectral clustering algorithm. We further evaluate the effectiveness of our algorithms on both synthetic and real-world datasets.
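To make the dynamic setting concrete, here is a minimal sketch that maintains exact KDE values for a fixed set of query points as data points arrive, using a Gaussian kernel. This is the naive $O(|Q|)$-work-per-insertion baseline, not the paper's sublinear data structure; the class and parameter names are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(q, x, bandwidth=1.0):
    """Gaussian kernel k(q, x) = exp(-||q - x||^2 / bandwidth)."""
    return np.exp(-np.sum((q - x) ** 2) / bandwidth)

class NaiveDynamicKDE:
    """Maintains exact sums sum_{x in X} k(q, x) for a fixed query set,
    paying O(|Q|) work per inserted point -- the baseline a sublinear
    dynamic KDE data structure improves upon."""

    def __init__(self, queries, kernel=gaussian_kernel):
        self.queries = np.asarray(queries, dtype=float)
        self.kernel = kernel
        self.estimates = np.zeros(len(self.queries))

    def insert(self, x):
        x = np.asarray(x, dtype=float)
        for i, q in enumerate(self.queries):
            self.estimates[i] += self.kernel(q, x)

rng = np.random.default_rng(0)
kde = NaiveDynamicKDE(queries=rng.normal(size=(3, 2)))
for _ in range(100):
    kde.insert(rng.normal(size=2))   # data points stream in over time
print(kde.estimates)                 # current KDE values at the 3 queries
```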
Related papers
- Statistical-Computational Trade-offs for Density Estimation [60.81548752871115]
We show that for a broad class of data structures, their bounds cannot be significantly improved.
This is a novel statistical-computational trade-off for density estimation.
arXiv Detail & Related papers (2024-10-30T15:03:33Z)
- Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit [75.4661041626338]
We study the problem of gradient descent learning of a single-index target function $f_*(\boldsymbol{x}) = \sigma_*\left(\langle \boldsymbol{x}, \boldsymbol{\theta} \rangle\right)$. We prove that a two-layer neural network optimized by an SGD-based algorithm learns $f_*$ with a complexity that is not governed by information exponents.
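A rough sketch of this setup under stated assumptions: a quadratic link function, a two-layer ReLU network, and online SGD on fresh Gaussian samples (the width, step size, and link are illustrative choices, not the paper's exact algorithm).

```python
import numpy as np

rng = np.random.default_rng(1)
d, width, steps, lr = 20, 64, 5000, 0.05

# Single-index target f_*(x) = sigma_*(<x, theta>) with an assumed quadratic link.
theta = rng.normal(size=d)
theta /= np.linalg.norm(theta)
sigma_star = lambda z: z ** 2 - 1

# Two-layer ReLU network trained by online SGD on the squared loss.
W = rng.normal(size=(width, d)) / np.sqrt(d)
a = rng.normal(size=width) / np.sqrt(width)
for _ in range(steps):
    x = rng.normal(size=d)                 # fresh Gaussian sample
    y = sigma_star(x @ theta)
    h = np.maximum(W @ x, 0.0)             # hidden-layer activations
    g = a @ h - y                          # residual of the prediction
    grad_a = g * h
    grad_W = np.outer(g * a * (h > 0), x)  # chain rule through the ReLU
    a -= lr * grad_a
    W -= lr * grad_W
```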
arXiv Detail & Related papers (2024-06-03T17:56:58Z)
- Provably learning a multi-head attention layer [55.2904547651831]
The multi-head attention layer is one of the key components of the transformer architecture that sets it apart from traditional feed-forward models.
In this work, we initiate the study of provably learning a multi-head attention layer from random examples.
We prove computational lower bounds showing that in the worst case, exponential dependence on $m$ is unavoidable.
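For orientation, a minimal numpy forward pass of a multi-head attention layer; the paper concerns provably learning such a layer from random examples, which this sketch does not attempt, and all dimensions here are illustrative.

```python
import numpy as np

def multi_head_attention(X, Wq, Wk, Wv, Wo, n_heads):
    """Minimal multi-head attention forward pass. X: (seq, d_model);
    projection matrices are (d_model, d_model); n_heads divides d_model."""
    seq, d_model = X.shape
    d_head = d_model // n_heads
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    heads = []
    for h in range(n_heads):
        s = slice(h * d_head, (h + 1) * d_head)
        scores = Q[:, s] @ K[:, s].T / np.sqrt(d_head)
        attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
        attn /= attn.sum(axis=-1, keepdims=True)   # softmax over keys
        heads.append(attn @ V[:, s])
    return np.concatenate(heads, axis=-1) @ Wo

rng = np.random.default_rng(2)
X = rng.normal(size=(5, 8))
W = [rng.normal(size=(8, 8)) / np.sqrt(8) for _ in range(4)]
out = multi_head_attention(X, *W, n_heads=2)       # shape (5, 8)
```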
arXiv Detail & Related papers (2024-02-06T15:39:09Z)
- Compressive Recovery of Sparse Precision Matrices [5.557600489035657]
We consider the problem of learning a graph modeling the statistical relations of the $d$ variables from a dataset with $n$ samples $X \in \mathbb{R}^{n \times d}$.
We show that it is possible to estimate it from a sketch of size $m = \Omega\left((d+2k)\log(d)\right)$, where $k$ is the maximal number of edges of the underlying graph.
We investigate the possibility of achieving practical recovery with an iterative algorithm based on the graphical lasso, viewed as a specific denoiser.
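A hedged sketch of the graphical-lasso component using scikit-learn's GraphicalLasso; note this fits from the raw samples rather than from a sketch of size $m$, so it illustrates only the denoiser, not the compressive recovery, and the regularization level and threshold are illustrative.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 10))        # n = 200 samples of d = 10 variables

# The sparsity pattern of the estimated precision matrix encodes the
# edges of the dependency graph.
model = GraphicalLasso(alpha=0.1).fit(X)
precision = model.precision_          # (d, d) estimated precision matrix
edges = np.argwhere(np.triu(np.abs(precision) > 1e-4, k=1))
print(edges)                          # recovered edge list (i, j), i < j
```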
arXiv Detail & Related papers (2023-10-21T00:32:47Z)
- Fast Approximation of Similarity Graphs with Kernel Density Estimation [12.321755440062732]
We present a new algorithm for constructing a similarity graph from a set $X$ of data points in $\mathbb{R}^d$.
The presented algorithm is based on the kernel density estimation problem and is applicable to arbitrary kernel functions.
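For contrast, a naive dense-then-sparsify Gaussian similarity graph in numpy, which costs $O(n^2)$ kernel evaluations; the point of the KDE-based construction is precisely to avoid this all-pairs computation. Parameters here are illustrative.

```python
import numpy as np
from scipy.spatial.distance import cdist

def sparse_similarity_graph(X, k=10, bandwidth=1.0):
    """kNN sparsification of the fully connected Gaussian similarity
    graph: keep the k strongest edges per node, then symmetrise."""
    W = np.exp(-cdist(X, X, "sqeuclidean") / bandwidth)
    np.fill_diagonal(W, 0.0)
    keep = np.argsort(W, axis=1)[:, -k:]   # indices of k largest weights
    A = np.zeros_like(W)
    rows = np.arange(len(X))[:, None]
    A[rows, keep] = W[rows, keep]
    return np.maximum(A, A.T)
```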
arXiv Detail & Related papers (2023-08-18T17:52:12Z)
- Do you know what q-means? [45.810803542748495]
We present an improved version of the quantum algorithm originally proposed by Kerenidis, Landman, Luongo, and Prakash (NeurIPS'19). Our algorithm does not rely on the quantum linear algebra primitives of prior work, but instead only uses QRAM to prepare simple states. We also present a "dequantized" algorithm for $\varepsilon$-$k$-means which runs in $O(\ldots)$.
arXiv Detail & Related papers (2023-05-10T01:12:11Z)
- A Normal Map-Based Proximal Stochastic Gradient Method: Convergence and Identification Properties [7.281869462071603]
The proximal stochastic gradient method (PSGD) is one of the state-of-the-art approaches for composite-type problems. In this paper, we present a simple variant of PSGD based on Robinson's normal map.
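A minimal proximal SGD loop for $\ell_1$-regularized least squares, shown as the vanilla baseline; the paper's variant replaces this update with one built on Robinson's normal map, which is not reproduced here. Step size and regularization are illustrative.

```python
import numpy as np

def soft_threshold(w, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding)."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def prox_sgd(X, y, lam=0.1, lr=0.01, epochs=20, seed=0):
    """Vanilla proximal SGD for the lasso: stochastic gradient step on
    the smooth part, then the l1 proximal step."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            grad = (X[i] @ w - y[i]) * X[i]
            w = soft_threshold(w - lr * grad, lr * lam)
    return w
```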
arXiv Detail & Related papers (2023-01-01T10:57:36Z)
- Correlation Clustering Algorithm for Dynamic Complete Signed Graphs: An Index-based Approach [9.13755431537592]
In this paper, we reduce the complexity of approximating the correlation clustering problem from $O\left(m \times \left(2 + \alpha(G)\right) + n\right)$ to $O(m + n)$ for any given value of $\varepsilon$ for a complete signed graph.
Our approach gives the same output as the original algorithm and makes it possible to implement the algorithm in a fully dynamic setting.
arXiv Detail & Related papers (2023-01-01T10:57:36Z)
- Adaptive and Dynamic Multi-Resolution Hashing for Pairwise Summations [19.602149096819776]
We propose Adam-Hash: an adaptive and dynamic multi-resolution hashing data-structure for fast pairwise summation estimation.
Our proposed Adam-Hash is also robust to adaptive PSE queries, where an adversary can choose a query $q_j \in \mathbb{R}^d$ depending on the output from previous queries.
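For intuition, the simplest unbiased pairwise summation estimator, based on uniform sampling; Adam-Hash replaces this with an adaptive multi-resolution hashing structure that additionally withstands adversarially chosen queries. The kernel and sample count are illustrative.

```python
import numpy as np

def pse_uniform(X, q, kernel, m=64, seed=0):
    """Estimate sum_{x in X} k(q, x) from m uniform samples; unbiased,
    but not robust to adaptive queries the way Adam-Hash is."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(len(X), size=m)
    return len(X) * np.mean([kernel(q, X[i]) for i in idx])
```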
arXiv Detail & Related papers (2022-12-21T23:23:24Z)
- Learning a Single Neuron with Adversarial Label Noise via Gradient Descent [50.659479930171585]
We study a function of the form $\mathbf{x} \mapsto \sigma(\mathbf{w} \cdot \mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C \cdot \mathrm{OPT} + \epsilon$ with high probability.
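A small sketch of gradient descent in this setting, with an assumed ReLU activation, squared loss, and a fraction of corrupted labels (the corruption model and step size are illustrative, not the paper's exact noise model or analysis).

```python
import numpy as np

rng = np.random.default_rng(4)
n, d, lr, steps = 500, 10, 0.05, 2000

# ReLU neuron with a small fraction of adversarially flipped labels.
w_star = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = np.maximum(X @ w_star, 0.0)
noise = rng.random(n) < 0.05                  # 5% corrupted labels
y[noise] = rng.normal(size=noise.sum())

w = np.zeros(d)
for _ in range(steps):
    pred = np.maximum(X @ w, 0.0)
    grad = ((pred - y) * (X @ w > 0)) @ X / n  # squared-loss gradient
    w -= lr * grad
```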
arXiv Detail & Related papers (2022-06-17T17:55:43Z)
- Fast Graph Sampling for Short Video Summarization using Gershgorin Disc Alignment [52.577757919003844]
We study the problem of efficiently summarizing a short video into several paragraphs, leveraging recent progress in fast graph sampling.
Experimental results show that our algorithm achieves comparable video summarization as state-of-the-art methods, at a substantially reduced complexity.
arXiv Detail & Related papers (2021-10-21T18:43:00Z)
- Random matrices in service of ML footprint: ternary random features with no performance loss [55.30329197651178]
We show that the eigenspectrum of $\mathbf{K}$ is independent of the distribution of the i.i.d. entries of $\mathbf{w}$.
We propose a novel random features technique, called Ternary Random Feature (TRF).
The computation of the proposed random features requires no multiplication and a factor of $b$ less bits for storage compared to classical random features.
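A rough numpy sketch of a ternary random feature map, with projection weights drawn from $\{-1, 0, +1\}$ so the projection needs only signed additions; the sparsity level and the sign nonlinearity are illustrative assumptions.

```python
import numpy as np

def ternary_random_features(X, n_features=256, sparsity=0.5, seed=0):
    """Random feature map with weights in {-1, 0, +1}: no multiplications
    are needed for the projection, and each weight fits in ~2 bits."""
    rng = np.random.default_rng(seed)
    W = rng.choice([-1, 0, 1], size=(X.shape[1], n_features),
                   p=[(1 - sparsity) / 2, sparsity, (1 - sparsity) / 2])
    return np.sign(X @ W)   # ternary projection + sign nonlinearity
```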
arXiv Detail & Related papers (2021-10-05T09:33:49Z)
- Non-Parametric Estimation of Manifolds from Noisy Data [1.0152838128195467]
We consider the problem of estimating a $d$-dimensional sub-manifold of $\mathbb{R}^D$ from a finite set of noisy samples.
We show that the estimation yields rates of convergence of $n^{-\frac{k}{2k+d}}$ for the point estimation and $n^{-\frac{k-1}{2k+d}}$ for the estimation of the tangent space.
arXiv Detail & Related papers (2021-05-11T02:29:33Z)
- Convergence of Graph Laplacian with kNN Self-tuned Kernels [14.645468999921961]
The self-tuned kernel adaptively sets a bandwidth $\sigma_i$ at each point $x_i$ by the $k$-nearest neighbor (kNN) distance.
This paper proves the convergence of the graph Laplacian operator $L_N$ to the manifold (weighted) Laplacian for a new family of kNN self-tuned kernels.
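A short sketch of the kNN self-tuned kernel and the resulting unnormalized graph Laplacian (the value of $k$ and the unnormalized form are illustrative; the paper's convergence analysis is not reproduced here).

```python
import numpy as np
from scipy.spatial.distance import cdist

def self_tuned_laplacian(X, k=7):
    """Self-tuned kernel W_ij = exp(-||x_i - x_j||^2 / (sigma_i * sigma_j)),
    with sigma_i the distance from x_i to its k-th nearest neighbor,
    and the unnormalized graph Laplacian L = D - W."""
    D2 = cdist(X, X, "sqeuclidean")
    sigma = np.sqrt(np.sort(D2, axis=1)[:, k])   # kNN distance per point
    W = np.exp(-D2 / np.outer(sigma, sigma))
    np.fill_diagonal(W, 0.0)
    return np.diag(W.sum(axis=1)) - W
```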
arXiv Detail & Related papers (2020-11-03T04:55:33Z)