Dynamic Similarity Graph Construction with Kernel Density Estimation
- URL: http://arxiv.org/abs/2507.01696v1
- Date: Wed, 02 Jul 2025 13:25:22 GMT
- Title: Dynamic Similarity Graph Construction with Kernel Density Estimation
- Authors: Steinar Laenen, Peter Macgregor, He Sun
- Abstract summary: In the kernel density estimation (KDE) problem, we are given a set $X$ of data points in $\mathbb{R}^d$, a kernel function $k: \mathbb{R}^d \times \mathbb{R}^d \rightarrow \mathbb{R}$, and a query point $\mathbf{q} \in \mathbb{R}^d$. The objective is to quickly output an estimate of $\sum_{\mathbf{x} \in X} k(\mathbf{q}, \mathbf{x})$.
- Score: 12.147773956173122
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the kernel density estimation (KDE) problem, we are given a set $X$ of data points in $\mathbb{R}^d$, a kernel function $k: \mathbb{R}^d \times \mathbb{R}^d \rightarrow \mathbb{R}$, and a query point $\mathbf{q} \in \mathbb{R}^d$, and the objective is to quickly output an estimate of $\sum_{\mathbf{x} \in X} k(\mathbf{q}, \mathbf{x})$. In this paper, we consider $\textsf{KDE}$ in the dynamic setting, and introduce a data structure that efficiently maintains the estimates for a set of query points as data points are added to $X$ over time. Based on this, we design a dynamic data structure that maintains a sparse approximation of the fully connected similarity graph on $X$, and develop a fast dynamic spectral clustering algorithm. We further evaluate the effectiveness of our algorithms on both synthetic and real-world datasets.
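To make the dynamic setting concrete, here is a minimal sketch that maintains exact KDE values for a fixed set of query points as data points arrive, using a Gaussian kernel. This is the naive $O(|Q|)$-work-per-insertion baseline, not the paper's sublinear data structure; the class and parameter names are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(q, x, bandwidth=1.0):
    """Gaussian kernel k(q, x) = exp(-||q - x||^2 / bandwidth)."""
    return np.exp(-np.sum((q - x) ** 2) / bandwidth)

class NaiveDynamicKDE:
    """Maintains exact sums sum_{x in X} k(q, x) for a fixed query set,
    paying O(|Q|) work per inserted point -- the baseline a sublinear
    dynamic KDE data structure improves upon."""

    def __init__(self, queries, kernel=gaussian_kernel):
        self.queries = np.asarray(queries, dtype=float)
        self.kernel = kernel
        self.estimates = np.zeros(len(self.queries))

    def insert(self, x):
        x = np.asarray(x, dtype=float)
        for i, q in enumerate(self.queries):
            self.estimates[i] += self.kernel(q, x)

rng = np.random.default_rng(0)
kde = NaiveDynamicKDE(queries=rng.normal(size=(3, 2)))
for _ in range(100):
    kde.insert(rng.normal(size=2))   # data points stream in over time
print(kde.estimates)                 # current KDE values at the 3 queries
```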
Related papers
- Statistical-Computational Trade-offs for Density Estimation [60.81548752871115]
We show that for a broad class of data structures, their bounds cannot be significantly improved.
This is a novel statistical-computational trade-off for density estimation.
arXiv Detail & Related papers (2024-10-30T15:03:33Z)
- Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit [75.4661041626338]
We study the problem of gradient descent learning of a single-index target function $f_*(\boldsymbol{x}) = \sigma_*\left(\langle \boldsymbol{x}, \boldsymbol{\theta} \rangle\right)$. We prove that a two-layer neural network optimized by an SGD-based algorithm learns $f_*$ with a complexity that is not governed by information exponents.
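A rough sketch of this setup under stated assumptions: a quadratic link function, a two-layer ReLU network, and online SGD on fresh Gaussian samples (the width, step size, and link are illustrative choices, not the paper's exact algorithm).

```python
import numpy as np

rng = np.random.default_rng(1)
d, width, steps, lr = 20, 64, 5000, 0.05

# Single-index target f_*(x) = sigma_*(<x, theta>) with an assumed quadratic link.
theta = rng.normal(size=d)
theta /= np.linalg.norm(theta)
sigma_star = lambda z: z ** 2 - 1

# Two-layer ReLU network trained by online SGD on the squared loss.
W = rng.normal(size=(width, d)) / np.sqrt(d)
a = rng.normal(size=width) / np.sqrt(width)
for _ in range(steps):
    x = rng.normal(size=d)                 # fresh Gaussian sample
    y = sigma_star(x @ theta)
    h = np.maximum(W @ x, 0.0)             # hidden-layer activations
    g = a @ h - y                          # residual of the prediction
    grad_a = g * h
    grad_W = np.outer(g * a * (h > 0), x)  # chain rule through the ReLU
    a -= lr * grad_a
    W -= lr * grad_W
```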
arXiv Detail & Related papers (2024-06-03T17:56:58Z)
- Provably learning a multi-head attention layer [55.2904547651831]
The multi-head attention layer is one of the key components of the transformer architecture that sets it apart from traditional feed-forward models.
In this work, we initiate the study of provably learning a multi-head attention layer from random examples.
We prove computational lower bounds showing that in the worst case, exponential dependence on $m$ is unavoidable.
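For orientation, a minimal numpy forward pass of a multi-head attention layer; the paper concerns provably learning such a layer from random examples, which this sketch does not attempt, and all dimensions here are illustrative.

```python
import numpy as np

def multi_head_attention(X, Wq, Wk, Wv, Wo, n_heads):
    """Minimal multi-head attention forward pass. X: (seq, d_model);
    projection matrices are (d_model, d_model); n_heads divides d_model."""
    seq, d_model = X.shape
    d_head = d_model // n_heads
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    heads = []
    for h in range(n_heads):
        s = slice(h * d_head, (h + 1) * d_head)
        scores = Q[:, s] @ K[:, s].T / np.sqrt(d_head)
        attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
        attn /= attn.sum(axis=-1, keepdims=True)   # softmax over keys
        heads.append(attn @ V[:, s])
    return np.concatenate(heads, axis=-1) @ Wo

rng = np.random.default_rng(2)
X = rng.normal(size=(5, 8))
W = [rng.normal(size=(8, 8)) / np.sqrt(8) for _ in range(4)]
out = multi_head_attention(X, *W, n_heads=2)       # shape (5, 8)
```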
arXiv Detail & Related papers (2024-02-06T15:39:09Z)
- Compressive Recovery of Sparse Precision Matrices [5.557600489035657]
We consider the problem of learning a graph modeling the statistical relations of the $d$ variables from a dataset with $n$ samples $X \in \mathbb{R}^{n \times d}$.
We show that it is possible to estimate it from a sketch of size $m = \Omega\left((d+2k)\log(d)\right)$, where $k$ is the maximal number of edges of the underlying graph.
We investigate the possibility of achieving practical recovery with an iterative algorithm based on the graphical lasso, viewed as a specific denoiser.
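A hedged sketch of the graphical-lasso component using scikit-learn's GraphicalLasso; note this fits from the raw samples rather than from a sketch of size $m$, so it illustrates only the denoiser, not the compressive recovery, and the regularization level and threshold are illustrative.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 10))        # n = 200 samples of d = 10 variables

# The sparsity pattern of the estimated precision matrix encodes the
# edges of the dependency graph.
model = GraphicalLasso(alpha=0.1).fit(X)
precision = model.precision_          # (d, d) estimated precision matrix
edges = np.argwhere(np.triu(np.abs(precision) > 1e-4, k=1))
print(edges)                          # recovered edge list (i, j), i < j
```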
arXiv Detail & Related papers (2023-10-21T00:32:47Z)
- Fast Approximation of Similarity Graphs with Kernel Density Estimation [12.321755440062732]
We present a new algorithm for constructing a similarity graph from a set $X$ of data points in $\mathbb{R}^d$.
The presented algorithm is based on the kernel density estimation problem and is applicable to arbitrary kernel functions.
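For contrast, a naive dense-then-sparsify Gaussian similarity graph in numpy, which costs $O(n^2)$ kernel evaluations; the point of the KDE-based construction is precisely to avoid this all-pairs computation. Parameters here are illustrative.

```python
import numpy as np
from scipy.spatial.distance import cdist

def sparse_similarity_graph(X, k=10, bandwidth=1.0):
    """kNN sparsification of the fully connected Gaussian similarity
    graph: keep the k strongest edges per node, then symmetrise."""
    W = np.exp(-cdist(X, X, "sqeuclidean") / bandwidth)
    np.fill_diagonal(W, 0.0)
    keep = np.argsort(W, axis=1)[:, -k:]   # indices of k largest weights
    A = np.zeros_like(W)
    rows = np.arange(len(X))[:, None]
    A[rows, keep] = W[rows, keep]
    return np.maximum(A, A.T)
```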
arXiv Detail & Related papers (2023-08-18T17:52:12Z)
- Do you know what q-means? [45.810803542748495]
We present an improved version of the quantum algorithm originally proposed by Kerenidis, Landman, Luongo, and Prakash (NeurIPS'19). Our algorithm does not rely on the quantum linear algebra primitives of prior work, but instead only uses QRAM to prepare simple states. We also present a "dequantized" algorithm for $\varepsilon$-$k$-means which runs in $O(\ldots)$.
arXiv Detail & Related papers (2023-05-10T01:12:11Z)
- A Normal Map-Based Proximal Stochastic Gradient Method: Convergence and Identification Properties [7.281869462071603]
The proximal stochastic gradient method (PSGD) is one of the state-of-the-art approaches for composite-type problems. In this paper, we present a simple variant of PSGD based on Robinson's normal map.
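A minimal proximal SGD loop for $\ell_1$-regularized least squares, shown as the vanilla baseline; the paper's variant replaces this update with one built on Robinson's normal map, which is not reproduced here. Step size and regularization are illustrative.

```python
import numpy as np

def soft_threshold(w, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding)."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def prox_sgd(X, y, lam=0.1, lr=0.01, epochs=20, seed=0):
    """Vanilla proximal SGD for the lasso: stochastic gradient step on
    the smooth part, then the l1 proximal step."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            grad = (X[i] @ w - y[i]) * X[i]
            w = soft_threshold(w - lr * grad, lr * lam)
    return w
```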
arXiv Detail & Related papers (2023-01-01T10:57:36Z)
- Correlation Clustering Algorithm for Dynamic Complete Signed Graphs: An Index-based Approach [9.13755431537592]
In this paper, we reduce the complexity of approximating the correlation clustering problem from $O\left(m \times \left(2 + \alpha(G)\right) + n\right)$ to $O(m + n)$ for any given value of $\varepsilon$ for a complete signed graph.
Our approach gives the same output as the original algorithm and makes it possible to implement the algorithm in a fully dynamic setting.
arXiv Detail & Related papers (2023-01-01T10:57:36Z)
- Adaptive and Dynamic Multi-Resolution Hashing for Pairwise Summations [19.602149096819776]
We propose Adam-Hash: an adaptive and dynamic multi-resolution hashing data-structure for fast pairwise summation estimation.
Our proposed Adam-Hash is also robust to adaptive PSE queries, where an adversary can choose a query $q_j \in \mathbb{R}^d$ depending on the output from previous queries.
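For intuition, the simplest unbiased pairwise summation estimator, based on uniform sampling; Adam-Hash replaces this with an adaptive multi-resolution hashing structure that additionally withstands adversarially chosen queries. The kernel and sample count are illustrative.

```python
import numpy as np

def pse_uniform(X, q, kernel, m=64, seed=0):
    """Estimate sum_{x in X} k(q, x) from m uniform samples; unbiased,
    but not robust to adaptive queries the way Adam-Hash is."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(len(X), size=m)
    return len(X) * np.mean([kernel(q, X[i]) for i in idx])
```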
arXiv Detail & Related papers (2022-12-21T23:23:24Z)
- Learning a Single Neuron with Adversarial Label Noise via Gradient Descent [50.659479930171585]
We study a function of the form $\mathbf{x} \mapsto \sigma(\mathbf{w} \cdot \mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C \cdot \mathrm{OPT} + \epsilon$ with high probability.
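A small sketch of gradient descent in this setting, with an assumed ReLU activation, squared loss, and a fraction of corrupted labels (the corruption model and step size are illustrative, not the paper's exact noise model or analysis).

```python
import numpy as np

rng = np.random.default_rng(4)
n, d, lr, steps = 500, 10, 0.05, 2000

# ReLU neuron with a small fraction of adversarially flipped labels.
w_star = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = np.maximum(X @ w_star, 0.0)
noise = rng.random(n) < 0.05                  # 5% corrupted labels
y[noise] = rng.normal(size=noise.sum())

w = np.zeros(d)
for _ in range(steps):
    pred = np.maximum(X @ w, 0.0)
    grad = ((pred - y) * (X @ w > 0)) @ X / n  # squared-loss gradient
    w -= lr * grad
```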
arXiv Detail & Related papers (2022-06-17T17:55:43Z)
- Fast Graph Sampling for Short Video Summarization using Gershgorin Disc Alignment [52.577757919003844]
We study the problem of efficiently summarizing a short video into several paragraphs, leveraging recent progress in fast graph sampling.
Experimental results show that our algorithm achieves comparable video summarization as state-of-the-art methods, at a substantially reduced complexity.
arXiv Detail & Related papers (2021-10-21T18:43:00Z)
- Random matrices in service of ML footprint: ternary random features with no performance loss [55.30329197651178]
We show that the eigenspectrum of $\mathbf{K}$ is independent of the distribution of the i.i.d. entries of $\mathbf{w}$.
We propose a novel random features technique, called Ternary Random Feature (TRF).
The computation of the proposed random features requires no multiplication and a factor of $b$ less bits for storage compared to classical random features.
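A rough numpy sketch of a ternary random feature map, with projection weights drawn from $\{-1, 0, +1\}$ so the projection needs only signed additions; the sparsity level and the sign nonlinearity are illustrative assumptions.

```python
import numpy as np

def ternary_random_features(X, n_features=256, sparsity=0.5, seed=0):
    """Random feature map with weights in {-1, 0, +1}: no multiplications
    are needed for the projection, and each weight fits in ~2 bits."""
    rng = np.random.default_rng(seed)
    W = rng.choice([-1, 0, 1], size=(X.shape[1], n_features),
                   p=[(1 - sparsity) / 2, sparsity, (1 - sparsity) / 2])
    return np.sign(X @ W)   # ternary projection + sign nonlinearity
```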
arXiv Detail & Related papers (2021-10-05T09:33:49Z)
- Non-Parametric Estimation of Manifolds from Noisy Data [1.0152838128195467]
We consider the problem of estimating a $d$-dimensional sub-manifold of $\mathbb{R}^D$ from a finite set of noisy samples.
We show that the estimation yields rates of convergence of $n^{-\frac{k}{2k+d}}$ for the point estimation and $n^{-\frac{k-1}{2k+d}}$ for the estimation of the tangent space.
arXiv Detail & Related papers (2021-05-11T02:29:33Z)
- Convergence of Graph Laplacian with kNN Self-tuned Kernels [14.645468999921961]
The self-tuned kernel adaptively sets a bandwidth $\sigma_i$ at each point $x_i$ by the $k$-nearest neighbor (kNN) distance.
This paper proves the convergence of the graph Laplacian operator $L_N$ to the manifold (weighted) Laplacian for a new family of kNN self-tuned kernels.
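A short sketch of the kNN self-tuned kernel and the resulting unnormalized graph Laplacian (the value of $k$ and the unnormalized form are illustrative; the paper's convergence analysis is not reproduced here).

```python
import numpy as np
from scipy.spatial.distance import cdist

def self_tuned_laplacian(X, k=7):
    """Self-tuned kernel W_ij = exp(-||x_i - x_j||^2 / (sigma_i * sigma_j)),
    with sigma_i the distance from x_i to its k-th nearest neighbor,
    and the unnormalized graph Laplacian L = D - W."""
    D2 = cdist(X, X, "sqeuclidean")
    sigma = np.sqrt(np.sort(D2, axis=1)[:, k])   # kNN distance per point
    W = np.exp(-D2 / np.outer(sigma, sigma))
    np.fill_diagonal(W, 0.0)
    return np.diag(W.sum(axis=1)) - W
```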
arXiv Detail & Related papers (2020-11-03T04:55:33Z)