Coresets for Data Discretization and Sine Wave Fitting
- URL: http://arxiv.org/abs/2203.03009v1
- Date: Sun, 6 Mar 2022 17:07:56 GMT
- Title: Coresets for Data Discretization and Sine Wave Fitting
- Authors: Alaa Maalouf and Murad Tukan and Eric Price and Daniel Kane and Dan
Feldman
- Abstract summary: A stream of integers in $[N]:=1,cdots,N$ is received from a sensor.
The goal is to approximate the $n$ points received so far in $P$ by a single frequency.
We prove that emphevery set $P$ of $n$ has a weighted subset $Ssubseteq P$.
- Score: 39.18104905459739
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the \emph{monitoring} problem, the input is an unbounded stream
$P={p_1,p_2\cdots}$ of integers in $[N]:=\{1,\cdots,N\}$, that are obtained
from a sensor (such as GPS or heart beats of a human). The goal (e.g., for
anomaly detection) is to approximate the $n$ points received so far in $P$ by a
single frequency $\sin$, e.g. $\min_{c\in C}cost(P,c)+\lambda(c)$, where
$cost(P,c)=\sum_{i=1}^n \sin^2(\frac{2\pi}{N} p_ic)$, $C\subseteq [N]$ is a
feasible set of solutions, and $\lambda$ is a given regularization function.
For any approximation error $\varepsilon>0$, we prove that \emph{every} set $P$
of $n$ integers has a weighted subset $S\subseteq P$ (sometimes called
core-set) of cardinality $|S|\in O(\log(N)^{O(1)})$ that approximates
$cost(P,c)$ (for every $c\in [N]$) up to a multiplicative factor of
$1\pm\varepsilon$. Using known coreset techniques, this implies streaming
algorithms using only $O((\log(N)\log(n))^{O(1)})$ memory. Our results hold for
a large family of functions. Experimental results and open source code are
provided.
Related papers
- Computational Hardness of Private Coreset [84.99100741615423]
For a given input set of points, a coreset is another set of points such that the $k$-means objective for any candidate solution is preserved up to a multiplicative $(, 1/n(1))$ factor (and some additive factor)<n>We show that no-time $(, 1/n(1))$-DP algorithm can compute a coreset for $k$-means in the $ell_infty$-metric for some constant $> 0$ (and some constant additive factor)<n>For $k$-means in the
arXiv Detail & Related papers (2026-02-19T15:58:49Z) - Deterministic Hardness of Approximation of Unique-SVP and GapSVP in $\ell_p$ norms for $p>2$ [0.0]
We establish deterministic hardness of approximation results for the Shortest Vector Problem in $ell_p$ norm ($mathsfSVP_p$) and for Unique-SVP ($mathsfuSVP_p$)<n>For every $p > 2$, we prove constant-ratio hardness: no-time algorithm approximates $mathsfSVP_p$ or $mathsfuSVP_p$ within a ratio of $sqrt2 - o(1)$.
arXiv Detail & Related papers (2025-10-19T20:17:26Z) - Actively Learning Halfspaces without Synthetic Data [34.777547976926456]
We design efficient algorithms for learning halfspaces without point synthesis.<n>As a corollary, we obtain an optimal $O(d + log n)$ query deterministic learner for axis-aligned halfspaces.<n>Our algorithm solves the more general problem of learning a Boolean function $f$ over $n$ elements which is monotone under at least one of $D$ provided orderings.
arXiv Detail & Related papers (2025-09-25T07:39:25Z) - Approximating the operator norm of local Hamiltonians via few quantum states [53.16156504455106]
Consider a Hermitian operator $A$ acting on a complex Hilbert space of $2n$.<n>We show that when $A$ has small degree in the Pauli expansion, or in other words, $A$ is a local $n$-qubit Hamiltonian.<n>We show that whenever $A$ is $d$-local, textiti.e., $deg(A)le d$, we have the following discretization-type inequality.
arXiv Detail & Related papers (2025-09-15T14:26:11Z) - Sandwich test for Quantum Phase Estimation [0.0]
Quantum Phase Estimation (QPE) has potential for a scientific revolution through numerous practical applications.<n>Many QPE algorithms use the Hadamard test to estimate $langle psi|Uk|psirangle$ for a large integer $k$.<n>We present a new algorithm, the SANDWICH test to address this bottleneck.
arXiv Detail & Related papers (2025-07-31T16:45:07Z) - Sharp Noisy Binary Search with Monotonic Probabilities [5.563988395126509]
We revisit the noisy binary search model of Karp and Kleinberg.
We produce an algorithm that succeeds with probability $1-delta$ from [ frac1C_tau, varepsilon cdot left(lg n + O(log2/3 n log 1/3 frac1delta + log frac1delta) right.
arXiv Detail & Related papers (2023-11-01T20:45:13Z) - Fast $(1+\varepsilon)$-Approximation Algorithms for Binary Matrix
Factorization [54.29685789885059]
We introduce efficient $(1+varepsilon)$-approximation algorithms for the binary matrix factorization (BMF) problem.
The goal is to approximate $mathbfA$ as a product of low-rank factors.
Our techniques generalize to other common variants of the BMF problem.
arXiv Detail & Related papers (2023-06-02T18:55:27Z) - A spectral least-squares-type method for heavy-tailed corrupted
regression with unknown covariance \& heterogeneous noise [2.019622939313173]
We revisit heavy-tailed corrupted least-squares linear regression assuming to have a corrupted $n$-sized label-feature sample of at most $epsilon n$ arbitrary outliers.
We propose a near-optimal computationally tractable estimator, based on the power method, assuming no knowledge on $(Sigma,Xi) nor the operator norm of $Xi$.
arXiv Detail & Related papers (2022-09-06T23:37:31Z) - Low-Rank Approximation with $1/\epsilon^{1/3}$ Matrix-Vector Products [58.05771390012827]
We study iterative methods based on Krylov subspaces for low-rank approximation under any Schatten-$p$ norm.
Our main result is an algorithm that uses only $tildeO(k/sqrtepsilon)$ matrix-vector products.
arXiv Detail & Related papers (2022-02-10T16:10:41Z) - Learning low-degree functions from a logarithmic number of random
queries [77.34726150561087]
We prove that for any integer $ninmathbbN$, $din1,ldots,n$ and any $varepsilon,deltain(0,1)$, a bounded function $f:-1,1nto[-1,1]$ of degree at most $d$ can be learned.
arXiv Detail & Related papers (2021-09-21T13:19:04Z) - Faster Rates of Differentially Private Stochastic Convex Optimization [7.93728520583825]
We study the case where the population risk function satisfies the Tysbakov Noise Condition (TNC) with some parameter $theta>1$.
In the second part, we focus on a special case where the population risk function is strongly convex.
arXiv Detail & Related papers (2021-07-31T22:23:39Z) - Optimal Coreset for Gaussian Kernel Density Estimation [0.8376091455761259]
Given a point set $Psubset mathbbRd$, the kernel density estimate of $P$ is defined as [ overlinemathcalG_P(x) = frac1left|Pright|sum_pin Pe-leftlVert x-p rightrVert2 ] for any $xinmathbbRd$.
We study how to construct a small subset $Q$ of $P
arXiv Detail & Related papers (2020-07-15T22:58:50Z) - Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model.
We show that for both problems, for dimensions of $frac1lambdaepsilon$, one can obtain streaming algorithms with spacely smaller than $frac1lambdaepsilon$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z) - Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample
Complexity [59.34067736545355]
Given an MDP with $S$ states, $A$ actions, the discount factor $gamma in (0,1)$, and an approximation threshold $epsilon > 0$, we provide a model-free algorithm to learn an $epsilon$-optimal policy.
For small enough $epsilon$, we show an improved algorithm with sample complexity.
arXiv Detail & Related papers (2020-06-06T13:34:41Z) - Sets Clustering [25.358415142404752]
We prove that a core-set of $O(logn)$ sets always exists, and can be computed in $O(nlogn)$ time.
Applying an inefficient but optimal algorithm on this coreset allows us to obtain the first PTAS ($1+varepsilon$ approximation) for the sets-$k$-means problem.
Open source code and experimental results for document classification and facility locations are also provided.
arXiv Detail & Related papers (2020-03-09T13:30:30Z) - On the Complexity of Minimizing Convex Finite Sums Without Using the
Indices of the Individual Functions [62.01594253618911]
We exploit the finite noise structure of finite sums to derive a matching $O(n2)$-upper bound under the global oracle model.
Following a similar approach, we propose a novel adaptation of SVRG which is both emphcompatible with oracles, and achieves complexity bounds of $tildeO(n2+nsqrtL/mu)log (1/epsilon)$ and $O(nsqrtL/epsilon)$, for $mu>0$ and $mu=0$
arXiv Detail & Related papers (2020-02-09T03:39:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.