Related papers: A Randomized Algorithm to Reduce the Support of Discrete Measures

URL: http://arxiv.org/abs/2006.01757v2
Date: Thu, 26 Nov 2020 09:12:31 GMT
Title: A Randomized Algorithm to Reduce the Support of Discrete Measures
Authors: Francesco Cosentino, Harald Oberhauser, Alessandro Abate
Abstract summary: Given a discrete probability measure supported on $N$ atoms and a set of $n$ real-valued functions, there exists a probability measure that is supported on a subset of $n+1$ of the original $N$ atoms and has the same mean when integrated against each of the $n$ functions. We give a simple geometric characterization of barycenters via negative cones and derive a randomized algorithm that computes this new measure by "greedy geometric sampling". We then study its properties, and benchmark it on synthetic and real-world data to show that it can be very beneficial in the $N \gg n$ regime.
Score: 79.55586575988292
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Given a discrete probability measure supported on $N$ atoms and a set of $n$ real-valued functions, there exists a probability measure that is supported on a subset of $n+1$ of the original $N$ atoms and has the same mean when integrated against each of the $n$ functions. If $ N \gg n$ this results in a huge reduction of complexity. We give a simple geometric characterization of barycenters via negative cones and derive a randomized algorithm that computes this new measure by "greedy geometric sampling". We then study its properties, and benchmark it on synthetic and real-world data to show that it can be very beneficial in the $N\gg n$ regime. A Python implementation is available at \url{https://github.com/FraCose/Recombination_Random_Algos}.
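
The existence claim in the abstract can be illustrated with a short sketch. The code below is not the paper's randomized "greedy geometric sampling" algorithm; it is the classical deterministic Caratheodory-style elimination, swapped in here only to show what a reduced measure looks like. The function name `reduce_support` and the array layout (rows are atoms, columns are the $n$ function values) are assumptions made for this illustration.

```python
import numpy as np


def reduce_support(X, w):
    """Illustrative deterministic support reduction (not the paper's algorithm).

    X : (N, n) array, X[i, j] = value of the j-th function at the i-th atom.
    w : (N,) array of nonnegative weights summing to 1.
    Returns weights with at most n + 1 nonzero entries, the same total mass,
    and the same means X.T @ w (up to floating-point error).
    """
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float).copy()
    N, n = X.shape
    # Append a column of ones so that the total mass is preserved as well.
    A = np.hstack([X, np.ones((N, 1))])
    active = np.flatnonzero(w > 0)
    while active.size > n + 1:
        idx = active[: n + 2]
        # A[idx].T has shape (n + 1, n + 2), so it has a nontrivial null space;
        # the last right-singular vector of the full SVD lies in it.
        _, _, Vt = np.linalg.svd(A[idx].T)
        v = Vt[-1]
        # v sums to (numerically) zero, so it has at least one positive entry.
        pos = v > 0
        ratios = w[idx][pos] / v[pos]
        k = np.argmin(ratios)
        alpha = ratios[k]
        # Move along the null direction until one weight reaches zero.
        new_w = np.maximum(w[idx] - alpha * v, 0.0)
        new_w[np.flatnonzero(pos)[k]] = 0.0
        w[idx] = new_w
        active = np.flatnonzero(w > 0)
    return w
```

For example, calling `reduce_support(X, w)` with `X` of shape `(200, 3)` and uniform weights returns a weight vector with at most four nonzero entries whose weighted means of the three columns agree with the original ones. Each pass removes at least one atom, so this sketch needs on the order of $N - n - 1$ small SVDs.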

Related papers

Inverting the Leverage Score Gradient: An Efficient Approximate Newton Method [10.742859956268655]
This paper aims to recover the intrinsic model parameters given the leverage scores gradient. We specifically scrutinize the inversion of the leverage score gradient, denoted as $g(x)$.
arXiv Detail & Related papers (2024-08-21T01:39:42Z)
Simple, Scalable and Effective Clustering via One-Dimensional Projections [10.807367640692021]
Clustering is a fundamental problem in unsupervised machine learning with many applications in data analysis. We introduce a simple randomized clustering algorithm that provably runs in expected time $O(\mathrm{nnz}(X) + n\log n)$ for arbitrary $k$. We prove that our algorithm achieves approximation ratio $\widetilde{O}(k^4)$ on any input dataset for the $k$-means objective.
arXiv Detail & Related papers (2023-10-25T16:37:45Z)
Near-Optimal Bounds for Learning Gaussian Halfspaces with Random Classification Noise [50.64137465792738]
We show that any efficient SQ algorithm for the problem requires sample complexity at least $\Omega(d^{1/2}/(\max\{p, \epsilon\})^2)$. Our lower bound suggests that this quadratic dependence on $1/\epsilon$ is inherent for efficient algorithms.
arXiv Detail & Related papers (2023-07-13T18:59:28Z)
Replicable Clustering [57.19013971737493]
We propose algorithms for the statistical $k$-medians, statistical $k$-means, and statistical $k$-centers problems by utilizing approximation routines for their counterparts in a black-box manner. We also provide experiments on synthetic distributions in 2D using the $k$-means++ implementation from sklearn as a black-box that validate our theoretical results.
arXiv Detail & Related papers (2023-02-20T23:29:43Z)
On the Multidimensional Random Subset Sum Problem [0.9007371440329465]
In the Random Subset Sum Problem, given $n$ i.i.d. random variables $X_1, \dots, X_n$, we wish to approximate any point $z \in [-1,1]$ as the sum of a subset $X_{i_1(z)}, \dots, X_{i_s(z)}$ of them, up to error $\varepsilon$. We prove that, in $d$ dimensions, $n = O(d^3 \log \frac{1}{\varepsilon} \cdot \ldots)$
arXiv Detail & Related papers (2022-07-28T08:10:43Z)
List-Decodable Sparse Mean Estimation via Difference-of-Pairs Filtering [42.526664955704746]
We develop a novel, conceptually simpler technique for list-decodable sparse mean estimation. In particular, for distributions with "certifiably bounded" $t$-th moments in $k$-sparse directions, our algorithm achieves error of $(1/\alpha)^{O(1/t)}$ with sample complexity $m = (k\log(n))^{O(t)}/\alpha$ and running time $\mathrm{poly}(mn^t)$. For the special case of Gaussian inliers, our algorithm achieves the optimal error guarantee of $\Theta(\sqrt{\log \ldots})$
arXiv Detail & Related papers (2022-06-10T17:38:18Z)
Random matrices in service of ML footprint: ternary random features with no performance loss [55.30329197651178]
We show that the eigenspectrum of $\mathbf{K}$ is independent of the distribution of the i.i.d. entries of $\mathbf{w}$. We propose a novel random technique, called Ternary Random Feature (TRF). The computation of the proposed random features requires no multiplication and a factor of $b$ fewer bits for storage compared to classical random features.
arXiv Detail & Related papers (2021-10-05T09:33:49Z)
Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model. We show that for both problems, for dimensions of $\frac{1}{\lambda\epsilon}$, one can obtain streaming algorithms with space polynomially smaller than $\frac{1}{\lambda\epsilon}$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z)
Sample Efficient Reinforcement Learning via Low-Rank Matrix Estimation [30.137884459159107]
We consider the question of learning the $Q$-function in a sample-efficient manner for reinforcement learning with continuous state and action spaces. We develop a simple, iterative learning algorithm that finds an $\epsilon$-optimal $Q$-function with sample complexity of $\widetilde{O}(\frac{1}{\epsilon^{\max(d_1, d_2)+2}})$ when the optimal $Q$-function has low rank $r$ and the discounting factor $\gamma$ is below a certain threshold.
arXiv Detail & Related papers (2020-06-11T00:55:35Z)
Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample Complexity [59.34067736545355]
Given an MDP with $S$ states, $A$ actions, the discount factor $\gamma \in (0,1)$, and an approximation threshold $\epsilon > 0$, we provide a model-free algorithm to learn an $\epsilon$-optimal policy. For small enough $\epsilon$, we show an improved algorithm with sample complexity.
arXiv Detail & Related papers (2020-06-06T13:34:41Z)

This list is automatically generated from the titles and abstracts of the papers in this site.