On Efficient Low Distortion Ultrametric Embedding
- URL: http://arxiv.org/abs/2008.06700v1
- Date: Sat, 15 Aug 2020 11:06:45 GMT
- Title: On Efficient Low Distortion Ultrametric Embedding
- Authors: Vincent Cohen-Addad, Karthik C. S., and Guillaume Lagarde
- Abstract summary: A widely-used method to preserve the underlying hierarchical structure of the data is to find an embedding of the data into a tree or an ultrametric.
In this paper, we provide a new algorithm which takes as input a set of points $P$ in $\mathbb{R}^d$ and, for every $c \ge 1$, runs in time $n^{1+\frac{\rho}{c^2}}$ (for some universal constant $\rho>1$) to output an ultrametric $\Delta$.
We show that the output of our algorithm is comparable to the output of the linkage algorithms while achieving a much faster running time.
- Score: 18.227854382422112
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A classic problem in unsupervised learning and data analysis is to find
simpler and easy-to-visualize representations of the data that preserve its
essential properties. A widely-used method to preserve the underlying
hierarchical structure of the data while reducing its complexity is to find an
embedding of the data into a tree or an ultrametric. The most popular
algorithms for this task are the classic linkage algorithms (single, average,
or complete). However, on a data set of $n$ points in $\Omega(\log n)$
dimensions, these methods exhibit a prohibitive running time of
$\Theta(n^2)$.
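As a point of reference for the linkage baselines, here is a minimal sketch (assuming SciPy is available; the synthetic data `P` is ours, purely for illustration) showing how single linkage induces an ultrametric: the cophenetic distances of the resulting dendrogram satisfy the strong triangle inequality.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)
P = rng.standard_normal((200, 8))        # n = 200 synthetic points in R^8

Z = linkage(P, method="single")          # classic single-linkage dendrogram
delta = squareform(cophenet(Z))          # cophenetic distances: an n x n ultrametric

# Spot-check the strong triangle inequality Delta(u,v) <= max(Delta(u,w), Delta(w,v)).
u, v, w = 0, 1, 2
assert delta[u, v] <= max(delta[u, w], delta[w, v]) + 1e-9
```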
In this paper, we provide a new algorithm which takes as input a set of
points $P$ in $\mathbb{R}^d$, and for every $c\ge 1$, runs in time
$n^{1+\frac{\rho}{c^2}}$ (for some universal constant $\rho>1$) to output an
ultrametric $\Delta$ such that for any two points $u,v$ in $P$, we have
$\Delta(u,v)$ is within a multiplicative factor of $5c$ of the distance between
$u$ and $v$ in the "best" ultrametric representation of $P$. Here, the best
ultrametric is the ultrametric $\tilde\Delta$ that minimizes the maximum
distance distortion with respect to the $\ell_2$ distance, namely that
minimizes $\underset{u,v \in P}{\max}\ \frac{\tilde\Delta(u,v)}{\|u-v\|_2}$.
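This objective is straightforward to evaluate for any candidate ultrametric. A small sketch, reusing the `delta` matrix from the linkage example above and assuming distinct input points (the helper name `max_distortion` is ours, not the paper's):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def max_distortion(P: np.ndarray, delta: np.ndarray) -> float:
    """Worst-case ratio delta[u, v] / ||P[u] - P[v]||_2 over all pairs u != v."""
    d2 = squareform(pdist(P))                # pairwise Euclidean distances
    mask = ~np.eye(len(P), dtype=bool)       # ignore the zero diagonal
    return float(np.max(delta[mask] / d2[mask]))
```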
We complement the above result by showing that under popular complexity
theoretic assumptions, for every constant $\varepsilon>0$, no algorithm with
running time $n^{2-\varepsilon}$ can distinguish between inputs in
$\ell_\infty$-metric that admit isometric embedding and those that incur a
distortion of $\frac{3}{2}$.
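For context, exact isometric embeddability is decidable by brute force: an isometric copy of a metric inside an ultrametric space is itself an ultrametric, so it suffices to test the strong triangle inequality over all triples. A hedged $O(n^3)$ sketch for $\ell_\infty$ inputs follows; the theorem above rules out anything subquadratic, even for the easier $\frac{3}{2}$-distortion gap version.

```python
import numpy as np

def embeds_isometrically(P: np.ndarray, tol: float = 1e-9) -> bool:
    """True iff the l_inf metric on P already satisfies the strong triangle inequality."""
    d = np.max(np.abs(P[:, None, :] - P[None, :, :]), axis=-1)  # pairwise l_inf distances
    n = len(P)
    for u in range(n):
        for v in range(u + 1, n):
            for w in range(n):
                # A single violated triple certifies that no isometric
                # ultrametric embedding exists.
                if d[u, v] > max(d[u, w], d[w, v]) + tol:
                    return False
    return True
```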
Finally, we present empirical evaluation on classic machine learning datasets
and show that the output of our algorithm is comparable to the output of the
linkage algorithms while achieving a much faster running time.
Related papers
- Optimal Sketching for Residual Error Estimation for Matrix and Vector Norms [50.15964512954274]
We study the problem of residual error estimation for matrix and vector norms using a linear sketch.
We demonstrate that this gives a substantial advantage empirically, for roughly the same sketch size and accuracy as in previous work.
We also show an $\Omega(k^{2/p} n^{1-2/p})$ lower bound for the sparse recovery problem, which is tight up to a $\mathrm{poly}(\log n)$ factor.
arXiv Detail & Related papers (2024-08-16T02:33:07Z) - A Scalable Algorithm for Individually Fair K-means Clustering [77.93955971520549]
We present a scalable algorithm for the individually fair ($p$, $k$)-clustering problem introduced by Jung et al. and Mahabadi et al.
A clustering is then called individually fair if it has centers within distance $\delta(x)$ of $x$ for each $x \in P$.
We show empirically that not only is our algorithm much faster than prior work, but it also produces lower-cost solutions.
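A hedged illustration of this fairness condition, based on our reading of the Jung et al. formulation in which $\delta(x)$ is the distance from $x$ to its $\lceil n/k \rceil$-th nearest neighbour; the helper below is hypothetical, not from the paper:

```python
import numpy as np
from scipy.spatial.distance import cdist

def is_individually_fair(P: np.ndarray, centers: np.ndarray, k: int) -> bool:
    """Check that every x in P has a center within delta(x)."""
    n = len(P)
    # delta(x): radius of the smallest ball around x holding ceil(n/k) points
    # (counting x itself) -- an assumption based on the Jung et al. definition.
    delta = np.sort(cdist(P, P), axis=1)[:, int(np.ceil(n / k)) - 1]
    nearest_center = cdist(P, centers).min(axis=1)
    return bool(np.all(nearest_center <= delta))
```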
arXiv Detail & Related papers (2024-02-09T19:01:48Z) - A Whole New Ball Game: A Primal Accelerated Method for Matrix Games and
Minimizing the Maximum of Smooth Functions [44.655316553524855]
We design algorithms for minimizing $\max_{i \in [n]} f_i(x)$ over a $d$-dimensional Euclidean or simplex domain.
When each $f_i$ is $1$-Lipschitz and $1$-smooth, our method computes an $\epsilon$-approximate solution.
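The toy sketch below illustrates the min-max objective only; it is a plain subgradient method, not the paper's accelerated primal algorithm, and the instance ($f_i(x) = \frac{1}{2}\|x - a_i\|^2$, the step schedule) is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))            # defines f_i(x) = 0.5 * ||x - a_i||^2

def worst_f_and_subgrad(x: np.ndarray):
    f = 0.5 * np.sum((x - A) ** 2, axis=1)   # all f_i(x)
    i = int(np.argmax(f))                    # index achieving the max
    return f[i], x - A[i]                    # value and a valid subgradient of max_i f_i

x = np.zeros(10)
for t in range(1, 501):
    val, g = worst_f_and_subgrad(x)
    x -= (0.5 / np.sqrt(t)) * g              # diminishing-step subgradient descent
```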
arXiv Detail & Related papers (2023-11-17T22:07:18Z) - Online Robust Mean Estimation [37.746091744197656]
We study the problem of high-dimensional robust mean estimation in an online setting.
We prove two main results about online robust mean estimation in this model.
arXiv Detail & Related papers (2023-10-24T15:28:43Z) - Dynamic algorithms for k-center on graphs [3.568439282784197]
We give the first efficient algorithms for the $k$-center problem on dynamic graphs undergoing edge updates.
We show a reduction that leads to a fully dynamic $(2+\epsilon)$-approximation algorithm for the $k$-center problem.
arXiv Detail & Related papers (2023-07-28T13:50:57Z) - A faster and simpler algorithm for learning shallow networks [10.595936992218856]
We revisit the well-studied problem of learning a linear combination of $k$ ReLU activations given labeled examples.
Here we show that a simpler one-stage version of their algorithm suffices, and moreover its runtime is only $(d/\varepsilon)^{O(k^2)}$.
arXiv Detail & Related papers (2023-07-24T03:04:10Z) - Sketching Algorithms and Lower Bounds for Ridge Regression [65.0720777731368]
We give a sketching-based iterative algorithm that computes $(1+\varepsilon)$-approximate solutions for the ridge regression problem.
We also show that this algorithm can be used to give faster algorithms for kernel ridge regression.
arXiv Detail & Related papers (2022-04-13T22:18:47Z) - Finding the KT partition of a weighted graph in near-linear time [1.572727650614088]
Kawarabayashi and Thorup gave a near-linear time deterministic algorithm for minimum cut in a simple graph $G = (V,E)$.
We give a near-linear time randomized algorithm to find the $(1+\varepsilon)$-KT partition of a weighted graph.
arXiv Detail & Related papers (2021-11-02T05:26:10Z) - Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model.
We show that for both problems, for dimensions of order $\frac{1}{\lambda\epsilon}$, one can obtain streaming algorithms using space polynomially smaller than $\frac{1}{\lambda\epsilon}$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z) - Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample
Complexity [59.34067736545355]
Given an MDP with $S$ states, $A$ actions, the discount factor $\gamma \in (0,1)$, and an approximation threshold $\epsilon > 0$, we provide a model-free algorithm to learn an $\epsilon$-optimal policy.
For small enough $\epsilon$, we show an improved algorithm with improved sample complexity.
arXiv Detail & Related papers (2020-06-06T13:34:41Z) - Agnostic Q-learning with Function Approximation in Deterministic
Systems: Tight Bounds on Approximation Error and Sample Complexity [94.37110094442136]
We study the problem of agnostic $Q$-learning with function approximation in deterministic systems.
We show that if $\delta = O\left(\rho/\sqrt{\dim_E}\right)$, then one can find the optimal policy using $O\left(\dim_E\right)$ trajectories.
arXiv Detail & Related papers (2020-02-17T18:41:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.