Coresets for Decision Trees of Signals
- URL: http://arxiv.org/abs/2110.03195v1
- Date: Thu, 7 Oct 2021 05:49:55 GMT
- Title: Coresets for Decision Trees of Signals
- Authors: Ibrahim Jubran, Ernesto Evgeniy Sanches Shayda, Ilan Newman, Dan
Feldman
- Abstract summary: We provide the first algorithm that outputs such a $(k,\varepsilon)$-coreset for \emph{every} such matrix $D$.
This is achieved by forging a link between decision trees from machine learning and partition trees from computational geometry.
- Score: 19.537354146654845
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A $k$-decision tree $t$ (or $k$-tree) is a recursive partition of a matrix
(2D-signal) into $k\geq 1$ block matrices (axis-parallel rectangles, leaves)
where each rectangle is assigned a real label. Its regression or classification
loss to a given matrix $D$ of $N$ entries (labels) is the sum of squared
differences over every label in $D$ and its assigned label by $t$. Given an
error parameter $\varepsilon\in(0,1)$, a $(k,\varepsilon)$-coreset $C$ of $D$
is a small summarization that provably approximates this loss to \emph{every}
such tree, up to a multiplicative factor of $1\pm\varepsilon$. In particular,
the optimal $k$-tree of $C$ is a $(1+\varepsilon)$-approximation to the optimal
$k$-tree of $D$.
We provide the first algorithm that outputs such a $(k,\varepsilon)$-coreset
for \emph{every} such matrix $D$. The size $|C|$ of the coreset is polynomial
in $k\log(N)/\varepsilon$, and its construction takes $O(Nk)$ time. This is achieved by
forging a link between decision trees from machine learning and partition trees from
computational geometry.
Experimental results on \texttt{sklearn} and \texttt{lightGBM} show that
applying our coresets to real-world datasets speeds up the computation of
random forests and their parameter tuning by up to $\times 10$, while keeping similar
accuracy. Full open source code is provided.
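Below is a minimal, hedged sketch (not the paper's construction) of the objects involved: the squared loss of a fixed four-block axis-parallel partition (a toy $k$-tree) on a 2D signal $D$, the same weighted loss evaluated on a small sample standing in for a coreset $C$, and how such a weighted summary would plug into \texttt{sklearn} through the standard \texttt{sample_weight} argument. The uniform sample and weights $N/|C|$ are placeholders; the paper's algorithm chooses $C$ so that the two losses agree up to $1\pm\varepsilon$ for every $k$-tree.

```python
# Minimal sketch, not the paper's algorithm: the uniform sample below is a
# placeholder for the (k, eps)-coreset C, whose actual construction is smarter.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
D = rng.normal(size=(64, 64))            # 2D signal with N = 64*64 labels

# A toy 4-tree: four axis-parallel blocks, each assigned its mean label
# (the optimal real label under the squared loss).
blocks = [(slice(0, 32), slice(0, 32)), (slice(0, 32), slice(32, 64)),
          (slice(32, 64), slice(0, 32)), (slice(32, 64), slice(32, 64))]

def tree_loss(D, blocks):
    """Sum of squared differences between every entry of D and its block's label."""
    return sum(((D[b] - D[b].mean()) ** 2).sum() for b in blocks)

# Flatten D into (row, col, label) points, the representation a coreset summarizes.
rows, cols = np.indices(D.shape)
X, y = np.column_stack([rows.ravel(), cols.ravel()]), D.ravel()

# Placeholder "coreset": uniform sample of m points with weights N/m.
m = 512
idx = rng.choice(y.size, size=m, replace=False)
Xc, yc, wc = X[idx], y[idx], np.full(m, y.size / m)

def weighted_tree_loss(Xc, yc, wc, blocks):
    """Weighted squared loss of the same 4-tree, evaluated only on the sample."""
    loss = 0.0
    for rs, cs in blocks:
        inside = ((Xc[:, 0] >= rs.start) & (Xc[:, 0] < rs.stop) &
                  (Xc[:, 1] >= cs.start) & (Xc[:, 1] < cs.stop))
        label = D[rs, cs].mean()             # same block label as on the full signal
        loss += (wc[inside] * (yc[inside] - label) ** 2).sum()
    return loss

print("loss on full signal:    ", tree_loss(D, blocks))
print("loss on weighted sample:", weighted_tree_loss(Xc, yc, wc, blocks))

# Downstream use: any weighted summary trains sklearn forests via sample_weight.
forest = RandomForestRegressor(n_estimators=50, random_state=0)
forest.fit(Xc, yc, sample_weight=wc)
```

For the placeholder uniform sample the two printed losses are typically close only on average; the paper's guarantee is that a true $(k,\varepsilon)$-coreset makes them agree within $1\pm\varepsilon$ simultaneously for all $k$-trees.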
Related papers
- LevAttention: Time, Space, and Streaming Efficient Algorithm for Heavy Attentions [54.54897832889028]
We show that for any $K$, there is a "universal set" $U \subset [n]$ of size independent of $n$, such that for any $Q$ and any row $i$, the large attention scores $A_{i,j}$ in row $i$ of $A$ all have $j \in U$.
We empirically show the benefits of our scheme for vision transformers, showing how to train new models that use our universal set while training as well.
arXiv Detail & Related papers (2024-10-07T19:47:13Z) - Optimal Sketching for Residual Error Estimation for Matrix and Vector Norms [50.15964512954274]
We study the problem of residual error estimation for matrix and vector norms using a linear sketch.
We demonstrate that this gives a substantial advantage empirically, for roughly the same sketch size and accuracy as in previous work.
We also show an $\Omega(k^{2/p} n^{1-2/p})$ lower bound for the sparse recovery problem, which is tight up to a $\mathrm{poly}(\log n)$ factor.
arXiv Detail & Related papers (2024-08-16T02:33:07Z) - Optimal Embedding Dimension for Sparse Subspace Embeddings [4.042707434058959]
A random $m\times n$ matrix $S$ is an oblivious subspace embedding (OSE) if, with high probability, it preserves the norms of all vectors in any fixed $d$-dimensional subspace up to a $1\pm\epsilon$ factor.
We show that an $m\times n$ random matrix $S$ with $m \geq (1+\theta)d$ is an oblivious subspace embedding with $\epsilon = O_\theta(1)$.
We use this to construct the first oblivious subspace embedding with $O(d)$ embedding dimension that can be applied faster than current matrix multiplication time.
arXiv Detail & Related papers (2023-11-17T18:01:58Z) - A spectral least-squares-type method for heavy-tailed corrupted
regression with unknown covariance & heterogeneous noise [2.019622939313173]
We revisit heavy-tailed corrupted least-squares linear regression, assuming a corrupted label-feature sample of size $n$ with at most $\epsilon n$ arbitrary outliers.
We propose a near-optimal computationally tractable estimator, based on the power method, assuming no knowledge of $(\Sigma,\Xi)$ nor of the operator norm of $\Xi$.
arXiv Detail & Related papers (2022-09-06T23:37:31Z) - Low-Rank Approximation with $1/\epsilon^{1/3}$ Matrix-Vector Products [58.05771390012827]
We study iterative methods based on Krylov subspaces for low-rank approximation under any Schatten-$p$ norm.
Our main result is an algorithm that uses only $\tilde{O}(k/\sqrt{\epsilon})$ matrix-vector products.
arXiv Detail & Related papers (2022-02-10T16:10:41Z) - Tree density estimation [12.831051269764115]
We study density estimation for a random vector $\boldsymbol{X}$ in $\mathbb{R}^d$ with probability density $f(\boldsymbol{x})$.
For Lipschitz continuous $f$ with bounded support, the $L_1$ error $\int |f_n(\boldsymbol{x})-f_{T^*}(\boldsymbol{x})|\,d\boldsymbol{x}$ converges to $0$ a.s.
arXiv Detail & Related papers (2021-11-23T16:05:59Z) - Active Sampling for Linear Regression Beyond the $\ell_2$ Norm [70.49273459706546]
We study active sampling algorithms for linear regression, which aim to query only a small number of entries of a target vector.
We show that this dependence on $d$ is optimal, up to logarithmic factors.
We also provide the first total sensitivity upper bound of $O(d^{\max\{1,p/2\}}\log^2 n)$ for loss functions with at most degree $p$ growth.
arXiv Detail & Related papers (2021-11-09T00:20:01Z) - Self-training Converts Weak Learners to Strong Learners in Mixture
Models [86.7137362125503]
We show that a pseudolabeler $\boldsymbol{\beta}_{\mathrm{pl}}$ can achieve classification error at most $C_{\mathrm{err}}$.
We additionally show that by running gradient descent on the logistic loss one can obtain a pseudolabeler $\boldsymbol{\beta}_{\mathrm{pl}}$ with classification error $C_{\mathrm{err}}$ using only $O(d)$ labeled examples.
arXiv Detail & Related papers (2021-06-25T17:59:16Z) - Learning a Latent Simplex in Input-Sparsity Time [58.30321592603066]
We consider the problem of learning a latent $k$-vertex simplex $K \subset \mathbb{R}^d$, given access to $A \in \mathbb{R}^{d\times n}$.
We show that the dependence on $k$ in the running time is unnecessary given a natural assumption about the mass of the top $k$ singular values of $A$.
arXiv Detail & Related papers (2021-05-17T16:40:48Z) - Near-Optimal Learning of Tree-Structured Distributions by Chow-Liu [14.298220510927695]
We provide finite sample guarantees for the classical Chow-Liu algorithm (IEEE Trans. Inform. Theory, 1968).
We show that for a specific tree $T$, with $\widetilde{O}(|\Sigma|^2 n \varepsilon^{-1})$ samples from a distribution $P$ over $\Sigma^n$, one can efficiently learn the closest $T$-structured distribution in KL divergence.
arXiv Detail & Related papers (2020-11-09T02:08:56Z) - On Efficient Low Distortion Ultrametric Embedding [18.227854382422112]
A widely-used method to preserve the underlying hierarchical structure of the data is to find an embedding of the data into a tree or an ultrametric.
In this paper, we provide a new algorithm which takes as input a set of points in $\mathbb{R}^d$ and, for a universal constant $\rho>1$, outputs an ultrametric $\Delta$.
We show that the output of our algorithm is comparable to the output of the linkage algorithms while achieving a much faster running time.
arXiv Detail & Related papers (2020-08-15T11:06:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.