Improved Search of Relevant Points for Nearest-Neighbor Classification
- URL: http://arxiv.org/abs/2203.03567v1
- Date: Mon, 7 Mar 2022 18:10:27 GMT
- Title: Improved Search of Relevant Points for Nearest-Neighbor Classification
- Authors: Alejandro Flores-Velazco
- Abstract summary: We say a training point is relevant if its omission from the training set could induce the misclassification of some query point in $mathbbRd$.
An output-sensitive algorithm was proposed to find the set of border points of $P$ in $O(n^2 + nk^2)$ time, where $k$ is the size of such set.
In this paper, we improve this algorithm to have time complexity equal to $O(nk^2)$ by proving that the first steps of their algorithm, which require $O(n^2)$ time, are unnecessary.
- Score: 91.3755431537592
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Given a training set $P \subset \mathbb{R}^d$, the nearest-neighbor
classifier assigns any query point $q \in \mathbb{R}^d$ to the class of its
closest point in $P$. To answer these classification queries, some training
points are more relevant than others. We say a training point is relevant if
its omission from the training set could induce the misclassification of some
query point in $\mathbb{R}^d$. These relevant points are commonly known as
border points, as they define the boundaries of the Voronoi diagram of $P$ that
separate points of different classes. Being able to compute this set of points
efficiently is crucial to reduce the size of the training set without affecting
the accuracy of the nearest-neighbor classifier.
Improving over a decades-long result by Clarkson, a recent paper by Eppstein
proposed an output-sensitive algorithm to find the set of border points of $P$
in $O( n^2 + nk^2 )$ time, where $k$ is the size of such set. In
this paper, we improve this algorithm to have time complexity equal to $O( nk^2
)$ by proving that the first steps of their algorithm, which require $O( n^2 )$
time, are unnecessary.
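To make the definitions above concrete, the following is a minimal illustrative sketch, not the paper's improved algorithm nor Eppstein's output-sensitive procedure. It implements the 1-NN rule directly and uses a Monte Carlo approximation of relevance: a training point is flagged as a border point when removing it changes the predicted label of some sampled query. All function and parameter names are hypothetical, and since the exact definition quantifies over every query point in $\mathbb{R}^d$, sampling can only under-approximate the true set of relevant points.

```python
import numpy as np

def nn_classify(queries, points, labels):
    """1-NN rule: assign each query the label of its closest training point."""
    # Pairwise squared distances, shape (num_queries, num_points).
    d2 = ((queries[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    return labels[d2.argmin(axis=1)]

def looks_relevant(i, points, labels, num_samples=10000, seed=0):
    """Heuristic relevance test: does removing point i change the predicted
    label of any sampled query? (Sampling can miss relevant points.)"""
    rng = np.random.default_rng(seed)
    lo, hi = points.min(0) - 1.0, points.max(0) + 1.0
    queries = rng.uniform(lo, hi, size=(num_samples, points.shape[1]))
    keep = np.arange(len(points)) != i
    full = nn_classify(queries, points, labels)
    reduced = nn_classify(queries, points[keep], labels[keep])
    return bool((full != reduced).any())

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    P = rng.normal(size=(60, 2))
    y = (P[:, 0] + P[:, 1] > 0).astype(int)  # two classes split by a line
    border = [i for i in range(len(P)) if looks_relevant(i, P, y)]
    print(f"{len(border)} of {len(P)} points look like border points")
```

In this synthetic example, only points near the separating line should be flagged, which matches the intuition that border points lie on Voronoi boundaries between classes; the exact algorithms discussed in the abstract compute this set without sampling.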
Related papers
- Almost-linear Time Approximation Algorithm to Euclidean $k$-median and $k$-means [4.271492285528115]
We focus on the Euclidean $k$-median and $k$-means problems, two of the standard ways to model the task of clustering.
In this paper, we almost answer this question by presenting an almost linear-time algorithm to compute a constant-factor approximation.
arXiv Detail & Related papers (2024-07-15T20:04:06Z) - A Scalable Algorithm for Individually Fair K-means Clustering [77.93955971520549]
We present a scalable algorithm for the individually fair ($p$, $k$)-clustering problem introduced by Jung et al. and Mahabadi et al.
A clustering is then called individually fair if it has centers within distance $\delta(x)$ of $x$ for each $x \in P$.
We show empirically that not only is our algorithm much faster than prior work, but it also produces lower-cost solutions.
arXiv Detail & Related papers (2024-02-09T19:01:48Z) - Sketching Algorithms for Sparse Dictionary Learning: PTAS and Turnstile
Streaming [48.18845814885398]
We develop new techniques to extend the applicability of sketching-based approaches to sparse dictionary learning and the Euclidean $k$-means clustering problems.
On the fast algorithms front, we obtain a new approach for designing PTAS's for the $k$-means clustering problem.
On the streaming algorithms front, we obtain new upper bounds and lower bounds for dictionary learning and $k$-means clustering.
arXiv Detail & Related papers (2023-10-29T16:46:26Z) - Efficiently Learning One-Hidden-Layer ReLU Networks via Schur
Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon > 0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z) - Reducing Nearest Neighbor Training Sets Optimally and Exactly [0.0]
In nearest-neighbor classification, a training set $P$ of points in $\mathbb{R}^d$ with given classification is used to classify every point in $\mathbb{R}^d$.
We show that the set of relevant points is such a minimum cardinality reduced training set if $P$ is in general position.
arXiv Detail & Related papers (2023-02-04T08:48:59Z) - Finding Relevant Points for Nearest-Neighbor Classification [0.456877715768796]
In nearest-neighbor classification problems, a set of $d$-dimensional training points is given, each with a known classification, and is used to infer the unknown classifications of other points.
A training point is relevant if its omission from the training set would change the outcome of these inferences.
arXiv Detail & Related papers (2021-10-12T16:58:29Z) - Clustering Mixture Models in Almost-Linear Time via List-Decodable Mean
Estimation [58.24280149662003]
We study the problem of list-decodable mean estimation, where an adversary can corrupt a majority of the dataset.
We develop new algorithms for list-decodable mean estimation, achieving nearly-optimal statistical guarantees.
arXiv Detail & Related papers (2021-06-16T03:34:14Z) - Social Distancing is Good for Points too! [91.3755431537592]
We show that FCNN can behave poorly when points are too close to each other.
We then successfully modify the algorithm to avoid such cases.
This modification is sufficient to prove useful upper-bounds, along with approximation guarantees for the algorithm.
arXiv Detail & Related papers (2020-06-28T16:49:59Z) - Coresets for the Nearest-Neighbor Rule [78.15296214629433]
Nearest-neighbor condensation deals with finding a subset $R \subseteq P$.
This paper introduces the concept of coresets for nearest-neighbor classification.
We propose quadratic-time approximation algorithms with provable upper-bounds on the size of their selected subsets.
arXiv Detail & Related papers (2020-02-16T19:00:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.