Coresets for the Nearest-Neighbor Rule
- URL: http://arxiv.org/abs/2002.06650v3
- Date: Wed, 22 Jul 2020 19:14:41 GMT
- Title: Coresets for the Nearest-Neighbor Rule
- Authors: Alejandro Flores-Velazco, David M. Mount
- Abstract summary: Nearest-neighbor condensation deals with finding a subset $R \subseteq P$ in which every point of $P$ has a same-label nearest neighbor.
This paper introduces the concept of coresets for nearest-neighbor classification.
We propose quadratic-time approximation algorithms with provable upper-bounds on the size of their selected subsets.
- Score: 78.15296214629433
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Given a training set $P$ of labeled points, the nearest-neighbor rule
predicts the class of an unlabeled query point as the label of its closest
point in the set. To improve the time and space complexity of classification, a
natural question is how to reduce the training set without significantly
affecting the accuracy of the nearest-neighbor rule. Nearest-neighbor
condensation deals with finding a subset $R \subseteq P$ such that for every
point $p \in P$, $p$'s nearest-neighbor in $R$ has the same label as $p$. This
relates to the concept of coresets, which can be broadly defined as subsets of
the training set, such that an exact result on the coreset corresponds to an approximate
result on the original set. However, the guarantees of a coreset hold for any
query point, and not only for the points of the training set.
This paper introduces the concept of coresets for nearest-neighbor
classification. We extend existing criteria used for condensation, and prove
sufficient conditions to correctly classify any query point when using these
subsets. Additionally, we prove that finding such subsets of minimum
cardinality is NP-hard, and propose quadratic-time approximation algorithms
with provable upper-bounds on the size of their selected subsets. Moreover, we
show how to improve one of these algorithms to have subquadratic runtime, being
the first of this kind for condensation.
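To make the condensation criterion concrete, here is a minimal sketch (my own illustration, not the paper's approximation algorithms): `is_consistent` checks the criterion above, that every $p \in P$ has a same-label nearest neighbor in $R$, and `cnn_condense` is Hart's classic CNN heuristic for growing such a subset. All names and the toy data are illustrative assumptions.

```python
# Minimal illustration of nearest-neighbor condensation (a sketch,
# not the paper's quadratic-time approximation algorithms).
import numpy as np

def nearest(q, pts):
    # index of the point of pts closest to query q (Euclidean distance)
    return int(np.argmin(np.linalg.norm(pts - q, axis=1)))

def is_consistent(P, y, R_idx):
    # condensation criterion: for every p in P, p's nearest neighbor
    # in R = P[R_idx] must carry the same label as p
    R, yR = P[R_idx], y[R_idx]
    return all(yR[nearest(p, R)] == y[i] for i, p in enumerate(P))

def cnn_condense(P, y):
    # Hart's CNN heuristic: repeatedly add any training point that the
    # current subset misclassifies, until consistency holds
    R_idx = [0]
    changed = True
    while changed:
        changed = False
        for i, p in enumerate(P):
            if y[R_idx][nearest(p, P[R_idx])] != y[i]:
                R_idx.append(i)
                changed = True
    return R_idx

# toy usage: two labeled Gaussian blobs
rng = np.random.default_rng(0)
P = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
R_idx = cnn_condense(P, y)
assert is_consistent(P, y, R_idx)
print(f"kept {len(R_idx)} of {len(P)} training points")
```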
Related papers
- Fair Submodular Cover [18.37610521373708]
We present the study of Fair Submodular Cover (FSC), where we are given a ground set $U$, a monotone submodular function $f: 2^U \to \mathbb{R}_{\ge 0}$, and a threshold $\tau$.
We first introduce discrete algorithms for FSC that achieve a bicriteria approximation ratio of $(\frac{1}{\epsilon}, 1-O(\epsilon))$.
We then present a continuous algorithm that achieves a $(\frac{1}{\epsilon}, 1-O(\epsilon))$-bicriteria approximation ratio.
arXiv Detail & Related papers (2024-07-05T18:37:09Z)
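For background, the classical greedy algorithm for plain (non-fair) submodular cover adds the element of largest marginal gain until the threshold is met; the sketch below is a generic illustration under that assumption, not the paper's fair variant, and the set-cover instance is an illustrative toy.

```python
# Classical greedy for plain submodular cover (generic illustration
# only; NOT the fair FSC algorithms above). Given a monotone
# submodular f and threshold tau, add the element with the largest
# marginal gain until f(S) >= tau.
def greedy_submodular_cover(U, f, tau):
    S = set()
    while f(S) < tau and S != U:
        gain, best = max((f(S | {e}) - f(S), e) for e in U - S)
        if gain <= 0:          # threshold unreachable; give up
            break
        S.add(best)
    return S

# toy instance: coverage of a set system, a canonical monotone
# submodular function
sets = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c", "d", "e"}}
f = lambda S: len(set().union(*(sets[i] for i in S)))
print(greedy_submodular_cover(set(sets), f, tau=5))  # e.g. {1, 3}
```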
- Sketching Algorithms for Sparse Dictionary Learning: PTAS and Turnstile Streaming [48.18845814885398]
We develop new techniques to extend the applicability of sketching-based approaches to sparse dictionary learning and the Euclidean $k$-means clustering problems.
On the fast algorithms front, we obtain a new approach for designing PTAS's for the $k$-means clustering problem.
On the streaming algorithms front, we obtain new upper bounds and lower bounds for dictionary learning and $k$-means clustering.
arXiv Detail & Related papers (2023-10-29T16:46:26Z)
- Reducing Nearest Neighbor Training Sets Optimally and Exactly [0.0]
In nearest-neighbor classification, a training set $P$ of points in $\mathbb{R}^d$ with given classification is used to classify every point in $\mathbb{R}^d$.
We show that the set of relevant points is such a minimum-cardinality reduced training set if $P$ is in general position.
arXiv Detail & Related papers (2023-02-04T08:48:59Z)
- Improved Search of Relevant Points for Nearest-Neighbor Classification [91.3755431537592]
We say a training point is relevant if its omission from the training set could induce the misclassification of some query point in $\mathbb{R}^d$.
An output-sensitive algorithm was proposed to find the set of border points of $P$ in $O(n^2 + nk^2)$ time, where $k$ is the size of that set.
In this paper, we improve this algorithm to have time complexity equal to $O(nk^2)$ by proving that the first steps of their algorithm, which require $O(n^2)$ time, are unnecessary.
arXiv Detail & Related papers (2022-03-07T18:10:27Z)
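For intuition only, here is a brute-force sketch of the relevance definition above; it is in no way the output-sensitive $O(nk^2)$ algorithm. Relevance quantifies over every query in $\mathbb{R}^d$, which the sketch can only witness on a finite random sample of queries (so it may miss relevant points); the names and the sampling scheme are my own assumptions.

```python
# Brute-force *witness* test for relevance: p_i is flagged relevant if
# deleting it flips the 1-NN label of some sampled query. Illustrative
# and slow (O(n^2 * n_queries)); not the algorithm from the paper.
import numpy as np

def nn_label(q, P, y):
    # label of q's nearest training point (Euclidean 1-NN)
    return y[np.argmin(np.linalg.norm(P - q, axis=1))]

def sampled_relevant_points(P, y, n_queries=2000, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = P.min(axis=0), P.max(axis=0)
    Q = rng.uniform(lo, hi, size=(n_queries, P.shape[1]))
    base = np.array([nn_label(q, P, y) for q in Q])  # labels over full P
    relevant = []
    for i in range(len(P)):
        mask = np.arange(len(P)) != i        # training set minus p_i
        if any(nn_label(q, P[mask], y[mask]) != base[j]
               for j, q in enumerate(Q)):
            relevant.append(i)
    return relevant
```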
- FriendlyCore: Practical Differentially Private Aggregation [67.04951703461657]
We propose a simple and practical tool $\mathsf{FriendlyCore}$ that takes a set of points $\mathcal{D}$ from an unrestricted (pseudo) metric space as input.
When $\mathcal{D}$ has effective diameter $r$, $\mathsf{FriendlyCore}$ returns a "stable" subset $\mathcal{D}_G \subseteq \mathcal{D}$ that includes all points.
$\mathsf{FriendlyCore}$ can be used to preprocess the input before privately aggregating it, potentially simplifying the aggregation or boosting its accuracy.
arXiv Detail & Related papers (2021-10-19T17:43:50Z)
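Purely for intuition, here is a non-private caricature of the "stable core" idea (an assumed illustration of the flavor only; not $\mathsf{FriendlyCore}$'s actual predicate, and with none of its differential-privacy machinery): keep the points that have many neighbors within distance $r$, then aggregate over them.

```python
# Non-private caricature of extracting a "stable" core (assumption-
# laden sketch; NOT FriendlyCore's actual algorithm). Points within
# distance r of most of the data form the core; outliers that would
# destabilize an aggregate (e.g., an average) are dropped.
import numpy as np

def stable_core(D, r, frac=0.5):
    D = np.asarray(D)
    dists = np.linalg.norm(D[:, None, :] - D[None, :, :], axis=2)
    counts = (dists <= r).sum(axis=1)       # r-neighbors per point
    return D[counts >= frac * len(D)]       # keep well-connected points

# toy usage: a tight cluster plus one far outlier
D = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [10.0, 10.0]])
core = stable_core(D, r=1.0)
print(core.mean(axis=0))  # aggregate over the core ignores the outlier
```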
- Finding Relevant Points for Nearest-Neighbor Classification [0.456877715768796]
In nearest-neighbor classification problems, a set of $d$-dimensional training points are given, each with a known classification, and are used to infer unknown classifications of other points.
A training point is relevant if its omission from the training set would change the outcome of these inferences.
arXiv Detail & Related papers (2021-10-12T16:58:29Z)
- Social Distancing is Good for Points too! [91.3755431537592]
We show that FCNN (the Fast Condensed Nearest Neighbor rule) can behave poorly when points are too close to each other.
We then successfully modify the algorithm to avoid such cases.
This modification is sufficient to prove useful upper-bounds, along with approximation guarantees for the algorithm.
arXiv Detail & Related papers (2020-06-28T16:49:59Z)
- Second-Order Information in Non-Convex Stochastic Optimization: Power and Limitations [54.42518331209581]
We give an algorithm that finds an $\epsilon$-approximate stationary point (with $\|\nabla F(x)\| \le \epsilon$), along with guarantees for finding $(\epsilon,\gamma)$-approximate second-order stationary points.
Our lower bounds here are novel even in the noiseless case.
arXiv Detail & Related papers (2020-06-24T04:41:43Z)