Coresets for the Nearest-Neighbor Rule
- URL: http://arxiv.org/abs/2002.06650v3
- Date: Wed, 22 Jul 2020 19:14:41 GMT
- Title: Coresets for the Nearest-Neighbor Rule
- Authors: Alejandro Flores-Velazco, David M. Mount
- Abstract summary: Nearest-neighbor condensation deals with finding a subset $R \subseteq P$ in which every point of $P$ has a same-label nearest neighbor.
This paper introduces the concept of coresets for nearest-neighbor classification.
We propose quadratic-time approximation algorithms with provable upper-bounds on the size of their selected subsets.
- Score: 78.15296214629433
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Given a training set $P$ of labeled points, the nearest-neighbor rule
predicts the class of an unlabeled query point as the label of its closest
point in the set. To improve the time and space complexity of classification, a
natural question is how to reduce the training set without significantly
affecting the accuracy of the nearest-neighbor rule. Nearest-neighbor
condensation deals with finding a subset $R \subseteq P$ such that for every
point $p \in P$, $p$'s nearest-neighbor in $R$ has the same label as $p$. This
relates to the concept of coresets, which can be broadly defined as subsets of
the training set, such that an exact result on the coreset corresponds to an approximate
result on the original set. However, the guarantees of a coreset hold for any
query point, and not only for the points of the training set.
This paper introduces the concept of coresets for nearest-neighbor
classification. We extend existing criteria used for condensation, and prove
sufficient conditions to correctly classify any query point when using these
subsets. Additionally, we prove that finding such subsets of minimum
cardinality is NP-hard, and propose quadratic-time approximation algorithms
with provable upper-bounds on the size of their selected subsets. Moreover, we
show how to improve one of these algorithms to have subquadratic runtime, being
the first of this kind for condensation.
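To make the condensation criterion concrete, here is a minimal sketch (my own illustration, not the paper's approximation algorithms): `is_consistent` checks the criterion above, that every $p \in P$ has a same-label nearest neighbor in $R$, and `cnn_condense` is Hart's classic CNN heuristic for growing such a subset. All names and the toy data are illustrative assumptions.

```python
# Minimal illustration of nearest-neighbor condensation (a sketch,
# not the paper's quadratic-time approximation algorithms).
import numpy as np

def nearest(q, pts):
    # index of the point of pts closest to query q (Euclidean distance)
    return int(np.argmin(np.linalg.norm(pts - q, axis=1)))

def is_consistent(P, y, R_idx):
    # condensation criterion: for every p in P, p's nearest neighbor
    # in R = P[R_idx] must carry the same label as p
    R, yR = P[R_idx], y[R_idx]
    return all(yR[nearest(p, R)] == y[i] for i, p in enumerate(P))

def cnn_condense(P, y):
    # Hart's CNN heuristic: repeatedly add any training point that the
    # current subset misclassifies, until consistency holds
    R_idx = [0]
    changed = True
    while changed:
        changed = False
        for i, p in enumerate(P):
            if y[R_idx][nearest(p, P[R_idx])] != y[i]:
                R_idx.append(i)
                changed = True
    return R_idx

# toy usage: two labeled Gaussian blobs
rng = np.random.default_rng(0)
P = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
R_idx = cnn_condense(P, y)
assert is_consistent(P, y, R_idx)
print(f"kept {len(R_idx)} of {len(P)} training points")
```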
Related papers
- Fair Submodular Cover [18.37610521373708]
We present the study of Fair Submodular Cover (FSC), where we are given a ground set $U$, a monotone submodular function $f: 2^U \to \mathbb{R}_{\ge 0}$, and a threshold $\tau$.
We first introduce discrete algorithms for FSC that achieve a bicriteria approximation ratio of $(\frac{1}{\epsilon}, 1-O(\epsilon))$.
We then present a continuous algorithm that achieves a $(\frac{1}{\epsilon}, 1-O(\epsilon))$-bicriteria approximation ratio.
arXiv Detail & Related papers (2024-07-05T18:37:09Z)
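For background, the classical greedy algorithm for plain (non-fair) submodular cover adds the element of largest marginal gain until the threshold is met; the sketch below is a generic illustration under that assumption, not the paper's fair variant, and the set-cover instance is an illustrative toy.

```python
# Classical greedy for plain submodular cover (generic illustration
# only; NOT the fair FSC algorithms above). Given a monotone
# submodular f and threshold tau, add the element with the largest
# marginal gain until f(S) >= tau.
def greedy_submodular_cover(U, f, tau):
    S = set()
    while f(S) < tau and S != U:
        gain, best = max((f(S | {e}) - f(S), e) for e in U - S)
        if gain <= 0:          # threshold unreachable; give up
            break
        S.add(best)
    return S

# toy instance: coverage of a set system, a canonical monotone
# submodular function
sets = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c", "d", "e"}}
f = lambda S: len(set().union(*(sets[i] for i in S)))
print(greedy_submodular_cover(set(sets), f, tau=5))  # e.g. {1, 3}
```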
- Sketching Algorithms for Sparse Dictionary Learning: PTAS and Turnstile Streaming [48.18845814885398]
We develop new techniques to extend the applicability of sketching-based approaches to sparse dictionary learning and the Euclidean $k$-means clustering problems.
On the fast algorithms front, we obtain a new approach for designing PTAS's for the $k$-means clustering problem.
On the streaming algorithms front, we obtain new upper bounds and lower bounds for dictionary learning and $k$-means clustering.
arXiv Detail & Related papers (2023-10-29T16:46:26Z)
- Reducing Nearest Neighbor Training Sets Optimally and Exactly [0.0]
In nearest-neighbor classification, a training set $P$ of points in $\mathbb{R}^d$ with given classification is used to classify every point in $\mathbb{R}^d$.
We show that the set of relevant points is such a minimum-cardinality reduced training set if $P$ is in general position.
arXiv Detail & Related papers (2023-02-04T08:48:59Z)
- Improved Search of Relevant Points for Nearest-Neighbor Classification [91.3755431537592]
We say a training point is relevant if its omission from the training set could induce the misclassification of some query point in $\mathbb{R}^d$.
An output-sensitive algorithm was proposed to find the set of border points of $P$ in $O(n^2 + nk^2)$ time, where $k$ is the size of that set.
In this paper, we improve this algorithm to have time complexity equal to $O(nk^2)$ by proving that the first steps of their algorithm, which require $O(n^2)$ time, are unnecessary.
arXiv Detail & Related papers (2022-03-07T18:10:27Z)
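For intuition only, here is a brute-force sketch of the relevance definition above; it is in no way the output-sensitive $O(nk^2)$ algorithm. Relevance quantifies over every query in $\mathbb{R}^d$, which the sketch can only witness on a finite random sample of queries (so it may miss relevant points); the names and the sampling scheme are my own assumptions.

```python
# Brute-force *witness* test for relevance: p_i is flagged relevant if
# deleting it flips the 1-NN label of some sampled query. Illustrative
# and slow (O(n^2 * n_queries)); not the algorithm from the paper.
import numpy as np

def nn_label(q, P, y):
    # label of q's nearest training point (Euclidean 1-NN)
    return y[np.argmin(np.linalg.norm(P - q, axis=1))]

def sampled_relevant_points(P, y, n_queries=2000, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = P.min(axis=0), P.max(axis=0)
    Q = rng.uniform(lo, hi, size=(n_queries, P.shape[1]))
    base = np.array([nn_label(q, P, y) for q in Q])  # labels over full P
    relevant = []
    for i in range(len(P)):
        mask = np.arange(len(P)) != i        # training set minus p_i
        if any(nn_label(q, P[mask], y[mask]) != base[j]
               for j, q in enumerate(Q)):
            relevant.append(i)
    return relevant
```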
- FriendlyCore: Practical Differentially Private Aggregation [67.04951703461657]
We propose a simple and practical tool $\mathsf{FriendlyCore}$ that takes a set of points $\mathcal{D}$ from an unrestricted (pseudo) metric space as input.
When $\mathcal{D}$ has effective diameter $r$, $\mathsf{FriendlyCore}$ returns a "stable" subset $\mathcal{D}_G \subseteq \mathcal{D}$ that includes all points.
$\mathsf{FriendlyCore}$ can be used to preprocess the input before privately aggregating it, potentially simplifying the aggregation or boosting its accuracy.
arXiv Detail & Related papers (2021-10-19T17:43:50Z)
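Purely for intuition, here is a non-private caricature of the "stable core" idea (an assumed illustration of the flavor only; not $\mathsf{FriendlyCore}$'s actual predicate, and with none of its differential-privacy machinery): keep the points that have many neighbors within distance $r$, then aggregate over them.

```python
# Non-private caricature of extracting a "stable" core (assumption-
# laden sketch; NOT FriendlyCore's actual algorithm). Points within
# distance r of most of the data form the core; outliers that would
# destabilize an aggregate (e.g., an average) are dropped.
import numpy as np

def stable_core(D, r, frac=0.5):
    D = np.asarray(D)
    dists = np.linalg.norm(D[:, None, :] - D[None, :, :], axis=2)
    counts = (dists <= r).sum(axis=1)       # r-neighbors per point
    return D[counts >= frac * len(D)]       # keep well-connected points

# toy usage: a tight cluster plus one far outlier
D = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [10.0, 10.0]])
core = stable_core(D, r=1.0)
print(core.mean(axis=0))  # aggregate over the core ignores the outlier
```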
- Finding Relevant Points for Nearest-Neighbor Classification [0.456877715768796]
In nearest-neighbor classification problems, a set of $d$-dimensional training points are given, each with a known classification, and are used to infer unknown classifications of other points.
A training point is relevant if its omission from the training set would change the outcome of these inferences.
arXiv Detail & Related papers (2021-10-12T16:58:29Z)
- Social Distancing is Good for Points too! [91.3755431537592]
We show that FCNN (the Fast Condensed Nearest Neighbor rule) can behave poorly when points are too close to each other.
We then successfully modify the algorithm to avoid such cases.
This modification is sufficient to prove useful upper-bounds, along with approximation guarantees for the algorithm.
arXiv Detail & Related papers (2020-06-28T16:49:59Z)
- Second-Order Information in Non-Convex Stochastic Optimization: Power and Limitations [54.42518331209581]
We give an algorithm that finds an $\epsilon$-approximate stationary point (with $\|\nabla F(x)\| \le \epsilon$), along with guarantees for finding $(\epsilon,\gamma)$-approximate second-order stationary points.
Our lower bounds here are novel even in the noiseless case.
arXiv Detail & Related papers (2020-06-24T04:41:43Z)