Reducing Nearest Neighbor Training Sets Optimally and Exactly
- URL: http://arxiv.org/abs/2302.02132v1
- Date: Sat, 4 Feb 2023 08:48:59 GMT
- Title: Reducing Nearest Neighbor Training Sets Optimally and Exactly
- Authors: Josiah Rohrer and Simon Weber
- Abstract summary: In nearest-neighbor classification, a training set $P$ of points in $\mathbb{R}^d$ with given classification is used to classify every point in $\mathbb{R}^d$.
We show that the set of relevant points is such a minimum cardinality reduced training set if $P$ is in general position.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In nearest-neighbor classification, a training set $P$ of points in
$\mathbb{R}^d$ with given classification is used to classify every point in
$\mathbb{R}^d$: Every point gets the same classification as its nearest
neighbor in $P$. Recently, Eppstein [SOSA'22] developed an algorithm to detect
the relevant training points, those points $p\in P$, such that $P$ and
$P\setminus\{p\}$ induce different classifications. We investigate the problem
of finding the minimum cardinality reduced training set $P'\subseteq P$ such
that $P$ and $P'$ induce the same classification. We show that the set of
relevant points is such a minimum cardinality reduced training set if $P$ is in
general position. Furthermore, we show that finding a minimum cardinality
reduced training set for possibly degenerate $P$ is in P for $d=1$, and
NP-complete for $d\geq 2$.
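As a concrete illustration of these definitions, the sketch below (all names are mine; it is a brute-force check over a finite query sample, not Eppstein's algorithm or the algorithms of this paper) tests which points $p$ satisfy the relevance condition that $P$ and $P\setminus\{p\}$ classify some query differently.

```python
import numpy as np

def nn_classify(queries, points, labels):
    """1-NN rule: each query gets the label of its nearest point."""
    d2 = ((queries[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    return labels[d2.argmin(axis=1)]

def approx_relevant_points(points, labels, queries):
    """Indices i such that P and P with p_i removed disagree on some query.

    Brute force over a finite query sample, so it can miss a relevant
    point whose influence region avoids the sample (illustration only).
    """
    base = nn_classify(queries, points, labels)
    relevant = []
    for i in range(len(points)):
        mask = np.arange(len(points)) != i
        if (nn_classify(queries, points[mask], labels[mask]) != base).any():
            relevant.append(i)
    return relevant

# Toy example in R^2: the leftmost point is shadowed by its neighbor of
# the same class, so removing it never changes any classification.
P = np.array([[0.0, 0.0], [0.2, 0.0], [1.0, 0.0]])
y = np.array([0, 0, 1])
Q = np.random.default_rng(0).uniform(-1.0, 2.0, size=(2000, 2))
print(approx_relevant_points(P, y, Q))  # expected: [1, 2]
```

For $P$ in general position, the paper shows that exactly this set of relevant points is a minimum cardinality reduced training set.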
Related papers
- A Scalable Algorithm for Individually Fair K-means Clustering [77.93955971520549]
We present a scalable algorithm for the individually fair ($p$, $k$)-clustering problem introduced by Jung et al. and Mahabadi et al.
A clustering is then called individually fair if it has centers within distance $\delta(x)$ of $x$ for each $x \in P$.
We show empirically that not only is our algorithm much faster than prior work, but it also produces lower-cost solutions.
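To make that fairness condition concrete, here is a small checker (helper names are mine; it assumes the Jung et al. reading in which $\delta(x)$ is the radius of the smallest ball around $x$ containing $\lceil n/k \rceil$ points of $P$):

```python
import numpy as np

def fairness_radii(P, k):
    """delta(x): distance from x to its ceil(n/k)-th nearest point of P
    (counting x itself) -- one common formalization, assumed here."""
    n = len(P)
    d = np.sqrt(((P[:, None, :] - P[None, :, :]) ** 2).sum(axis=2))
    d.sort(axis=1)  # row i: distances from P[i], ascending; d[i, 0] == 0
    return d[:, int(np.ceil(n / k)) - 1]

def is_individually_fair(P, centers, k):
    """True iff every x in P has some center within distance delta(x)."""
    dc = np.sqrt(((P[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2))
    return bool((dc.min(axis=1) <= fairness_radii(P, k)).all())
```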
arXiv Detail & Related papers (2024-02-09T19:01:48Z)
- Sketching Algorithms for Sparse Dictionary Learning: PTAS and Turnstile Streaming [48.18845814885398]
We develop new techniques to extend the applicability of sketching-based approaches to sparse dictionary learning and the Euclidean $k$-means clustering problems.
On the fast algorithms front, we obtain a new approach for designing PTAS's for the $k$-means clustering problem.
On the streaming algorithms front, we obtain new upper bounds and lower bounds for dictionary learning and $k$-means clustering.
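The summary gives no algorithmic detail; as a generic instance of the sketch-then-solve pattern (my illustration only, not this paper's construction), one can compress the dimension with a random Gaussian map before clustering:

```python
import numpy as np
from sklearn.cluster import KMeans

def sketch_then_kmeans(X, k, sketch_dim, seed=0):
    """Project the n x d data X to sketch_dim dimensions with a random
    Gaussian map (a Johnson-Lindenstrauss-style sketch), then run
    k-means on the sketch. Generic pattern only, not this paper's method."""
    rng = np.random.default_rng(seed)
    S = rng.normal(size=(X.shape[1], sketch_dim)) / np.sqrt(sketch_dim)
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X @ S)
```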
arXiv Detail & Related papers (2023-10-29T16:46:26Z)
- Efficiently Learning One-Hidden-Layer ReLU Networks via Schur Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon>0$ is the target accuracy.
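Spelled out, the concept class is $f(x)=\sum_{i=1}^{k} w_i\,\mathrm{ReLU}(\langle a_i, x\rangle)$ under $x\sim\mathcal{N}(0, I_d)$; a minimal sketch of sampling from this setup (parameter names are mine, not the paper's notation):

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, n = 10, 3, 5000

# Target: a linear combination of k ReLU activations.
A = rng.normal(size=(k, d))   # directions a_1, ..., a_k
w = rng.normal(size=k)        # combination weights w_1, ..., w_k

X = rng.normal(size=(n, d))       # standard Gaussian inputs on R^d
y = np.maximum(X @ A.T, 0.0) @ w  # f(x) = sum_i w_i ReLU(<a_i, x>)

def square_loss(predict, X, y):
    """Square loss of a hypothesis against the target's labels."""
    return float(np.mean((predict(X) - y) ** 2))
```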
arXiv Detail & Related papers (2023-07-24T14:37:22Z)
- Improved Search of Relevant Points for Nearest-Neighbor Classification [91.3755431537592]
We say a training point is relevant if its omission from the training set could induce the misclassification of some query point in $\mathbb{R}^d$.
An output-sensitive algorithm was proposed to find the set of border points of $P$ in $O(n^2 + nk^2)$ time, where $k$ is the size of such set.
In this paper, we improve this algorithm to have time complexity equal to $O(nk^2)$ by proving that the first steps of their algorithm, which require $O(n^2)$ time, are unnecessary.
arXiv Detail & Related papers (2022-03-07T18:10:27Z)
- FriendlyCore: Practical Differentially Private Aggregation [67.04951703461657]
We propose a simple and practical tool $\mathsf{FriendlyCore}$ that takes a set of points $\mathcal{D}$ from an unrestricted (pseudo) metric space as input.
When $\mathcal{D}$ has effective diameter $r$, $\mathsf{FriendlyCore}$ returns a "stable" subset $\mathcal{D}_G \subseteq \mathcal{D}$ that includes all points, except possibly a few outliers.
$\mathsf{FriendlyCore}$ can be used to preprocess the input before privately aggregating it, potentially simplifying the aggregation or boosting its accuracy.
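The summary does not say how the stable subset is selected. As a toy stand-in only (explicitly not the paper's differentially private procedure), one can drop points that are far from most of the data relative to the effective diameter $r$:

```python
import numpy as np

def toy_core(D, r):
    """Keep points within distance r of at least half the data: a toy
    outlier filter standing in for the 'stable subset' idea; the actual
    FriendlyCore construction is different and privacy-aware."""
    d = np.sqrt(((D[:, None, :] - D[None, :, :]) ** 2).sum(axis=2))
    return D[(d <= r).sum(axis=1) >= len(D) / 2]

# Preprocess-then-aggregate pattern from the summary (the aggregation
# step is left non-private in this illustration).
rng = np.random.default_rng(2)
D = np.vstack([rng.normal(size=(100, 2)), [[50.0, 50.0]]])  # one outlier
print(toy_core(D, r=8.0).mean(axis=0))  # mean computed without the outlier
```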
arXiv Detail & Related papers (2021-10-19T17:43:50Z)
- Finding Relevant Points for Nearest-Neighbor Classification [0.456877715768796]
In nearest-neighbor classification problems, a set of $d$-dimensional training points is given, each with a known classification, and is used to infer the unknown classifications of other points.
A training point is relevant if its omission from the training set would change the outcome of these inferences.
arXiv Detail & Related papers (2021-10-12T16:58:29Z)
- Sets Clustering [25.358415142404752]
We prove that a core-set of $O(\log n)$ sets always exists, and can be computed in $O(n \log n)$ time.
Applying an inefficient but optimal algorithm on this coreset allows us to obtain the first PTAS ($1+\varepsilon$ approximation) for the sets-$k$-means problem.
Open source code and experimental results for document classification and facility locations are also provided.
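For reference, the sets-$k$-means objective behind that PTAS charges each input set the squared distance between its best point and the nearest center; a one-function sketch under that (assumed) formulation:

```python
import numpy as np

def sets_kmeans_cost(sets, centers):
    """Sets-k-means objective (one common formulation, assumed here):
    each input set pays the minimum squared distance between any of its
    points and any center; the goal is to pick k centers minimizing it."""
    return sum(
        min(float(((p - c) ** 2).sum()) for p in S for c in centers)
        for S in sets
    )

# Two sets of points in R^2 and two candidate centers.
sets = [np.array([[0.0, 0.0], [5.0, 5.0]]), np.array([[9.0, 9.0]])]
centers = [np.array([0.0, 0.0]), np.array([10.0, 10.0])]
print(sets_kmeans_cost(sets, centers))  # 0.0 + 2.0 = 2.0
```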
arXiv Detail & Related papers (2020-03-09T13:30:30Z)
- Coresets for the Nearest-Neighbor Rule [78.15296214629433]
Nearest-neighbor condensation deals with finding a subset $R \subseteq P$ such that every point of $P$ is correctly classified by its nearest neighbor in $R$.
This paper introduces the concept of coresets for nearest-neighbor classification.
We propose quadratic-time approximation algorithms with provable upper-bounds on the size of their selected subsets.
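A classical point of comparison for such selected subsets is Hart's condensed nearest-neighbor heuristic, which carries no size guarantee (unlike the bounds proven in this paper); a minimal sketch:

```python
import numpy as np

def hart_cnn(P, labels):
    """Hart's condensed NN heuristic: grow R until every point of P is
    classified correctly by its nearest neighbor in R. No size bound,
    in contrast to the provable guarantees referenced above."""
    R = [0]  # seed with an arbitrary point
    changed = True
    while changed:
        changed = False
        for i in range(len(P)):
            d2 = ((P[R] - P[i]) ** 2).sum(axis=1)
            if labels[R[int(d2.argmin())]] != labels[i]:
                R.append(i)
                changed = True
    return np.array(R)
```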
arXiv Detail & Related papers (2020-02-16T19:00:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.