One-Nearest-Neighbor Search is All You Need for Minimax Optimal
Regression and Classification
- URL: http://arxiv.org/abs/2202.02464v1
- Date: Sat, 5 Feb 2022 01:59:09 GMT
- Title: One-Nearest-Neighbor Search is All You Need for Minimax Optimal
Regression and Classification
- Authors: J. Jon Ryu and Young-Han Kim
- Abstract summary: This paper shows that the distributed algorithm with $k=1$ over a sufficiently large number of groups attains a minimax optimal error rate up to a multiplicative logarithmic factor under some regularity conditions.
In the analysis, alternative rules with a refined aggregation method are proposed and shown to attain exact minimax optimal rates.
- Score: 22.98602552223454
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, Qiao, Duan, and Cheng~(2019) proposed a distributed
nearest-neighbor classification method, in which a massive dataset is split
into smaller groups, each processed with a $k$-nearest-neighbor classifier, and
the final class label is predicted by a majority vote among these groupwise
class labels. This paper shows that the distributed algorithm with $k=1$ over a
sufficiently large number of groups attains a minimax optimal error rate up to
a multiplicative logarithmic factor under some regularity conditions, for both
regression and classification problems. Roughly speaking, distributed
1-nearest-neighbor rules with $M$ groups have performance comparable to
standard $\Theta(M)$-nearest-neighbor rules. In the analysis, alternative rules
with a refined aggregation method are proposed and shown to attain exact
minimax optimal rates.
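
The rule is straightforward to implement. Below is a minimal sketch of the distributed 1-NN predictor described above; the Euclidean metric, the uniformly random equal-size split, and binary 0/1 labels for classification are illustrative assumptions, and the simple vote/average shown is the basic rule from the abstract, not the refined aggregation used for the exact-rate results.

import numpy as np

def distributed_1nn_predict(X_train, y_train, X_query, M, task="classification", seed=0):
    """Distributed 1-NN rule: split the data into M groups, run a 1-NN
    predictor within each group, and aggregate the M groupwise answers
    (majority vote for classification, average for regression).
    Assumes M <= len(X_train) so that every group is non-empty."""
    rng = np.random.default_rng(seed)
    groups = np.array_split(rng.permutation(len(X_train)), M)

    preds = np.empty((M, len(X_query)))
    for m, g in enumerate(groups):
        Xg, yg = X_train[g], y_train[g]
        # Brute-force nearest neighbor of each query point within group m.
        d2 = ((X_query[:, None, :] - Xg[None, :, :]) ** 2).sum(axis=-1)
        preds[m] = yg[d2.argmin(axis=1)]

    if task == "regression":
        return preds.mean(axis=0)  # average of the M groupwise 1-NN values
    return (preds.mean(axis=0) >= 0.5).astype(int)  # majority vote over binary labels

In this form the abstract's comparison is transparent: the M groupwise nearest neighbors together play the role of the M nearest neighbors in a standard $\Theta(M)$-NN rule, which is why the number of groups must be taken sufficiently large to reach the minimax rate.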
Related papers
- Fair Submodular Cover [18.37610521373708]
We present the study of Fair Submodular Cover (FSC), where we are given a ground set $U$, a monotone submodular function $f: 2^U \to \mathbb{R}_{\ge 0}$, and a threshold $\tau$.
We first introduce discrete algorithms for FSC that achieve a bicriteria approximation ratio of $(\frac{1}{\epsilon}, 1-O(\epsilon))$.
We then present a continuous algorithm that achieves a $(\frac{1}{\epsilon}, 1-O(\epsilon))$-bicriteria approximation ratio.
arXiv Detail & Related papers (2024-07-05T18:37:09Z)
- Diversity-aware clustering: Computational Complexity and Approximation Algorithms [19.67390261007849]
We study diversity-aware clustering problems where the data points are associated with multiple attributes, resulting in intersecting groups.
A clustering solution needs to ensure that a minimum number of cluster centers are chosen from each group.
We present parameterized approximation algorithms with approximation ratios $1+\frac{2}{e}$, $1+\frac{8}{e}$, and $3$.
arXiv Detail & Related papers (2024-01-10T19:01:05Z)
- Clustering Mixtures of Bounded Covariance Distributions Under Optimal Separation [44.25945344950543]
We study the clustering problem for mixtures of bounded covariance distributions.
We give the first poly-time algorithm for this clustering task.
Our algorithm is robust to an $\Omega(\alpha)$-fraction of adversarial outliers.
arXiv Detail & Related papers (2023-12-19T01:01:53Z)
- Are Easy Data Easy (for K-Means) [0.0]
This paper investigates the capability of correctly recovering well-separated clusters by various variants of the $k$-means algorithm.
A new algorithm is proposed that is a variation of $k$-means++ via repeated subsampling when choosing a seed.
arXiv Detail & Related papers (2023-08-02T09:40:19Z)
- Differentially-Private Hierarchical Clustering with Provable Approximation Guarantees [79.59010418610625]
We study differentially private approximation algorithms for hierarchical clustering.
We show strong lower bounds for the problem: any $\epsilon$-DP algorithm must exhibit $O(|V|^2/\epsilon)$ additive error for an input dataset $V$.
We propose a private $1+o(1)$ approximation algorithm which also recovers the blocks exactly.
arXiv Detail & Related papers (2023-01-31T19:14:30Z)
- Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best policy identification in discounted linear Markov decision processes in the fixed-confidence setting under a generative model.
The lower bound, as the solution of an intricate non-convex optimization program, can be used as the starting point to devise algorithms matching this bound.
arXiv Detail & Related papers (2022-08-11T04:12:50Z)
- Under-bagging Nearest Neighbors for Imbalanced Classification [63.026765294759876]
We propose an ensemble learning algorithm called \textit{under-bagging $k$-NN} for imbalanced classification problems (see the sketch after this list).
arXiv Detail & Related papers (2021-09-01T14:10:38Z)
- Clustering Mixture Models in Almost-Linear Time via List-Decodable Mean Estimation [58.24280149662003]
We study the problem of list-decodable mean estimation, where an adversary can corrupt a majority of the dataset.
We develop new algorithms for list-decodable mean estimation, achieving nearly-optimal statistical guarantees.
arXiv Detail & Related papers (2021-06-16T03:34:14Z)
- Computationally efficient sparse clustering [67.95910835079825]
We provide a finite sample analysis of a new clustering algorithm based on PCA.
We show that it achieves the minimax optimal misclustering rate in the regime $\|\theta\| \to \infty$.
arXiv Detail & Related papers (2020-05-21T17:51:30Z)
- Structures of Spurious Local Minima in $k$-means [20.155509538529568]
We investigate the structures of spurious local solutions to the $k$-means problem.
We prove that the identified structure is essentially the only type of spurious local minimum under a separation condition.
arXiv Detail & Related papers (2020-02-16T22:15:03Z)
- Coresets for the Nearest-Neighbor Rule [78.15296214629433]
Nearest-neighbor condensation deals with finding a subset $R \subseteq P$ of a training set $P$.
This paper introduces the concept of coresets for nearest-neighbor classification.
We propose quadratic-time approximation algorithms with provable upper bounds on the size of their selected subsets.
arXiv Detail & Related papers (2020-02-16T19:00:48Z)
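
As an aside on the under-bagging $k$-NN entry above, the following is a minimal sketch of the general under-bagging idea for imbalanced binary classification: each bag downsamples the majority class to the minority-class size, fits a 1-NN rule on the balanced bag, and the bags vote. It illustrates the technique named in that abstract, not the authors' exact procedure; the function name and the bag count B are hypothetical choices.

import numpy as np

def under_bagging_1nn_predict(X, y, X_query, B=25, seed=0):
    """Under-bagged 1-NN ensemble for imbalanced binary labels y in {0, 1}:
    each of the B bags keeps all minority-class points plus an equal-size
    random subsample of the majority class, and the bagwise 1-NN
    predictions are aggregated by majority vote."""
    rng = np.random.default_rng(seed)
    minority = int((y == 1).sum() < (y == 0).sum())
    min_idx = np.flatnonzero(y == minority)
    maj_idx = np.flatnonzero(y != minority)

    votes = np.zeros(len(X_query))
    for _ in range(B):
        # Balanced bag: all minority points + a same-size majority subsample.
        bag = np.concatenate([min_idx, rng.choice(maj_idx, size=len(min_idx), replace=False)])
        Xb, yb = X[bag], y[bag]
        # Brute-force nearest neighbor of each query point within the bag.
        d2 = ((X_query[:, None, :] - Xb[None, :, :]) ** 2).sum(axis=-1)
        votes += yb[d2.argmin(axis=1)]
    return (votes / B >= 0.5).astype(int)  # majority vote across the B bags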
This list is automatically generated from the titles and abstracts of the papers on this site.