One-Nearest-Neighbor Search is All You Need for Minimax Optimal
Regression and Classification
- URL: http://arxiv.org/abs/2202.02464v1
- Date: Sat, 5 Feb 2022 01:59:09 GMT
- Title: One-Nearest-Neighbor Search is All You Need for Minimax Optimal
Regression and Classification
- Authors: J. Jon Ryu and Young-Han Kim
- Abstract summary: This paper shows that the distributed algorithm with $k=1$ over a sufficiently large number of groups attains a minimax optimal error rate up to a multiplicative logarithmic factor under some regularity conditions.
In the analysis, alternative rules with a refined aggregation method are proposed and shown to attain exact minimax optimal rates.
- Score: 22.98602552223454
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, Qiao, Duan, and Cheng~(2019) proposed a distributed
nearest-neighbor classification method, in which a massive dataset is split
into smaller groups, each processed with a $k$-nearest-neighbor classifier, and
the final class label is predicted by a majority vote among these groupwise
class labels. This paper shows that the distributed algorithm with $k=1$ over a
sufficiently large number of groups attains a minimax optimal error rate up to
a multiplicative logarithmic factor under some regularity conditions, for both
regression and classification problems. Roughly speaking, distributed
1-nearest-neighbor rules with $M$ groups have performance comparable to
standard $\Theta(M)$-nearest-neighbor rules. In the analysis, alternative rules
with a refined aggregation method are proposed and shown to attain exact
minimax optimal rates.
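
The rule is straightforward to implement. Below is a minimal sketch of the distributed 1-NN predictor described above; the Euclidean metric, the uniformly random equal-size split, and binary 0/1 labels for classification are illustrative assumptions, and the simple vote/average shown is the basic rule from the abstract, not the refined aggregation used for the exact-rate results.

import numpy as np

def distributed_1nn_predict(X_train, y_train, X_query, M, task="classification", seed=0):
    """Distributed 1-NN rule: split the data into M groups, run a 1-NN
    predictor within each group, and aggregate the M groupwise answers
    (majority vote for classification, average for regression).
    Assumes M <= len(X_train) so that every group is non-empty."""
    rng = np.random.default_rng(seed)
    groups = np.array_split(rng.permutation(len(X_train)), M)

    preds = np.empty((M, len(X_query)))
    for m, g in enumerate(groups):
        Xg, yg = X_train[g], y_train[g]
        # Brute-force nearest neighbor of each query point within group m.
        d2 = ((X_query[:, None, :] - Xg[None, :, :]) ** 2).sum(axis=-1)
        preds[m] = yg[d2.argmin(axis=1)]

    if task == "regression":
        return preds.mean(axis=0)  # average of the M groupwise 1-NN values
    return (preds.mean(axis=0) >= 0.5).astype(int)  # majority vote over binary labels

In this form the abstract's comparison is transparent: the M groupwise nearest neighbors together play the role of the M nearest neighbors in a standard $\Theta(M)$-NN rule, which is why the number of groups must be taken sufficiently large to reach the minimax rate.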
Related papers
- Fair Submodular Cover [18.37610521373708]
We present the study of Fair Submodular Cover (FSC), where we are given a ground set $U$, a monotone submodular function $f: 2^U \to \mathbb{R}_{\ge 0}$, and a threshold $\tau$.
We first introduce discrete algorithms for FSC that achieve a bicriteria approximation ratio of $(\frac{1}{\epsilon}, 1-O(\epsilon))$.
We then present a continuous algorithm that achieves a $(\frac{1}{\epsilon}, 1-O(\epsilon))$-bicriteria approximation ratio.
arXiv Detail & Related papers (2024-07-05T18:37:09Z)
- Diversity-aware clustering: Computational Complexity and Approximation Algorithms [19.67390261007849]
We study diversity-aware clustering problems where the data points are associated with multiple attributes, resulting in intersecting groups.
A clustering solution needs to ensure that a minimum number of cluster centers are chosen from each group.
We present parameterized approximation algorithms with approximation ratios $1+\frac{2}{e}$, $1+\frac{8}{e}$, and $3$.
arXiv Detail & Related papers (2024-01-10T19:01:05Z)
- Clustering Mixtures of Bounded Covariance Distributions Under Optimal Separation [44.25945344950543]
We study the clustering problem for mixtures of bounded covariance distributions.
We give the first poly-time algorithm for this clustering task.
Our algorithm is robust to an $\Omega(\alpha)$-fraction of adversarial outliers.
arXiv Detail & Related papers (2023-12-19T01:01:53Z)
- Are Easy Data Easy (for K-Means) [0.0]
This paper investigates the capability of correctly recovering well-separated clusters by various variants of the $k$-means algorithm.
A new algorithm is proposed that is a variation of $k$-means++ via repeated subsampling when choosing a seed.
arXiv Detail & Related papers (2023-08-02T09:40:19Z)
- Differentially-Private Hierarchical Clustering with Provable Approximation Guarantees [79.59010418610625]
We study differentially private approximation algorithms for hierarchical clustering.
We show strong lower bounds for the problem: any $\epsilon$-DP algorithm must exhibit $O(|V|^2/\epsilon)$ additive error for an input dataset $V$.
We propose a private $1+o(1)$ approximation algorithm which also recovers the blocks exactly.
arXiv Detail & Related papers (2023-01-31T19:14:30Z)
- Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best policy identification in discounted linear Markov decision processes in the fixed-confidence setting under a generative model.
The lower bound, as the solution of an intricate non-convex optimization program, can be used as the starting point to devise algorithms matching this bound.
arXiv Detail & Related papers (2022-08-11T04:12:50Z)
- Under-bagging Nearest Neighbors for Imbalanced Classification [63.026765294759876]
We propose an ensemble learning algorithm called \textit{under-bagging $k$-NN} for imbalanced classification problems (see the sketch after this list).
arXiv Detail & Related papers (2021-09-01T14:10:38Z)
- Clustering Mixture Models in Almost-Linear Time via List-Decodable Mean Estimation [58.24280149662003]
We study the problem of list-decodable mean estimation, where an adversary can corrupt a majority of the dataset.
We develop new algorithms for list-decodable mean estimation, achieving nearly-optimal statistical guarantees.
arXiv Detail & Related papers (2021-06-16T03:34:14Z)
- Computationally efficient sparse clustering [67.95910835079825]
We provide a finite sample analysis of a new clustering algorithm based on PCA.
We show that it achieves the minimax optimal misclustering rate in the regime $\|\theta\| \to \infty$.
arXiv Detail & Related papers (2020-05-21T17:51:30Z)
- Structures of Spurious Local Minima in $k$-means [20.155509538529568]
We investigate the structures of spurious local solutions to the $k$-means problem.
We prove that the identified structure is essentially the only type of spurious local minimum under a separation condition.
arXiv Detail & Related papers (2020-02-16T22:15:03Z)
- Coresets for the Nearest-Neighbor Rule [78.15296214629433]
Nearest-neighbor condensation deals with finding a subset $R \subseteq P$ of a training set $P$.
This paper introduces the concept of coresets for nearest-neighbor classification.
We propose quadratic-time approximation algorithms with provable upper bounds on the size of their selected subsets.
arXiv Detail & Related papers (2020-02-16T19:00:48Z)
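
As an aside on the under-bagging $k$-NN entry above, the following is a minimal sketch of the general under-bagging idea for imbalanced binary classification: each bag downsamples the majority class to the minority-class size, fits a 1-NN rule on the balanced bag, and the bags vote. It illustrates the technique named in that abstract, not the authors' exact procedure; the function name and the bag count B are hypothetical choices.

import numpy as np

def under_bagging_1nn_predict(X, y, X_query, B=25, seed=0):
    """Under-bagged 1-NN ensemble for imbalanced binary labels y in {0, 1}:
    each of the B bags keeps all minority-class points plus an equal-size
    random subsample of the majority class, and the bagwise 1-NN
    predictions are aggregated by majority vote."""
    rng = np.random.default_rng(seed)
    minority = int((y == 1).sum() < (y == 0).sum())
    min_idx = np.flatnonzero(y == minority)
    maj_idx = np.flatnonzero(y != minority)

    votes = np.zeros(len(X_query))
    for _ in range(B):
        # Balanced bag: all minority points + a same-size majority subsample.
        bag = np.concatenate([min_idx, rng.choice(maj_idx, size=len(min_idx), replace=False)])
        Xb, yb = X[bag], y[bag]
        # Brute-force nearest neighbor of each query point within the bag.
        d2 = ((X_query[:, None, :] - Xb[None, :, :]) ** 2).sum(axis=-1)
        votes += yb[d2.argmin(axis=1)]
    return (votes / B >= 0.5).astype(int)  # majority vote across the B bags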
This list is automatically generated from the titles and abstracts of the papers on this site.