Distributed Adaptive Nearest Neighbor Classifier: Algorithm and Theory
- URL: http://arxiv.org/abs/2105.09788v2
- Date: Sat, 3 Jun 2023 16:18:32 GMT
- Title: Distributed Adaptive Nearest Neighbor Classifier: Algorithm and Theory
- Authors: Ruiqi Liu, Ganggang Xu, Zuofeng Shang
- Abstract summary: We propose a novel distributed adaptive NN classifier for which the number of nearest neighbors is a tuning parameter stochastically chosen by a data-driven criterion.
An early stopping rule is proposed when searching for the optimal tuning parameter, which improves the finite sample performance.
In particular, we show that when the sub-sample sizes are sufficiently large, the proposed classifier achieves the nearly optimal convergence rate.
- Score: 6.696267547013535
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When data is of an extraordinarily large size or physically stored in
different locations, the distributed nearest neighbor (NN) classifier is an
attractive tool for classification. We propose a novel distributed adaptive NN
classifier for which the number of nearest neighbors is a tuning parameter
stochastically chosen by a data-driven criterion. An early stopping rule is
proposed when searching for the optimal tuning parameter, which not only speeds
up the computation but also improves the finite sample performance of the
proposed algorithm. The convergence rate of the excess risk of the distributed adaptive
NN classifier is investigated under various sub-sample size compositions. In
particular, we show that when the sub-sample sizes are sufficiently large, the
proposed classifier achieves the nearly optimal convergence rate. Effectiveness
of the proposed approach is demonstrated through simulation studies as well as
an empirical application to a real-world dataset.
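The paper's exact data-driven criterion and early stopping rule are not reproduced here; the following is a minimal Python sketch of the architecture the abstract describes, assuming binary labels, a simple stabilization check as a stand-in for the criterion, and majority-vote aggregation across machines.

```python
import numpy as np

def knn_estimate(X, y, x0, k):
    """Average label of the k nearest neighbors of x0 (binary y in {0, 1})."""
    idx = np.argsort(np.linalg.norm(X - x0, axis=1))[:k]
    return y[idx].mean()

def adaptive_k(X, y, x0, k_grid, tol=0.1):
    """Illustrative data-driven choice of k with early stopping: walk up the grid
    and stop once the local estimate of P(Y=1 | x0) stabilizes. A stand-in for
    the paper's criterion and stopping rule, not the authors' exact procedure."""
    prev, best_k = None, k_grid[0]
    for k in k_grid:
        p_hat = knn_estimate(X, y, x0, k)
        if prev is not None and abs(p_hat - prev) < tol:
            break                                  # early stopping: estimate has stabilized
        prev, best_k = p_hat, k
    return best_k

def distributed_adaptive_nn(subsamples, x0, k_grid):
    """Each machine classifies x0 with its own adaptively chosen k; the local
    decisions are then aggregated by majority vote."""
    votes = [knn_estimate(X, y, x0, adaptive_k(X, y, x0, k_grid)) >= 0.5
             for X, y in subsamples]
    return int(np.mean(votes) >= 0.5)

# Toy usage: four machines, two Gaussian classes.
rng = np.random.default_rng(0)
subsamples = []
for _ in range(4):
    X = np.vstack([rng.normal(0.0, 1.0, (200, 2)), rng.normal(1.5, 1.0, (200, 2))])
    y = np.r_[np.zeros(200), np.ones(200)]
    subsamples.append((X, y))
print(distributed_adaptive_nn(subsamples, np.array([1.2, 1.2]), k_grid=range(1, 51, 2)))
```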
Related papers
- Adaptive $k$-nearest neighbor classifier based on the local estimation of the shape operator [49.87315310656657]
We introduce a new adaptive $k$-nearest neighbours ($kK$-NN) algorithm that uses the local curvature at a sample to adaptively define the neighborhood size.
Results on many real-world datasets indicate that the new $kK$-NN algorithm yields superior balanced accuracy compared to the established $k$-NN method.
arXiv Detail & Related papers (2024-09-08T13:08:45Z)
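As a rough illustration of the idea in this summary, the sketch below uses a PCA-based flatness score as a stand-in for the shape-operator curvature estimate; both the proxy and the mapping from curvature to $k$ are assumptions, not the paper's method.

```python
import numpy as np

def curvature_proxy(X, i, m=20):
    """Crude local-curvature score at sample i: fraction of the variance of the
    m nearest neighbors lying outside the leading principal direction."""
    d = np.linalg.norm(X - X[i], axis=1)
    nbrs = X[np.argsort(d)[1:m + 1]]               # skip the point itself
    w = np.sort(np.linalg.eigvalsh(np.cov(nbrs.T)))[::-1]
    return 1.0 - w[0] / w.sum()                    # 0 = locally flat, larger = more curved

def local_k(X, i, k_min=3, k_max=31):
    """Assumed mapping: flat regions get a large k, curved regions a small k."""
    return int(round(k_max - curvature_proxy(X, i) * (k_max - k_min)))

X = np.random.default_rng(8).normal(size=(500, 2))
print(local_k(X, 0))
```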
- Adaptive Online Bayesian Estimation of Frequency Distributions with Local Differential Privacy [0.4604003661048266]
We propose a novel approach for the adaptive and online estimation of the frequency distribution of a finite number of categories under the local differential privacy (LDP) framework.
The proposed algorithm performs Bayesian parameter estimation via posterior sampling and adapts the randomization mechanism for LDP based on the obtained posterior samples.
We provide a theoretical analysis showing that (i) the posterior distribution targeted by the algorithm converges to the true parameter even for approximate posterior sampling, and (ii) the algorithm selects the optimal subset with high probability if posterior sampling is performed exactly.
arXiv Detail & Related papers (2024-05-11T13:59:52Z)
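The adaptive Bayesian scheme itself is not reproduced here; the sketch below shows only the standard $k$-ary randomized response primitive that such LDP algorithms randomize with, plus the usual debiased frequency estimator.

```python
import numpy as np

def k_rr(x, k, eps, rng):
    """k-ary randomized response: report the truth w.p. e^eps/(e^eps + k - 1),
    otherwise a uniformly random other category."""
    p_true = np.exp(eps) / (np.exp(eps) + k - 1)
    if rng.random() < p_true:
        return x
    return rng.choice([c for c in range(k) if c != x])

def debias(reports, k, eps):
    """Unbiased frequency estimate: E[count_c] = q + (p - q) * f_c."""
    p = np.exp(eps) / (np.exp(eps) + k - 1)       # P(report = truth)
    q = (1 - p) / (k - 1)                         # P(report = a given other category)
    counts = np.bincount(reports, minlength=k) / len(reports)
    return (counts - q) / (p - q)

rng = np.random.default_rng(1)
true = rng.choice(4, p=[0.5, 0.2, 0.2, 0.1], size=5000)
reports = np.array([k_rr(x, 4, eps=1.0, rng=rng) for x in true])
print(debias(reports, 4, eps=1.0))                # approx [0.5, 0.2, 0.2, 0.1]
```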
- Stochastic optimization with arbitrary recurrent data sampling [2.1485350418225244]
Most commonly used data sampling algorithms (e.g., i.i.d. sampling, MCMC, and random reshuffling) are recurrent under mild assumptions.
We show that for a particular class of stochastic optimization algorithms, no property of the data sampling scheme other than recurrence is needed to guarantee convergence.
We show that convergence can be accelerated by selecting sampling algorithms that cover the data set more effectively.
arXiv Detail & Related papers (2024-01-15T14:04:50Z)
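To make the recurrence notion concrete, here is a small comparison (not from the paper) of i.i.d. sampling against random reshuffling, a recurrent sampler that covers the data set every epoch.

```python
import numpy as np

rng = np.random.default_rng(2)

def iid_indices(n, steps):
    """I.i.d. sampling: the time to visit every point is random (coupon collector)."""
    return rng.integers(0, n, size=steps)

def reshuffled_indices(n, steps):
    """Random reshuffling: visits every point each epoch, covering the data
    set within exactly n steps -- a maximally 'covering' recurrent sampler."""
    out = []
    while len(out) < steps:
        out.extend(rng.permutation(n))
    return np.array(out[:steps])

def time_to_cover(idx, n):
    """Steps until every index has been visited at least once."""
    seen = set()
    for t, i in enumerate(idx, 1):
        seen.add(int(i))
        if len(seen) == n:
            return t
    return None

n = 100
print(time_to_cover(iid_indices(n, 5000), n))         # typically around n log n
print(time_to_cover(reshuffled_indices(n, 5000), n))  # exactly n
```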
- Optimal Multi-Distribution Learning [88.3008613028333]
Multi-distribution learning seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions.
We propose a novel algorithm that yields an $\varepsilon$-optimal randomized hypothesis with a sample complexity on the order of $(d+k)/\varepsilon^2$.
arXiv Detail & Related papers (2023-12-08T16:06:29Z)
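A generic sketch of the game view behind multi-distribution learning, assuming access to a stochastic loss and gradient per distribution: exponential weights concentrate sampling on the current worst-case distribution. This is illustrative only (no importance weighting), not the paper's optimal $(d+k)/\varepsilon^2$ algorithm.

```python
import numpy as np

def multi_distribution_sgd(loss_grad, k, d, steps=2000, lr=0.05, eta=0.01):
    """Minimize max_i L_i(theta) as a two-player game: exponential weights over
    the k distributions pick which one to sample; SGD descends the sampled loss."""
    theta = np.zeros(d)
    w = np.ones(k) / k
    rng = np.random.default_rng(3)
    for _ in range(steps):
        i = rng.choice(k, p=w)            # query the currently hardest-looking distribution
        loss, grad = loss_grad(theta, i)  # stochastic loss/gradient from distribution i
        theta -= lr * grad                # learner update
        w[i] *= np.exp(eta * loss)        # adversary upweights high-loss distributions
        w /= w.sum()
    return theta

# Toy: three quadratics with different minimizers; the minimax point balances them.
centers = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
lg = lambda th, i: (np.sum((th - centers[i]) ** 2), 2 * (th - centers[i]))
print(multi_distribution_sgd(lg, k=3, d=2))
```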
- An Improved Greedy Algorithm for Subset Selection in Linear Estimation [5.994412766684842]
We consider a subset selection problem in a spatial field where we seek to find a set of $k$ locations whose observations provide the best estimate of the field value at a finite set of prediction locations.
One approach for observation selection is to perform a grid discretization of the space and obtain an approximate solution using the greedy algorithm.
We propose a method to reduce the computational complexity by considering a search space consisting only of prediction locations and centroids of cliques formed by the prediction locations.
arXiv Detail & Related papers (2022-03-30T05:52:16Z)
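A hedged sketch of the greedy baseline this paper speeds up: under a Gaussian-process model, repeatedly add the candidate location that most reduces the posterior variance at the prediction locations. The clique/centroid search-space reduction proposed in the paper is not implemented here, and the RBF kernel is an assumption.

```python
import numpy as np

def rbf(A, B, ell=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ell ** 2))

def greedy_select(cand, pred, k, noise=1e-2):
    """Greedily pick k observation locations from `cand` minimizing the total
    GP posterior variance at `pred` (brute-force candidate evaluation)."""
    chosen = []
    for _ in range(k):
        best, best_var = None, np.inf
        for j in range(len(cand)):
            if j in chosen:
                continue
            S = cand[chosen + [j]]
            K = rbf(S, S) + noise * np.eye(len(S))
            Kps = rbf(pred, S)
            # posterior variance at pred given observations at S
            var = np.trace(rbf(pred, pred) - Kps @ np.linalg.solve(K, Kps.T))
            if var < best_var:
                best, best_var = j, var
        chosen.append(best)
    return chosen

rng = np.random.default_rng(4)
cand = rng.uniform(0, 1, size=(40, 2))
pred = rng.uniform(0, 1, size=(5, 2))
print(greedy_select(cand, pred, k=3))
```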
- Local policy search with Bayesian optimization [73.0364959221845]
Reinforcement learning aims to find an optimal policy by interaction with an environment.
Policy gradients for local search are often obtained from random perturbations.
We develop an algorithm utilizing a probabilistic model of the objective function and its gradient.
arXiv Detail & Related papers (2021-06-22T16:07:02Z)
- Hyperdimensional Computing for Efficient Distributed Classification with Randomized Neural Networks [5.942847925681103]
We study distributed classification, which can be employed in situations where data cannot be stored at a central location nor shared.
We propose a more efficient solution for distributed classification by making use of a lossy compression approach applied when sharing the local classifiers with other agents.
arXiv Detail & Related papers (2021-06-02T01:33:56Z)
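An illustrative sketch of the compression idea, assuming the common random-projection flavor of hyperdimensional computing: agents bundle class prototypes and share only their signs (one bit per dimension). The paper's exact encoding and sharing protocol may differ.

```python
import numpy as np

rng = np.random.default_rng(6)
D = 2000                                        # hypervector dimensionality

def encode(X, W):
    """Random-projection encoding into bipolar hypervectors."""
    return np.sign(X @ W)

def train_prototypes(X, y, W, n_cls):
    """Bundle (sum) each class's hypervectors into a prototype."""
    H = encode(X, W)
    return np.stack([H[y == c].sum(0) for c in range(n_cls)])

def compress(protos):
    """Lossy compression before sharing with other agents: 1 bit per dimension."""
    return np.sign(protos)

def classify(X, W, protos):
    return (encode(X, W) @ protos.T).argmax(1)  # nearest prototype by inner product

W = rng.normal(size=(2, D))                     # shared random projection
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y = np.r_[np.zeros(100, int), np.ones(100, int)]
protos = compress(train_prototypes(X, y, W, 2)) # what an agent would transmit
print((classify(X, W, protos) == y).mean())
```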
- Bandit Samplers for Training Graph Neural Networks [63.17765191700203]
Several sampling algorithms with variance reduction have been proposed for accelerating the training of Graph Convolutional Networks (GCNs).
These sampling algorithms are not applicable to more general graph neural networks (GNNs) where the message aggregator contains learned weights rather than fixed weights, such as Graph Attention Networks (GATs).
arXiv Detail & Related papers (2020-06-10T12:48:37Z)
- Non-Adaptive Adaptive Sampling on Turnstile Streams [57.619901304728366]
We give the first relative-error algorithms for column subset selection, subspace approximation, projective clustering, and volume maximization on turnstile streams that use space sublinear in $n$.
Our adaptive sampling procedure has a number of applications to various data summarization problems that either improve state-of-the-art or have only been previously studied in the more relaxed row-arrival model.
arXiv Detail & Related papers (2020-04-23T05:00:21Z)
- Stochastic batch size for adaptive regularization in deep network optimization [63.68104397173262]
We propose a first-order optimization algorithm incorporating adaptive regularization applicable to machine learning problems in the deep learning framework.
We empirically demonstrate the effectiveness of our algorithm using an image classification task based on conventional network models applied to commonly used benchmark datasets.
arXiv Detail & Related papers (2020-04-14T07:54:53Z)
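Only the ingredient named in the title is sketched below: drawing a random batch size at each step. The coupling of the L2 strength to the drawn batch size is an assumption for illustration, not the paper's adaptive regularizer.

```python
import numpy as np

def sgd_stochastic_batch(grad_fn, theta, data, steps=300, lr=0.1,
                         batch_choices=(16, 32, 64, 128), lam0=1e-3):
    """SGD with a batch size drawn at random each step; the L2 strength is
    scaled with the drawn batch size (an assumed coupling for illustration)."""
    rng = np.random.default_rng(7)
    n = len(data)
    for _ in range(steps):
        b = rng.choice(batch_choices)                 # stochastic batch size
        idx = rng.choice(n, size=b, replace=False)
        lam = lam0 * b / max(batch_choices)           # larger batch -> stronger regularization
        theta = theta - lr * (grad_fn(theta, data[idx]) + lam * theta)
    return theta

data = np.random.default_rng(9).normal(1.0, 1.0, size=1000)
mse_grad = lambda th, batch: 2 * (th - batch).mean()  # gradient of mean squared error
print(sgd_stochastic_batch(mse_grad, theta=0.0, data=data))  # near the sample mean, ~1
```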
This list is automatically generated from the titles and abstracts of the papers in this site.