Bagged $k$-Distance for Mode-Based Clustering Using the Probability of
Localized Level Sets
- URL: http://arxiv.org/abs/2210.09786v1
- Date: Tue, 18 Oct 2022 11:58:35 GMT
- Title: Bagged $k$-Distance for Mode-Based Clustering Using the Probability of
Localized Level Sets
- Authors: Hanyuan Hang
- Abstract summary: We propose an ensemble learning algorithm named \textit{bagged $k$-distance for mode-based clustering} (\textit{BDMBC}).
On the theoretical side, we show that with a properly chosen number of nearest neighbors $k_D$ in the bagged $k$-distance, the sub-sample size $s$, the bagging rounds $B$, and the number of nearest neighbors $k_L$ for the localized level sets, BDMBC can achieve optimal convergence rates for mode estimation.
- Score: 7.208515071018781
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose an ensemble learning algorithm named \textit{bagged
$k$-distance for mode-based clustering} (\textit{BDMBC}) by putting forward a
new measurement called the \textit{probability of localized level sets}
(\textit{PLLS}), which enables us to find all clusters for varying densities
with a global threshold. On the theoretical side, we show that with a properly
chosen number of nearest neighbors $k_D$ in the bagged $k$-distance, the
sub-sample size $s$, the bagging rounds $B$, and the number of nearest
neighbors $k_L$ for the localized level sets, BDMBC can achieve optimal
convergence rates for mode estimation. It turns out that with a relatively
small $B$, the sub-sample size $s$ can be much smaller than the number of
training data $n$ at each bagging round, and the number of nearest neighbors
$k_D$ can be reduced simultaneously. Moreover, we establish optimal convergence
results for the level set estimation of the PLLS in terms of Hausdorff
distance, which reveals that BDMBC can find localized level sets for varying
densities and thus enjoys local adaptivity. On the practical side, we conduct
numerical experiments to empirically verify the effectiveness of BDMBC for mode
estimation and level set estimation, which demonstrates the promising accuracy
and efficiency of our proposed algorithm.
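The abstract does not spell out the exact computational form of the algorithm, so the following Python sketch is only illustrative. It assumes that the bagged $k$-distance of a point is the $k_D$-th nearest-neighbor distance averaged over $B$ sub-samples of size $s$, and that the PLLS of a point is the fraction of its $k_L$ nearest neighbors whose bagged $k$-distance is at least as large as its own (a smaller distance standing in for a higher density). The function names and the final global threshold are hypothetical.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def bagged_k_distance(X, k_D=5, s=200, B=20, rng=None):
    """Average the k_D-th nearest-neighbor distance over B sub-samples of size s."""
    rng = np.random.default_rng(rng)
    n = len(X)
    total = np.zeros(n)
    for _ in range(B):
        sub = X[rng.choice(n, size=min(s, n), replace=False)]
        nn = NearestNeighbors(n_neighbors=k_D).fit(sub)
        d, _ = nn.kneighbors(X)   # distances from every point to the sub-sample
        total += d[:, -1]         # k_D-th neighbor distance in this bagging round
    return total / B

def plls(X, k_L=30, **bag_kwargs):
    """Assumed form of the probability of localized level sets: the fraction of a
    point's k_L nearest neighbors whose bagged k-distance is >= its own."""
    d_bag = bagged_k_distance(X, **bag_kwargs)
    _, neigh = NearestNeighbors(n_neighbors=k_L).fit(X).kneighbors(X)
    return (d_bag[neigh] >= d_bag[:, None]).mean(axis=1)

# Toy usage: two Gaussian blobs of very different density; a single global
# threshold on the PLLS keeps the high-density core of both clusters.
X = np.vstack([np.random.randn(300, 2), 0.3 * np.random.randn(100, 2) + 5.0])
p = plls(X, k_L=30, k_D=5, s=150, B=10)
core = X[p >= 0.5]                # hypothetical global threshold
```

The point of normalizing by the local neighborhood is that clusters of very different densities can be recovered with one threshold, which is what the abstract refers to as local adaptivity.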
Related papers
- Adaptive $k$-nearest neighbor classifier based on the local estimation of the shape operator [49.87315310656657]
We introduce a new adaptive $k$-nearest neighbours ($kK$-NN) algorithm that explores the local curvature at a sample to adaptively define the neighborhood size.
Results on many real-world datasets indicate that the new $kK$-NN algorithm yields superior balanced accuracy compared to the established $k$-NN method.
arXiv Detail & Related papers (2024-09-08T13:08:45Z) - Learning conditional distributions on continuous spaces [0.0]
We investigate sample-based learning of conditional distributions on multi-dimensional unit boxes.
We employ two distinct clustering schemes: one based on a fixed-radius ball and the other on nearest neighbors.
We propose to incorporate the nearest neighbors method into neural network training, as our empirical analysis indicates it has better performance in practice.
arXiv Detail & Related papers (2024-06-13T17:53:47Z) - A Unified Framework for Gradient-based Clustering of Distributed Data [51.904327888475606]
We develop a family of distributed clustering algorithms that work over networks of users.
DGC-$\mathcal{F}_\rho$ is specialized to popular clustering losses like $K$-means and Huber loss.
We show that consensus fixed points of DGC-$\mathcal{F}_\rho$ are equivalent to fixed points of gradient clustering over the full data.
arXiv Detail & Related papers (2024-02-02T10:44:42Z) - Stochastic Approximation Approaches to Group Distributionally Robust
Optimization [96.26317627118912]
Group distributionally robust optimization (GDRO) is studied via stochastic approximation.
Online learning techniques reduce the number of samples required in each round from $m$ to $1$ while keeping the same sample complexity.
A novel formulation of weighted GDRO allows distribution-dependent convergence rates to be derived.
arXiv Detail & Related papers (2023-02-18T09:24:15Z) - Minimax Optimal Quantization of Linear Models: Information-Theoretic
Limits and Efficient Algorithms [59.724977092582535]
We consider the problem of quantizing a linear model learned from measurements.
We derive an information-theoretic lower bound for the minimax risk under this setting.
We show that our method and upper-bounds can be extended for two-layer ReLU neural networks.
arXiv Detail & Related papers (2022-02-23T02:39:04Z) - Minimax Optimal Algorithms with Fixed-$k$-Nearest Neighbors [13.231906521852718]
We consider a distributed learning scenario in which a massive dataset is split into smaller groups.
We propose \emph{optimal} rules to aggregate the fixed-$k$-NN information for classification, regression, and density estimation.
We show that the distributed algorithm with a fixed $k$ over a sufficiently large number of groups attains a minimax optimal error rate up to a multiplicative logarithmic factor (a hedged majority-vote sketch of this distributed setting is given after this list).
arXiv Detail & Related papers (2022-02-05T01:59:09Z) - Under-bagging Nearest Neighbors for Imbalanced Classification [63.026765294759876]
We propose an ensemble learning algorithm called \textit{under-bagging $k$-nearest neighbors} (\textit{under-bagging $k$-NN}) for imbalanced classification problems.
arXiv Detail & Related papers (2021-09-01T14:10:38Z) - Computationally efficient sparse clustering [67.95910835079825]
We provide a finite sample analysis of a new clustering algorithm based on PCA.
We show that it achieves the minimax optimal misclustering rate in the regime $\|\theta\| \to \infty$.
arXiv Detail & Related papers (2020-05-21T17:51:30Z)
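As referenced in the fixed-$k$-NN entry above, here is a minimal sketch of the distributed setting it describes: the data are split into disjoint groups, each group runs a $k$-NN classifier with a fixed $k$, and the group predictions are combined. A plain majority vote is used as a stand-in for the paper's optimal aggregation rules, and integer class labels $0, \dots, C-1$ are assumed; the function name is hypothetical.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def distributed_knn_predict(X, y, X_test, n_groups=10, k=3, rng=None):
    """Split (X, y) into n_groups disjoint groups, run fixed-k NN in each group,
    and aggregate the group predictions by simple majority vote.
    Assumes integer labels 0..C-1 and each group containing at least k points."""
    rng = np.random.default_rng(rng)
    order = rng.permutation(len(X))
    votes = np.zeros((len(X_test), len(np.unique(y))), dtype=int)
    for idx in np.array_split(order, n_groups):
        clf = KNeighborsClassifier(n_neighbors=k).fit(X[idx], y[idx])
        pred = clf.predict(X_test)
        votes[np.arange(len(X_test)), pred] += 1   # one vote per group
    return votes.argmax(axis=1)
```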