Assessing One-Dimensional Cluster Stability by Extreme-Point Trimming
- URL: http://arxiv.org/abs/2509.00258v1
- Date: Fri, 29 Aug 2025 21:52:15 GMT
- Title: Assessing One-Dimensional Cluster Stability by Extreme-Point Trimming
- Authors: Erwan Dereure, Emmanuel Akame Mfoumou, David Holcman
- Abstract summary: We develop a probabilistic method for assessing the tail behavior and geometric stability of one-dimensional i.i.d. samples. We derive analytical expressions, including finite-sample corrections, for the expected shrinkage under both the uniform and Gaussian hypotheses. We further integrate our criterion into a clustering pipeline (e.g., DBSCAN), demonstrating its ability to validate one-dimensional clusters without any density estimation or parameter tuning.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We develop a probabilistic method for assessing the tail behavior and geometric stability of n one-dimensional i.i.d. samples by tracking how their span contracts when the most extreme points are trimmed. Central to our approach is the diameter-shrinkage ratio, which quantifies the relative reduction in data range as extreme points are successively removed. We derive analytical expressions, including finite-sample corrections, for the expected shrinkage under both the uniform and Gaussian hypotheses, and establish that these curves remain distinct even for a moderate number of removals. We construct an elementary decision rule that assigns a sample to whichever theoretical shrinkage profile it most closely follows. This test achieves higher classification accuracy than the classical likelihood-ratio test in small-sample or noisy regimes, while preserving asymptotic consistency for large n. We further integrate our criterion into a clustering pipeline (e.g., DBSCAN), demonstrating its ability to validate one-dimensional clusters without any density estimation or parameter tuning. This work thus provides both theoretical insight and practical tools for robust distributional inference and cluster stability analysis.
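The abstract does not specify the trimming rule or the distance used by the decision rule. The following Python sketch therefore fills those gaps with assumptions that are not taken from the paper. It defines the diameter-shrinkage ratio as D_k / D_0, the range after k trimmings divided by the initial range. It trims by repeatedly dropping whichever endpoint lies farther from the current mean. It estimates the uniform and Gaussian reference curves by Monte Carlo simulation, standing in for the paper's analytic expressions with finite-sample corrections. Finally, it assigns a sample to the nearest curve in squared L2 distance. The helper names (`shrinkage_profile`, `reference_profile`, `classify`, `split_1d`) and the parameters `k_max`, `n_sim` and `eps` are hypothetical. `split_1d` is a dependency-free stand-in for the DBSCAN step in one dimension.

```python
import random

# Sketch under stated assumptions; not the authors' implementation.


def shrinkage_profile(sample, k_max):
    """Return [D_k / D_0 for k = 0..k_max], where D_k is the range of the
    sample after k extreme points have been trimmed.

    Hypothetical trimming rule: at each step, drop whichever current
    endpoint (min or max) lies farther from the mean of the remaining points.
    """
    pts = sorted(sample)
    if len(pts) < k_max + 2:
        raise ValueError("need at least k_max + 2 points")
    lo, hi = 0, len(pts) - 1            # indices of the current min and max
    total, count = sum(pts), len(pts)   # running sum/count for the mean
    d0 = pts[hi] - pts[lo]
    if d0 == 0:
        raise ValueError("sample has zero range")
    ratios = [1.0]
    for _ in range(k_max):
        mean = total / count
        if mean - pts[lo] >= pts[hi] - mean:   # low end is the more extreme
            total -= pts[lo]
            lo += 1
        else:                                  # high end is the more extreme
            total -= pts[hi]
            hi -= 1
        count -= 1
        ratios.append((pts[hi] - pts[lo]) / d0)
    return ratios


def reference_profile(kind, n, k_max, n_sim=200, seed=0):
    """Monte Carlo estimate of the expected profile under a 'uniform' or
    'gaussian' hypothesis (assumed stand-in for the analytic curves).
    The ratio is invariant to shifts and positive rescaling, so standard
    distributions suffice."""
    rng = random.Random(seed)
    draw = {"uniform": rng.random,
            "gaussian": lambda: rng.gauss(0.0, 1.0)}[kind]
    sums = [0.0] * (k_max + 1)
    for _ in range(n_sim):
        prof = shrinkage_profile([draw() for _ in range(n)], k_max)
        sums = [s + p for s, p in zip(sums, prof)]
    return [s / n_sim for s in sums]


def classify(sample, k_max=10, n_sim=200):
    """Nearest-profile rule: pick the hypothesis whose reference curve is
    closest to the observed curve in squared L2 distance (assumed metric)."""
    obs = shrinkage_profile(sample, k_max)
    best, best_dist = None, float("inf")
    for kind in ("uniform", "gaussian"):
        ref = reference_profile(kind, len(sample), k_max, n_sim)
        dist = sum((o - r) ** 2 for o, r in zip(obs, ref))
        if dist < best_dist:
            best, best_dist = kind, dist
    return best


def split_1d(values, eps):
    """Split sorted 1-D values wherever consecutive gaps exceed eps; on 1-D
    data this groups points as DBSCAN does with min_samples=1."""
    xs = sorted(values)
    clusters = [[xs[0]]]
    for prev, cur in zip(xs, xs[1:]):
        if cur - prev <= eps:
            clusters[-1].append(cur)
        else:
            clusters.append([cur])
    return clusters


# Demo: a uniform cluster on [0, 1] and a Gaussian cluster centred at 10.
rng = random.Random(42)
flat = [rng.uniform(0.0, 1.0) for _ in range(400)]
bump = [rng.gauss(10.0, 0.5) for _ in range(400)]
results = [(len(c), classify(c, k_max=80)) for c in split_1d(flat + bump, eps=2.0)]
print(results)
```

Under this trimming rule, the ratio does not change when the data are shifted or positively rescaled. Reference curves drawn from the standard uniform and standard normal distributions therefore apply to clusters of any location or width. In a full pipeline, `classify` would be applied to each cluster returned by a DBSCAN implementation such as scikit-learn's `sklearn.cluster.DBSCAN`.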
Related papers
- Sharp Convergence Rates for Masked Diffusion Models [53.117058231393834]
We develop a total-variation based analysis for the Euler method that overcomes limitations. Our results relax assumptions on score estimation, improve parameter dependencies, and establish convergence guarantees. Overall, our analysis introduces a direct TV-based error decomposition along the CTMC trajectory and a decoupling-based path-wise analysis for FHS.
arXiv Detail & Related papers (2026-02-26T00:47:51Z)
- Universality of General Spiked Tensor Models [9.454986540713655]
We study the rank-one spiked tensor model in the high-dimensional regime. We show that the model's high-dimensional spectral behavior and statistical limits are robust to non-Gaussian noise.
arXiv Detail & Related papers (2026-02-04T11:59:30Z)
- Improved Sample Complexity for Full Coverage in Compact and Continuous Spaces [0.0]
We study uniform random sampling on the $d$-dimensional unit hypercube. We derive a sample complexity bound with a logarithmic dependence on the failure probability. Our findings offer a sharper theoretical tool for algorithms that rely on grid-based coverage guarantees.
arXiv Detail & Related papers (2025-11-21T21:06:14Z)
- Revisiting Zeroth-Order Optimization: Minimum-Variance Two-Point Estimators and Directionally Aligned Perturbations [57.179679246370114]
We identify the distribution of random perturbations that minimizes the estimator's variance as the perturbation stepsize tends to zero. Our findings reveal that such desired perturbations can align directionally with the true gradient, instead of maintaining a fixed length.
arXiv Detail & Related papers (2025-10-22T19:06:39Z)
- Kernel-Based Nonparametric Tests For Shape Constraints [0.0]
We derive statistical properties of the sample estimator and provide rigorous theoretical guarantees. We introduce a joint Wald-type statistic to test for shape constraints over finite grids.
arXiv Detail & Related papers (2025-10-19T08:07:33Z)
- Unregularized limit of stochastic gradient method for Wasserstein distributionally robust optimization [8.784017987697688]
Distributionally robust optimization offers a compelling framework for model fitting in machine learning. We investigate the regularized problem where entropic smoothing yields a sampling-based approximation of the original objective.
arXiv Detail & Related papers (2025-06-05T12:21:44Z)
- A Unified Theory of Stochastic Proximal Point Methods without Smoothness [52.30944052987393]
Stochastic proximal point methods have attracted considerable interest owing to their numerical stability and robustness against imperfect tuning.
This paper presents a comprehensive analysis of a broad range of variations of the stochastic proximal point method (SPPM).
arXiv Detail & Related papers (2024-05-24T21:09:19Z)
- Robust Stochastic Optimization via Gradient Quantile Clipping [6.2844649973308835]
We introduce a quantile clipping strategy for Stochastic Gradient Descent (SGD).
It uses quantiles of the gradient norm as clipping thresholds, which makes it robust to outliers.
We propose an implementation of the algorithm using rolling quantiles.
arXiv Detail & Related papers (2023-09-29T15:24:48Z)
- Curvature-Independent Last-Iterate Convergence for Games on Riemannian Manifolds [77.4346324549323]
We show that a step size agnostic to the curvature of the manifold achieves a curvature-independent and linear last-iterate convergence rate.
To the best of our knowledge, the possibility of curvature-independent rates and/or last-iterate convergence has not been considered before.
arXiv Detail & Related papers (2023-06-29T01:20:44Z)
- Convergence of uncertainty estimates in Ensemble and Bayesian sparse model discovery [4.446017969073817]
We show empirical success in terms of accuracy and robustness to noise with a bootstrapping-based sequential thresholding least-squares estimator.
We show that this bootstrapping-based ensembling technique can perform a provably correct variable selection procedure whose error rate converges exponentially.
arXiv Detail & Related papers (2023-01-30T04:07:59Z)
- Targeted Separation and Convergence with Kernel Discrepancies [61.973643031360254]
Kernel-based discrepancy measures are required to (i) separate a target P from other probability measures or (ii) control weak convergence to P. In this article we derive new sufficient and necessary conditions to ensure (i) and (ii). For maximum mean discrepancies (MMDs) on separable metric spaces, we characterize those kernels that separate Bochner embeddable measures and introduce simple conditions for separating all measures with unbounded kernels.
arXiv Detail & Related papers (2022-09-26T16:41:16Z)
- On the Convergence of Stochastic Extragradient for Bilinear Games with Restarted Iteration Averaging [96.13485146617322]
We present an analysis of the Stochastic ExtraGradient (SEG) method with constant step size, and present variations of the method that yield favorable convergence.
We prove that when augmented with averaging, SEG provably converges to the Nash equilibrium, and such a rate is provably accelerated by incorporating a scheduled restarting procedure.
arXiv Detail & Related papers (2021-06-30T17:51:36Z)
- $\gamma$-ABC: Outlier-Robust Approximate Bayesian Computation Based on a Robust Divergence Estimator [95.71091446753414]
We propose to use a nearest-neighbor-based $\gamma$-divergence estimator as a data discrepancy measure.
Our method achieves significantly higher robustness than existing discrepancy measures.
arXiv Detail & Related papers (2020-06-13T06:09:27Z)