Learning Centre Partitions from Summaries
- URL: http://arxiv.org/abs/2509.16337v1
- Date: Fri, 19 Sep 2025 18:25:06 GMT
- Title: Learning Centre Partitions from Summaries
- Authors: Zinsou Max Debaly, Jean-Francois Ethier, Michael H. Neumann, Félix Camirand Lemyre,
- Abstract summary: Homogeneity of parameters across centres is often violated, motivating methods that both emphtest for equality and emphlearn centre groupings before estimation.<n>We develop a sequential, test-driven emphClusters-of-Centres (CoC) algorithm that merges centres (or blocks) only when equality is not rejected.
- Score: 0.21748200848556343
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-centre studies increasingly rely on distributed inference, where sites share only centre-level summaries. Homogeneity of parameters across centres is often violated, motivating methods that both \emph{test} for equality and \emph{learn} centre groupings before estimation. We develop multivariate Cochran-type tests that operate on summary statistics and embed them in a sequential, test-driven \emph{Clusters-of-Centres (CoC)} algorithm that merges centres (or blocks) only when equality is not rejected. We derive the asymptotic $\chi^2$-mixture distributions of the test statistics and provide plug-in estimators for implementation. To improve finite-sample integration, we introduce a multi-round bootstrap CoC that re-evaluates merges across independently resampled summary sets; under mild regularity and a separation condition, we prove a \emph{golden-partition recovery} result: as the number of rounds grows with $n$, the true partition is recovered with probability tending to one. We also give simple numerical guidelines, including a plateau-based stopping rule, to make the multi-round procedure reproducible. Simulations and a real-data analysis of U.S.\ airline on-time performance (2007) show accurate heterogeneity detection and partitions that change little with the choice of resampling scheme.
Related papers
- Almost Asymptotically Optimal Active Clustering Through Pairwise Observations [59.20614082241528]
We propose a new analysis framework for clustering $M$ items into an unknown number of $K$ distinct groups using noisy and actively collected responses.<n>We establish a fundamental lower bound on the expected number of queries needed to achieve a desired confidence in the accuracy of the clustering.<n>We develop a computationally feasible variant of the Generalized Likelihood Ratio statistic and show that its performance gap to the lower bound can be accurately empirically estimated.
arXiv Detail & Related papers (2026-02-05T14:16:47Z) - Post-Hoc Split-Point Self-Consistency Verification for Efficient, Unified Quantification of Aleatoric and Epistemic Uncertainty in Deep Learning [5.996056764788456]
Uncertainty quantification (UQ) is vital for trustworthy deep learning, yet existing methods are either computationally intensive or provide only partial, task-specific estimates.<n>We propose a post-hoc single-forward-pass framework that jointly captures aleatoric and epistemic uncertainty without modifying or retraining pretrained models.<n>Our method applies emphSplit-Point Analysis (SPA) to decompose predictive residuals into upper and lower subsets, computing emphMean Absolute Residuals (MARs) on each side.
arXiv Detail & Related papers (2025-09-16T17:16:01Z) - Assessing One-Dimensional Cluster Stability by Extreme-Point Trimming [0.0]
We develop a probabilistic method for assessing the tail behavior and geometric stability of one-dimensional i.i.d. samples.<n>We derive analytical expressions, including finite-sample corrections, for the expected shrinkage under both the uniform and Gaussian hypotheses.<n>We further integrate our criterion into a clustering pipeline (e.g. DBSCAN), demonstrating its ability to validate one-dimensional clusters without any density estimation or parameter tuning.
arXiv Detail & Related papers (2025-08-29T21:52:15Z) - One-shot Robust Federated Learning of Independent Component Analysis [16.462282750354408]
We propose a geometric median-based aggregation algorithm that leverages $k$-means clustering to resolve the permutation ambiguity in local client estimations.<n>Our method first performs k-means to partition client-provided estimators into clusters and then aggregates estimators within each cluster using the geometric median.
arXiv Detail & Related papers (2025-05-26T21:37:19Z) - Sublinear Algorithms for Wasserstein and Total Variation Distances: Applications to Fairness and Privacy Auditing [7.81603404636933]
We propose a generic framework to learn the probability and cumulative distribution functions (PDFs and CDFs) of a sub-Weibull.<n>We compute mergeable summaries of distributions from the stream of samples while requiring only sublinear space.<n>Our algorithms significantly improves on the existing methods for distance estimation incurring super-linear time and linear space complexities.
arXiv Detail & Related papers (2025-03-10T18:57:48Z) - Convergence of Score-Based Discrete Diffusion Models: A Discrete-Time Analysis [56.442307356162864]
We study the theoretical aspects of score-based discrete diffusion models under the Continuous Time Markov Chain (CTMC) framework.<n>We introduce a discrete-time sampling algorithm in the general state space $[S]d$ that utilizes score estimators at predefined time points.<n>Our convergence analysis employs a Girsanov-based method and establishes key properties of the discrete score function.
arXiv Detail & Related papers (2024-10-03T09:07:13Z) - Simultaneous Inference for Local Structural Parameters with Random Forests [19.014535120129338]
We construct simultaneous confidence intervals for solutions to conditional moment equations.
We obtain several new order-explicit results on the concentration and normal approximation of high-dimensional U.S.
As a by-product, we obtain several new order-explicit results on the concentration and normal approximation of high-dimensional U.S.
arXiv Detail & Related papers (2024-05-13T15:46:11Z) - Random Forest Weighted Local Fréchet Regression with Random Objects [18.128663071848923]
We propose a novel random forest weighted local Fr'echet regression paradigm.<n>Our first method uses these weights as the local average to solve the conditional Fr'echet mean.<n>Second method performs local linear Fr'echet regression, both significantly improving existing Fr'echet regression methods.
arXiv Detail & Related papers (2022-02-10T09:10:59Z) - Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and
Beyond [63.59034509960994]
We study shuffling-based variants: minibatch and local Random Reshuffling, which draw gradients without replacement.
For smooth functions satisfying the Polyak-Lojasiewicz condition, we obtain convergence bounds which show that these shuffling-based variants converge faster than their with-replacement counterparts.
We propose an algorithmic modification called synchronized shuffling that leads to convergence rates faster than our lower bounds in near-homogeneous settings.
arXiv Detail & Related papers (2021-10-20T02:25:25Z) - Kernel learning approaches for summarising and combining posterior
similarity matrices [68.8204255655161]
We build upon the notion of the posterior similarity matrix (PSM) in order to suggest new approaches for summarising the output of MCMC algorithms for Bayesian clustering models.
A key contribution of our work is the observation that PSMs are positive semi-definite, and hence can be used to define probabilistically-motivated kernel matrices.
arXiv Detail & Related papers (2020-09-27T14:16:14Z) - Computationally efficient sparse clustering [67.95910835079825]
We provide a finite sample analysis of a new clustering algorithm based on PCA.
We show that it achieves the minimax optimal misclustering rate in the regime $|theta infty$.
arXiv Detail & Related papers (2020-05-21T17:51:30Z) - Outlier-Robust Clustering of Non-Spherical Mixtures [5.863264019032882]
We give the first outlier-robust efficient algorithm for clustering a mixture of $k$ statistically separated d-dimensional Gaussians (k-GMMs)
Our results extend to clustering mixtures of arbitrary affine transforms of the uniform distribution on the $d$-dimensional unit sphere.
arXiv Detail & Related papers (2020-05-06T17:24:27Z) - Distributed, partially collapsed MCMC for Bayesian Nonparametrics [68.5279360794418]
We exploit the fact that completely random measures, which commonly used models like the Dirichlet process and the beta-Bernoulli process can be expressed as, are decomposable into independent sub-measures.
We use this decomposition to partition the latent measure into a finite measure containing only instantiated components, and an infinite measure containing all other components.
The resulting hybrid algorithm can be applied to allow scalable inference without sacrificing convergence guarantees.
arXiv Detail & Related papers (2020-01-15T23:10:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.