Identifying Heterogeneity in Distributed Learning
- URL: http://arxiv.org/abs/2506.16394v3
- Date: Tue, 24 Jun 2025 23:55:45 GMT
- Title: Identifying Heterogeneity in Distributed Learning
- Authors: Zelin Xiao, Jia Gu, Song Xi Chen
- Abstract summary: We study methods for identifying heterogeneous parameter components in distributed M-estimation with minimal data transmission. One is based on a re-normalized Wald test, which is shown to be consistent as long as the number of distributed data blocks $K$ is of a smaller order than the minimum block sample size. The second one is an extreme contrast test (ECT) based on the difference between the largest and smallest component-wise estimated parameters among data blocks.
- Score: 1.7244120238071492
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We study methods for identifying heterogeneous parameter components in distributed M-estimation with minimal data transmission. One is based on a re-normalized Wald test, which is shown to be consistent as long as the number of distributed data blocks $K$ is of a smaller order than the minimum block sample size and the level of heterogeneity is dense. The second one is an extreme contrast test (ECT) based on the difference between the largest and smallest component-wise estimated parameters among data blocks. By introducing a sample splitting procedure, the ECT avoids the bias accumulation arising from the M-estimation procedures, and exhibits consistency for $K$ much larger than the sample size when the heterogeneity is sparse. The ECT procedure is easy to operate and communication-efficient. A combination of the Wald and the extreme contrast tests is formulated to attain more robust power under varying levels of sparsity of the heterogeneity. We also conduct extensive numerical experiments to compare the family-wise error rate (FWER) and the power of the proposed methods. Additionally, we conduct a case study to demonstrate the implementation and validity of the proposed methods.
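To make the two test statistics concrete, here is a minimal sketch of how the communicated block-level summaries could be combined. This is an illustration only, not the authors' implementation: the function names are invented, the Wald statistic below is a generic weighted homogeneity statistic standing in for the paper's re-normalized version, and the sample-splitting step that debiases the ECT is omitted.

```python
# Illustrative sketch only -- not the paper's implementation. Assumes each of
# the K blocks communicates just its component-wise M-estimate and a variance
# estimate, which is what keeps data transmission minimal.
import numpy as np

def wald_statistic(estimates, variances):
    """Generic weighted Wald-type homogeneity statistic for
    H0: all K block parameters are equal (a stand-in for the
    paper's re-normalized Wald test)."""
    estimates = np.asarray(estimates, dtype=float)
    weights = 1.0 / np.asarray(variances, dtype=float)
    pooled = np.sum(weights * estimates) / np.sum(weights)
    return np.sum(weights * (estimates - pooled) ** 2)

def extreme_contrast(estimates):
    """ECT statistic: spread between the largest and smallest
    component-wise estimates across the K blocks. (The paper's
    sample-splitting debiasing step is omitted here.)"""
    estimates = np.asarray(estimates, dtype=float)
    return estimates.max() - estimates.min()

# Toy example: K = 50 blocks of size n = 200, with sparse heterogeneity
# confined to a single block.
rng = np.random.default_rng(0)
K, n = 50, 200
theta = np.zeros(K)
theta[0] = 0.5                                   # one heterogeneous block
est = theta + rng.normal(scale=1 / np.sqrt(n), size=K)
var = np.full(K, 1.0 / n)
print(wald_statistic(est, var), extreme_contrast(est))
```

Under sparse heterogeneity the extreme contrast reacts to the single deviating block, whereas a dense-alternative Wald statistic averages the signal away, matching the abstract's division of labor between the two tests.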
Related papers
- A Sample Efficient Conditional Independence Test in the Presence of Discretization [54.047334792855345]
Applying Conditional Independence (CI) tests directly to discretized data can lead to incorrect conclusions. Recent advancements have sought to infer the correct CI relationship between the latent variables by binarizing the observed data. Motivated by this, this paper introduces a sample-efficient CI test that does not rely on the binarization process.
arXiv Detail & Related papers (2025-06-10T12:41:26Z)
- Multi-Metric Adaptive Experimental Design under Fixed Budget with Validation [10.5481503979787]
Standard A/B tests in online experiments face statistical power challenges when testing multiple candidates simultaneously. This paper proposes a fixed-budget multi-metric AED framework with a two-phase structure: an adaptive exploration phase to identify the best treatment, and a validation phase to verify the treatment's quality and draw statistical inferences.
arXiv Detail & Related papers (2025-06-03T16:41:11Z)
- CKD: Contrastive Knowledge Distillation from A Sample-wise Perspective [48.99488315273868]
We propose a contrastive knowledge distillation framework that achieves sample-wise logit alignment while preserving semantic consistency. Our approach transfers "dark knowledge" through teacher-student contrastive alignment at the sample level. We conduct comprehensive experiments across three benchmark datasets: CIFAR-100, ImageNet-1K, and MS COCO.
arXiv Detail & Related papers (2024-04-22T11:52:40Z)
- Detecting Adversarial Data by Probing Multiple Perturbations Using Expected Perturbation Score [62.54911162109439]
Adversarial detection aims to determine whether a given sample is an adversarial one based on the discrepancy between natural and adversarial distributions.
We propose a new statistic called expected perturbation score (EPS), which is essentially the expected score of a sample after various perturbations.
We develop an EPS-based maximum mean discrepancy (MMD) as a metric to measure the discrepancy between the test sample and natural samples (a generic MMD sketch appears after this list).
arXiv Detail & Related papers (2023-05-25T13:14:58Z)
- Determinantal point processes based on orthogonal polynomials for sampling minibatches in SGD [0.0]
Stochastic gradient descent (SGD) is a cornerstone of machine learning. The default minibatch construction involves uniformly sampling a subset of the desired size. We show how specific DPPs and a string of controlled approximations can lead to gradient estimators with a variance that decays faster with the batch size than under uniform sampling.
arXiv Detail & Related papers (2021-12-11T15:09:19Z)
- AdaPT-GMM: Powerful and robust covariate-assisted multiple testing [0.7614628596146599]
We propose a new empirical Bayes method for covariate-assisted multiple testing with false discovery rate (FDR) control.
Our method refines the adaptive p-value thresholding (AdaPT) procedure by generalizing its masking scheme.
We show in extensive simulations and real data examples that our new method, which we call AdaPT-GMM, consistently delivers high power.
arXiv Detail & Related papers (2021-06-30T05:06:18Z)
- Directional FDR Control for Sub-Gaussian Sparse GLMs [4.229179009157074]
False discovery rate (FDR) control aims to identify a small number of statistically significant nonzero results. We construct the debiased matrix-Lasso estimator and prove its asymptotic normality via minimax-rate oracle inequalities for sparse GLMs.
arXiv Detail & Related papers (2021-05-02T05:34:32Z)
- Precise High-Dimensional Asymptotics for Quantifying Heterogeneous Transfers [66.66228496844191]
We ask when combining the samples from two related tasks performs better than learning with one target task alone. This question is motivated by an empirical phenomenon known as negative transfer, which has been observed in practice. We illustrate these results in a random-effects model and mathematically prove a phase transition from positive to negative transfer as the number of source task samples increases.
arXiv Detail & Related papers (2020-10-22T14:14:20Z)
- A Nonparametric Test of Dependence Based on Ensemble of Decision Trees [0.0]
The proposed coefficient is a permutation-like statistic that quantifies how discriminable the observed sample $S_n = \{(X_i, Y_i)\}_{i=1}^{n}$ is from the permuted sample $S_{nn} = \{(X_i, Y_j)\}_{i,j=1}^{n}$, in which the two variables are independent (a toy discriminability test in this spirit appears after this list).
arXiv Detail & Related papers (2020-07-24T02:48:33Z)
- An Investigation of Why Overparameterization Exacerbates Spurious Correlations [98.3066727301239]
We identify two key properties of the training data that drive this behavior.
We show how the inductive bias of models towards "memorizing" fewer examples can cause overparameterization to hurt.
arXiv Detail & Related papers (2020-05-09T01:59:13Z)
- Discriminator Contrastive Divergence: Semi-Amortized Generative Modeling by Exploring Energy of the Discriminator [85.68825725223873]
Generative Adversarial Networks (GANs) have shown great promise in modeling high-dimensional data.
We introduce the Discriminator Contrastive Divergence, which is well motivated by the property of WGAN's discriminator.
We demonstrate the benefits of significantly improved generation on both synthetic data and several real-world image generation benchmarks.
arXiv Detail & Related papers (2020-04-05T01:50:16Z)
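As promised for the expected perturbation score entry above, here is a minimal sketch of the generic (biased) RBF-kernel MMD estimator such a detector builds on. The EPS feature map itself requires a pretrained score model and is replaced here by raw vectors; all names and parameters are illustrative, not the paper's code.

```python
# Minimal sketch of the biased RBF-kernel MMD^2 estimator. In the EPS paper
# the inputs would be expected-perturbation-score features from a pretrained
# score model; plain Gaussian vectors are used here for illustration.
import numpy as np

def mmd2_rbf(x, y, sigma=1.0):
    """Biased estimate of MMD^2 between samples x and y (rows are points)."""
    def gram(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    return gram(x, x).mean() + gram(y, y).mean() - 2.0 * gram(x, y).mean()

rng = np.random.default_rng(0)
natural_a = rng.normal(size=(200, 5))
natural_b = rng.normal(size=(200, 5))            # same distribution as natural_a
shifted = rng.normal(loc=0.5, size=(200, 5))     # stand-in for adversarial data
print(mmd2_rbf(natural_a, natural_b))            # near 0: same distribution
print(mmd2_rbf(natural_a, shifted))              # clearly positive: distributions differ
```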
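And for the decision-tree dependence test, the following toy version conveys the discriminability idea: train a forest to distinguish genuinely paired observations from permuted pairs, so that cross-validated accuracy near 0.5 suggests independence. This is a simplification of the idea, not the paper's exact statistic, and all names are invented.

```python
# Toy discriminability-based dependence test: a random forest tries to tell
# paired samples (X_i, Y_i) from permuted pairs (X_i, Y_j). Accuracy well
# above 0.5 indicates dependence between X and Y.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
y = x ** 2 + 0.3 * rng.normal(size=n)            # dependent but uncorrelated

paired = np.column_stack([x, y])                  # draws from the joint law
permuted = np.column_stack([x, rng.permutation(y)])  # product-of-marginals proxy
features = np.vstack([paired, permuted])
labels = np.r_[np.ones(n), np.zeros(n)]

clf = RandomForestClassifier(n_estimators=200, random_state=0)
acc = cross_val_score(clf, features, labels, cv=5).mean()
print(f"discriminability (CV accuracy): {acc:.3f}")  # ~0.5 under independence
```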
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.