Leveraging Optimal Transport for Distributed Two-Sample Testing: An Integrated Transportation Distance-based Framework
- URL: http://arxiv.org/abs/2506.16047v1
- Date: Thu, 19 Jun 2025 05:59:48 GMT
- Title: Leveraging Optimal Transport for Distributed Two-Sample Testing: An Integrated Transportation Distance-based Framework
- Authors: Zhengqi Lin, Yan Chen
- Abstract summary: This paper introduces a novel framework for distributed two-sample testing using the Integrated Transportation Distance (ITD). A permutation test procedure is proposed for implementation in distributed settings, allowing for efficient computation while preserving data privacy. Results indicate that ITD effectively aggregates information across distributed clients, detecting subtle distributional shifts that might be missed when examining individual clients.
- Score: 2.65043787671605
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper introduces a novel framework for distributed two-sample testing using the Integrated Transportation Distance (ITD), an extension of the Optimal Transport distance. The approach addresses the challenges of detecting distributional changes in decentralized learning or federated learning environments, where data privacy and heterogeneity are significant concerns. We provide theoretical foundations for the ITD, including convergence properties and asymptotic behavior. A permutation test procedure is proposed for practical implementation in distributed settings, allowing for efficient computation while preserving data privacy. The framework's performance is demonstrated through theoretical power analysis and extensive simulations, showing robust Type I error control and high power across various distributions and dimensions. The results indicate that ITD effectively aggregates information across distributed clients, detecting subtle distributional shifts that might be missed when examining individual clients. This work contributes to the growing field of distributed statistical inference, offering a powerful tool for two-sample testing in modern, decentralized data environments.
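To make the described procedure concrete, the sketch below shows a minimal, hedged version of a distributed permutation two-sample test: each client computes a local optimal-transport distance (here the empirical 1-D Wasserstein-1 distance between equal-size samples), the per-client distances are averaged into an aggregated statistic, and a permutation null is built by reshuffling each client's pooled samples locally. The helper names (`wasserstein_1d`, `itd_like_statistic`, `permutation_test`) and the simple averaging rule are illustrative assumptions, not the paper's exact ITD definition or its privacy-preserving protocol.

```python
import numpy as np

def wasserstein_1d(x, y):
    """Empirical Wasserstein-1 distance between two equal-size 1-D samples."""
    return float(np.mean(np.abs(np.sort(x) - np.sort(y))))

def itd_like_statistic(clients_x, clients_y):
    """Average per-client OT distances into one aggregated statistic.
    Illustrative stand-in for the paper's Integrated Transportation Distance."""
    return float(np.mean([wasserstein_1d(x, y) for x, y in zip(clients_x, clients_y)]))

def permutation_test(clients_x, clients_y, n_perm=1000, seed=0):
    """Permutation two-sample test: each client pools and reshuffles its own
    samples, so only per-client distances (not raw data) need to be shared."""
    rng = np.random.default_rng(seed)
    observed = itd_like_statistic(clients_x, clients_y)
    exceed = 0
    for _ in range(n_perm):
        perm_x, perm_y = [], []
        for x, y in zip(clients_x, clients_y):
            pooled = np.concatenate([x, y])
            rng.shuffle(pooled)
            perm_x.append(pooled[:len(x)])
            perm_y.append(pooled[len(x):])
        if itd_like_statistic(perm_x, perm_y) >= observed:
            exceed += 1
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, p_value

# Example: 5 clients, a small mean shift affecting only two of them.
rng = np.random.default_rng(1)
clients_x = [rng.normal(0.0, 1.0, 200) for _ in range(5)]
clients_y = [rng.normal(0.3 if i < 2 else 0.0, 1.0, 200) for i in range(5)]
stat, p = permutation_test(clients_x, clients_y)
print(f"Aggregated statistic: {stat:.4f}, permutation p-value: {p:.4f}")
```

Under the null (identical client distributions) the p-value should be roughly uniform, while a shift on even a few clients should push it toward zero, mirroring the aggregation effect described in the abstract.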
Related papers
- Statistical Inference for Conditional Group Distributionally Robust Optimization with Cross-Entropy Loss [9.054486124506521]
We study multi-source unsupervised domain adaptation, where labeled data are drawn from multiple source domains and only unlabeled data are available from the target domain. We propose a novel Conditional Group Distributionally Robust Optimization (CG-DRO) framework that learns a classifier by minimizing the worst-case cross-entropy loss over convex combinations of the conditional outcome distributions from the sources. We establish fast statistical convergence rates for the estimator by constructing two surrogate minimax optimization problems that serve as theoretical bridges.
arXiv Detail & Related papers (2025-07-14T04:21:23Z)
- Domain Adaptation and Entanglement: an Optimal Transport Perspective [86.24617989187988]
Current machine learning systems are brittle in the face of distribution shifts (DS), where the target distribution that the system is tested on differs from the source distribution used to train the system. For deep neural networks, a popular framework for unsupervised domain adaptation (UDA) is domain matching, in which algorithms try to align the marginal distributions in the feature or output space. In this paper, we derive new bounds based on optimal transport that analyze the UDA problem.
arXiv Detail & Related papers (2025-03-11T08:10:03Z)
- Decentralizing Test-time Adaptation under Heterogeneous Data Streams [21.40129321379529]
Test-Time Adaptation (TTA) has shown promise in addressing distribution shifts between training and testing data. Previous attempts merely stabilize model fine-tuning over time to handle continually changing environments. This paper delves into TTA under heterogeneous data streams, moving beyond current model-centric limitations.
arXiv Detail & Related papers (2024-11-16T12:29:59Z)
- Distributionally Robust Safe Sample Elimination under Covariate Shift [16.85444622474742]
We consider a machine learning setup where one training dataset is used to train multiple models across slightly different data distributions.
We propose the DRSSS method, which combines distributionally robust (DR) optimization and safe sample screening (SSS). The key benefit of this method is that models trained on the reduced dataset perform the same as those trained on the full dataset across all possible environments.
arXiv Detail & Related papers (2024-06-10T01:46:42Z)
- Optimal Aggregation of Prediction Intervals under Unsupervised Domain Shift [9.387706860375461]
A distribution shift occurs when the underlying data-generating process changes, leading to a deviation in the model's performance.
Prediction intervals serve as a crucial tool for characterizing the uncertainty induced by the underlying distribution.
We propose methodologies for aggregating prediction intervals to obtain one with minimal width and adequate coverage on the target domain.
arXiv Detail & Related papers (2024-05-16T17:55:42Z)
- Distributed Markov Chain Monte Carlo Sampling based on the Alternating Direction Method of Multipliers [143.6249073384419]
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers.
We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art.
In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
arXiv Detail & Related papers (2024-01-29T02:08:40Z)
- pSTarC: Pseudo Source Guided Target Clustering for Fully Test-Time Adaptation [15.621092104244003]
Test Time Adaptation (TTA) is a pivotal concept in machine learning, enabling models to perform well in real-world scenarios.
We propose a novel approach called pseudo Source guided Target Clustering (pSTarC), which addresses the relatively unexplored area of TTA under real-world domain shifts.
arXiv Detail & Related papers (2023-09-02T07:13:47Z)
- Distribution Shift Inversion for Out-of-Distribution Prediction [57.22301285120695]
We propose a portable Distribution Shift Inversion algorithm for Out-of-Distribution (OoD) prediction.
We show that our method provides a general performance gain when plugged into a wide range of commonly used OoD algorithms.
arXiv Detail & Related papers (2023-06-14T08:00:49Z)
- An Optimal Transport Approach to Personalized Federated Learning [41.27887358989414]
Federated learning aims to train a model using the local data of many distributed clients.
A key challenge in federated learning is that the data samples across the clients may not be identically distributed.
We propose a novel personalized Federated Learning scheme based on Optimal Transport (FedOT).
arXiv Detail & Related papers (2022-06-06T10:05:49Z)
- DRFLM: Distributionally Robust Federated Learning with Inter-client Noise via Local Mixup [58.894901088797376]
Federated learning has emerged as a promising approach for training a global model using data from multiple organizations without leaking their raw data.
We propose a general framework to address two challenges, distributional heterogeneity across clients and inter-client noise, simultaneously.
We provide comprehensive theoretical analysis including robustness analysis, convergence analysis, and generalization ability.
arXiv Detail & Related papers (2022-04-16T08:08:29Z)
- Investigating Shifts in GAN Output-Distributions [5.076419064097734]
We introduce a loop-training scheme for the systematic investigation of observable shifts between the distributions of real training data and GAN-generated data.
Overall, the combination of these methods allows an explorative investigation of innate limitations of current GAN algorithms.
arXiv Detail & Related papers (2021-12-28T09:16:55Z)
- Semi-supervised Domain Adaptive Structure Learning [72.01544419893628]
Semi-supervised domain adaptation (SSDA) is a challenging problem requiring methods to overcome both 1) overfitting towards poorly annotated data and 2) distribution shift across domains.
We introduce an adaptive structure learning method to regularize the cooperation of semi-supervised learning (SSL) and domain adaptation (DA).
arXiv Detail & Related papers (2021-12-12T06:11:16Z)
- Learning while Respecting Privacy and Robustness to Distributional Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
The proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z)