To Collaborate or Not in Distributed Statistical Estimation with
Resource Constraints?
- URL: http://arxiv.org/abs/2206.00111v1
- Date: Tue, 31 May 2022 20:47:09 GMT
- Title: To Collaborate or Not in Distributed Statistical Estimation with
Resource Constraints?
- Authors: Yu-Zhen Janice Chen, Daniel S. Menasche, Don Towsley
- Abstract summary: We study how the amount of correlation between observations collected by distinct sensors/learners affects data collection and collaboration strategies.
We discuss two applications, IoT DDoS attack detection and distributed estimation in wireless sensor networks, that may benefit from our results.
- Score: 14.626510386380474
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We study how the amount of correlation between observations collected by
distinct sensors/learners affects data collection and collaboration strategies
by analyzing Fisher information and the Cramér-Rao bound. In particular, we
consider a simple setting wherein two sensors sample from a bivariate Gaussian
distribution, which already motivates the adoption of various strategies,
depending on the correlation between the two variables and resource
constraints. We identify two particular scenarios: (1) where the knowledge of
the correlation between samples cannot be leveraged for collaborative
estimation purposes and (2) where the optimal data collection strategy involves
investing scarce resources to collaboratively sample and transfer information
that is not of immediate interest and whose statistics are already known, with
the sole goal of increasing the confidence on an estimate of the parameter of
interest. We discuss two applications, IoT DDoS attack detection and
distributed estimation in wireless sensor networks, that may benefit from our
results.
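As a minimal numerical sketch of the effect described in the abstract (this is not code from the paper; the means, variances, correlation rho = 0.8, and sample sizes are illustrative assumptions), estimating the mean of one coordinate of a bivariate Gaussian can be sharpened by a regression correction that uses the second, correlated coordinate, even though that coordinate's statistics are already known:

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 100, 20000
theta1, theta2 = 2.0, -1.0   # theta1 is the unknown parameter; theta2 is already known
s1, s2, rho = 1.0, 1.0, 0.8  # illustrative standard deviations and correlation
cov = [[s1**2, rho * s1 * s2], [rho * s1 * s2, s2**2]]

# Draw `trials` independent experiments of n joint samples each
x = rng.multivariate_normal([theta1, theta2], cov, size=(trials, n))
x1, x2 = x[..., 0], x[..., 1]

# Estimator using sensor 1 alone
est_solo = x1.mean(axis=1)
# Collaborative estimator: correct with the correlated, known-mean coordinate
est_collab = est_solo - rho * (s1 / s2) * (x2.mean(axis=1) - theta2)

# Empirical variances vs. the corresponding Cramer-Rao bounds
print(est_solo.var(), s1**2 / n)                        # ~= sigma1^2 / n
print(est_collab.var(), s1**2 * (1 - rho**2) / n)       # ~= sigma1^2 (1 - rho^2) / n
```

The empirical variance of the corrected estimator approaches the bound sigma1^2 (1 - rho^2) / n rather than sigma1^2 / n, illustrating why spending resources to sample a variable that is not of immediate interest can still increase confidence in the estimate of the parameter of interest.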
Related papers
- Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation [62.2436697657307]
Prediction-powered inference (PPI) is a method that improves statistical estimates based on limited human-labeled data.
We propose a method called Stratified Prediction-Powered Inference (StratPPI)
We show that the basic PPI estimates can be considerably improved by employing simple data stratification strategies.
arXiv Detail & Related papers (2024-06-06T17:37:39Z)
- Distributed Event-Based Learning via ADMM [11.461617927469316]
We consider a distributed learning problem, where agents minimize a global objective function by exchanging information over a network.
Our approach has two distinct features: (i) It substantially reduces communication by triggering communication only when necessary, and (ii) it is agnostic to the data-distribution among the different agents.
arXiv Detail & Related papers (2024-05-17T08:30:28Z)
- Multi-Source Conformal Inference Under Distribution Shift [41.701790856201036]
We consider the problem of obtaining distribution-free prediction intervals for a target population, leveraging multiple potentially biased data sources.
We derive the efficient influence functions for the quantiles of unobserved outcomes in the target and source populations.
We propose a data-adaptive strategy to upweight informative data sources for efficiency gain and downweight non-informative data sources for bias reduction.
arXiv Detail & Related papers (2024-05-15T13:33:09Z)
- DAGnosis: Localized Identification of Data Inconsistencies using Structures [73.39285449012255]
Identification and appropriate handling of inconsistencies in data at deployment time is crucial to reliably use machine learning models.
We use directed acyclic graphs (DAGs) to encode, as a structure, the probability distribution and independencies of the training set's features.
Our method, called DAGnosis, leverages these structural interactions to bring valuable and insightful data-centric conclusions.
arXiv Detail & Related papers (2024-02-26T11:29:16Z)
- On Collaboration in Distributed Parameter Estimation with Resource Constraints [13.014069919671623]
We study sensor/agent data collection and collaboration policies for parameter estimation.
We propose novel ways to apply multi-armed bandit algorithms to learn the optimal data collection and collaboration policy.
arXiv Detail & Related papers (2023-07-12T20:11:50Z)
- CEDAR: Communication Efficient Distributed Analysis for Regressions [9.50726756006467]
There is growing interest in distributed learning over multiple EHR databases without sharing patient-level data.
We propose a novel communication efficient method that aggregates the local optimal estimates, by turning the problem into a missing data problem.
We provide theoretical investigation for the properties of the proposed method for statistical inference as well as differential privacy, and evaluate its performance in simulations and real data analyses.
arXiv Detail & Related papers (2022-07-01T09:53:44Z)
- Causal Balancing for Domain Generalization [95.97046583437145]
We propose a balanced mini-batch sampling strategy to reduce the domain-specific spurious correlations in observed training distributions.
We provide an identifiability guarantee of the source of spuriousness and show that our proposed approach provably samples from a balanced, spurious-free distribution.
arXiv Detail & Related papers (2022-06-10T17:59:11Z)
- DRFLM: Distributionally Robust Federated Learning with Inter-client Noise via Local Mixup [58.894901088797376]
Federated learning has emerged as a promising approach for training a global model using data from multiple organizations without leaking their raw data.
We propose a general framework to solve the above two challenges simultaneously.
We provide comprehensive theoretical analysis including robustness analysis, convergence analysis, and generalization ability.
arXiv Detail & Related papers (2022-04-16T08:08:29Z)
- Learning Bias-Invariant Representation by Cross-Sample Mutual Information Minimization [77.8735802150511]
We propose a cross-sample adversarial debiasing (CSAD) method to remove the bias information misused by the target task.
The correlation measurement plays a critical role in adversarial debiasing and is conducted by a cross-sample neural mutual information estimator.
We conduct thorough experiments on publicly available datasets to validate the advantages of the proposed method over state-of-the-art approaches.
arXiv Detail & Related papers (2021-08-11T21:17:02Z)
- Domain Adaptative Causality Encoder [52.779274858332656]
We leverage the characteristics of dependency trees and adversarial learning to address the tasks of adaptive causality identification and localisation.
We present a new causality dataset, namely MedCaus, which integrates all types of causality in the text.
arXiv Detail & Related papers (2020-11-27T04:14:55Z)
- Sharing Models or Coresets: A Study based on Membership Inference Attack [17.562474629669513]
Distributed machine learning aims at training a global model based on distributed data without collecting all the data to a centralized location.
Two approaches have been proposed: collecting and aggregating local models (federated learning), and collecting and training over representative data summaries (coresets).
Our experiments quantify the accuracy-privacy-cost tradeoff of each approach, and reveal a nontrivial comparison that can be used to guide the design of model training processes.
arXiv Detail & Related papers (2020-07-06T18:06:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.