To Collaborate or Not in Distributed Statistical Estimation with
Resource Constraints?
- URL: http://arxiv.org/abs/2206.00111v1
- Date: Tue, 31 May 2022 20:47:09 GMT
- Title: To Collaborate or Not in Distributed Statistical Estimation with
Resource Constraints?
- Authors: Yu-Zhen Janice Chen, Daniel S. Menasche, Don Towsley
- Abstract summary: We study how the amount of correlation between observations collected by distinct sensors/learners affects data collection and collaboration strategies.
We discuss two applications, IoT DDoS attack detection and distributed estimation in wireless sensor networks, that may benefit from our results.
- Score: 14.626510386380474
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We study how the amount of correlation between observations collected by
distinct sensors/learners affects data collection and collaboration strategies
by analyzing Fisher information and the Cramér-Rao bound. In particular, we
consider a simple setting wherein two sensors sample from a bivariate Gaussian
distribution, which already motivates the adoption of various strategies,
depending on the correlation between the two variables and resource
constraints. We identify two particular scenarios: (1) where the knowledge of
the correlation between samples cannot be leveraged for collaborative
estimation purposes and (2) where the optimal data collection strategy involves
investing scarce resources to collaboratively sample and transfer information
that is not of immediate interest and whose statistics are already known, with
the sole goal of increasing the confidence on an estimate of the parameter of
interest. We discuss two applications, IoT DDoS attack detection and
distributed estimation in wireless sensor networks, that may benefit from our
results.
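As a minimal numerical sketch of the effect described in the abstract (this is not code from the paper; the means, variances, correlation rho = 0.8, and sample sizes are illustrative assumptions), estimating the mean of one coordinate of a bivariate Gaussian can be sharpened by a regression correction that uses the second, correlated coordinate, even though that coordinate's statistics are already known:

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 100, 20000
theta1, theta2 = 2.0, -1.0   # theta1 is the unknown parameter; theta2 is already known
s1, s2, rho = 1.0, 1.0, 0.8  # illustrative standard deviations and correlation
cov = [[s1**2, rho * s1 * s2], [rho * s1 * s2, s2**2]]

# Draw `trials` independent experiments of n joint samples each
x = rng.multivariate_normal([theta1, theta2], cov, size=(trials, n))
x1, x2 = x[..., 0], x[..., 1]

# Estimator using sensor 1 alone
est_solo = x1.mean(axis=1)
# Collaborative estimator: correct with the correlated, known-mean coordinate
est_collab = est_solo - rho * (s1 / s2) * (x2.mean(axis=1) - theta2)

# Empirical variances vs. the corresponding Cramer-Rao bounds
print(est_solo.var(), s1**2 / n)                        # ~= sigma1^2 / n
print(est_collab.var(), s1**2 * (1 - rho**2) / n)       # ~= sigma1^2 (1 - rho^2) / n
```

The empirical variance of the corrected estimator approaches the bound sigma1^2 (1 - rho^2) / n rather than sigma1^2 / n, illustrating why spending resources to sample a variable that is not of immediate interest can still increase confidence in the estimate of the parameter of interest.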
Related papers
- Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation [62.2436697657307]
Prediction-powered inference (PPI) is a method that improves statistical estimates based on limited human-labeled data.
We propose a method called Stratified Prediction-Powered Inference (StratPPI)
We show that the basic PPI estimates can be considerably improved by employing simple data stratification strategies.
arXiv Detail & Related papers (2024-06-06T17:37:39Z)
- Distributed Event-Based Learning via ADMM [11.461617927469316]
We consider a distributed learning problem, where agents minimize a global objective function by exchanging information over a network.
Our approach has two distinct features: (i) It substantially reduces communication by triggering communication only when necessary, and (ii) it is agnostic to the data-distribution among the different agents.
arXiv Detail & Related papers (2024-05-17T08:30:28Z)
- Multi-Source Conformal Inference Under Distribution Shift [41.701790856201036]
We consider the problem of obtaining distribution-free prediction intervals for a target population, leveraging multiple potentially biased data sources.
We derive the efficient influence functions for the quantiles of unobserved outcomes in the target and source populations.
We propose a data-adaptive strategy to upweight informative data sources for efficiency gain and downweight non-informative data sources for bias reduction.
arXiv Detail & Related papers (2024-05-15T13:33:09Z)
- DAGnosis: Localized Identification of Data Inconsistencies using Structures [73.39285449012255]
Identification and appropriate handling of inconsistencies in data at deployment time is crucial to reliably use machine learning models.
We use directed acyclic graphs (DAGs) to encode, as a structure, the probability distribution and independencies of the training set's features.
Our method, called DAGnosis, leverages these structural interactions to bring valuable and insightful data-centric conclusions.
arXiv Detail & Related papers (2024-02-26T11:29:16Z)
- On Collaboration in Distributed Parameter Estimation with Resource Constraints [13.014069919671623]
We study sensor/agent data collection and collaboration policies for parameter estimation.
We propose novel ways to apply multi-armed bandit algorithms to learn the optimal data collection and collaboration policy.
arXiv Detail & Related papers (2023-07-12T20:11:50Z)
- CEDAR: Communication Efficient Distributed Analysis for Regressions [9.50726756006467]
There is growing interest in distributed learning over multiple EHR databases without sharing patient-level data.
We propose a novel communication efficient method that aggregates the local optimal estimates, by turning the problem into a missing data problem.
We provide theoretical investigation for the properties of the proposed method for statistical inference as well as differential privacy, and evaluate its performance in simulations and real data analyses.
arXiv Detail & Related papers (2022-07-01T09:53:44Z)
- Causal Balancing for Domain Generalization [95.97046583437145]
We propose a balanced mini-batch sampling strategy to reduce the domain-specific spurious correlations in observed training distributions.
We provide an identifiability guarantee of the source of spuriousness and show that our proposed approach provably samples from a balanced, spurious-free distribution.
arXiv Detail & Related papers (2022-06-10T17:59:11Z)
- DRFLM: Distributionally Robust Federated Learning with Inter-client Noise via Local Mixup [58.894901088797376]
Federated learning has emerged as a promising approach for training a global model using data from multiple organizations without leaking their raw data.
We propose a general framework to solve the above two challenges simultaneously.
We provide comprehensive theoretical analysis including robustness analysis, convergence analysis, and generalization ability.
arXiv Detail & Related papers (2022-04-16T08:08:29Z)
- Learning Bias-Invariant Representation by Cross-Sample Mutual Information Minimization [77.8735802150511]
We propose a cross-sample adversarial debiasing (CSAD) method to remove the bias information misused by the target task.
The correlation measurement plays a critical role in adversarial debiasing and is conducted by a cross-sample neural mutual information estimator.
We conduct thorough experiments on publicly available datasets to validate the advantages of the proposed method over state-of-the-art approaches.
arXiv Detail & Related papers (2021-08-11T21:17:02Z)
- Domain Adaptative Causality Encoder [52.779274858332656]
We leverage the characteristics of dependency trees and adversarial learning to address the tasks of adaptive causality identification and localisation.
We present a new causality dataset, namely MedCaus, which integrates all types of causality in the text.
arXiv Detail & Related papers (2020-11-27T04:14:55Z)
- Sharing Models or Coresets: A Study based on Membership Inference Attack [17.562474629669513]
Distributed machine learning aims at training a global model based on distributed data without collecting all the data to a centralized location.
Two approaches have been proposed: collecting and aggregating local models (federated learning), and collecting and training over representative data summaries (coresets).
Our experiments quantify the accuracy-privacy-cost tradeoff of each approach, and reveal a nontrivial comparison that can be used to guide the design of model training processes.
arXiv Detail & Related papers (2020-07-06T18:06:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.