Data-Free Evaluation of User Contributions in Federated Learning
- URL: http://arxiv.org/abs/2108.10623v1
- Date: Tue, 24 Aug 2021 10:17:03 GMT
- Title: Data-Free Evaluation of User Contributions in Federated Learning
- Authors: Hongtao Lv, Zhenzhe Zheng, Tie Luo, Fan Wu, Shaojie Tang, Lifeng Hua,
Rongfei Jia, Chengfei Lv
- Abstract summary: Federated learning (FL) trains a machine learning model on mobile devices in a distributed manner using each device's private data and computing resources.
We propose a method called Pairwise Correlated Agreement (PCA) based on the idea of peer prediction to evaluate user contribution in FL without a test dataset.
We then apply PCA to designing (1) a new federated learning algorithm called Fed-PCA, and (2) a new incentive mechanism that guarantees truthfulness.
- Score: 31.181141140071592
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Federated learning (FL) trains a machine learning model on mobile devices in
a distributed manner using each device's private data and computing resources.
A critical issue is to evaluate individual users' contributions so that (1)
users' effort in model training can be compensated with proper incentives and
(2) malicious and low-quality users can be detected and removed. The
state-of-the-art solutions require a representative test dataset for the
evaluation purpose, but such a dataset is often unavailable and hard to
synthesize. In this paper, we propose a method called Pairwise Correlated
Agreement (PCA) based on the idea of peer prediction to evaluate user
contribution in FL without a test dataset. PCA achieves this using the
statistical correlation of the model parameters uploaded by users. We then
apply PCA to designing (1) a new federated learning algorithm called Fed-PCA,
and (2) a new incentive mechanism that guarantees truthfulness. We evaluate the
performance of PCA and Fed-PCA using the MNIST dataset and a large industrial
product recommendation dataset. The results demonstrate that our Fed-PCA
outperforms the canonical FedAvg algorithm and other baseline methods in
accuracy, and at the same time, PCA effectively incentivizes users to behave
truthfully.
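To make the idea concrete, here is a minimal sketch of a peer-prediction-style contribution score: each user is scored by how much the signs of its uploaded update agree with each peer's beyond chance. The sign-flattening and pairwise averaging are simplifying assumptions for illustration, not the paper's exact PCA estimator.

```python
import numpy as np

def pca_contribution_scores(updates):
    """Peer-prediction-style contribution scores for FL users.

    updates: list of 1-D numpy arrays, one model update per user.
    Returns one score per user: the average agreement of the user's
    update signs with each peer's, minus the agreement expected by
    chance (a simplified correlated-agreement style score).
    """
    signs = [np.sign(u) for u in updates]
    n = len(signs)
    scores = np.zeros(n)
    for i in range(n):
        peer_scores = []
        for j in range(n):
            if i == j:
                continue
            agree = np.mean(signs[i] == signs[j])
            # Agreement expected if i's signs were independent of j's.
            p_i = np.mean(signs[i] == 1)
            p_j = np.mean(signs[j] == 1)
            chance = p_i * p_j + (1 - p_i) * (1 - p_j)
            peer_scores.append(agree - chance)
        scores[i] = np.mean(peer_scores)
    return scores

# Toy usage: two honest users share a common signal, one uploads noise.
rng = np.random.default_rng(0)
signal = rng.normal(size=1000)
honest1 = signal + 0.5 * rng.normal(size=1000)
honest2 = signal + 0.5 * rng.normal(size=1000)
noisy = rng.normal(size=1000)
print(pca_contribution_scores([honest1, honest2, noisy]))
```

In this toy run the two honest users score well above zero while the noise uploader scores near zero, which is the behavior a test-data-free contribution measure needs.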
Related papers
- Privacy-Preserved Automated Scoring using Federated Learning for Educational Research [1.2556373621040728]
This study proposes a federated learning framework for automatic scoring in educational assessments.
Student responses are processed locally on edge devices, and only optimized model parameters are shared with a central aggregation server.
We evaluate our framework using assessment data from nine middle schools, comparing the accuracy of federated learning-based scoring models with traditionally trained centralized models.
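For context, the parameter-sharing step in such a framework is typically FedAvg-style weighted averaging; a minimal sketch (our illustration, not the study's code):

```python
import numpy as np

def fedavg_aggregate(client_params, client_sizes):
    """Weighted average of client parameter vectors (FedAvg-style).

    client_params: list of 1-D numpy arrays, one per client.
    client_sizes:  number of local examples each client trained on.
    """
    weights = np.asarray(client_sizes, dtype=float)
    weights /= weights.sum()
    return sum(w * p for w, p in zip(weights, client_params))

# Example: three edge devices upload locally optimized parameters.
params = [np.array([1.0, 2.0]), np.array([1.2, 1.8]), np.array([0.8, 2.2])]
print(fedavg_aggregate(params, client_sizes=[100, 50, 50]))
```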
arXiv Detail & Related papers (2025-03-12T19:06:25Z)
- DUPRE: Data Utility Prediction for Efficient Data Valuation [49.60564885180563]
Cooperative game theory-based data valuation, such as Data Shapley, requires evaluating the data utility and retraining the ML model for multiple data subsets.
Our framework, DUPRE, takes an alternative yet complementary approach that reduces the cost per subset evaluation by predicting data utilities instead of evaluating them by model retraining.
Specifically, given the evaluated data utilities of some data subsets, DUPRE fits a Gaussian process (GP) regression model to predict the utility of every other data subset.
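A minimal sketch of this utility-prediction step, assuming subsets are featurized as binary membership vectors and using an off-the-shelf RBF kernel (both are our assumptions, not necessarily DUPRE's choices):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def subset_to_features(subset, n_points):
    """Encode a data subset as a binary membership vector."""
    x = np.zeros(n_points)
    x[list(subset)] = 1.0
    return x

n_points = 8
# Utilities actually evaluated by retraining, for a few subsets only.
evaluated = {frozenset({0, 1}): 0.62, frozenset({0, 1, 2}): 0.70,
             frozenset({3, 4}): 0.55, frozenset(range(8)): 0.85}
X = np.array([subset_to_features(s, n_points) for s in evaluated])
y = np.array(list(evaluated.values()))

gp = GaussianProcessRegressor(kernel=RBF(length_scale=2.0)).fit(X, y)

# Predict the utility of an unseen subset instead of retraining.
query = subset_to_features({0, 1, 2, 3}, n_points).reshape(1, -1)
mean, std = gp.predict(query, return_std=True)
print(f"predicted utility {mean[0]:.3f} +/- {std[0]:.3f}")
```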
arXiv Detail & Related papers (2025-02-22T08:53:39Z)
- Efficient Multi-Agent System Training with Data Influence-Oriented Tree Search [59.75749613951193]
We propose Data Influence-oriented Tree Search (DITS) to guide both tree search and data selection.
By leveraging influence scores, we effectively identify the most impactful data for system improvement.
We derive influence score estimation methods tailored for non-differentiable metrics.
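For a non-differentiable metric, one generic influence estimate is leave-one-out retraining; the sketch below illustrates that baseline under our own toy assumptions, not DITS's estimator:

```python
import random

def loo_influence(data, train_fn, metric_fn, n_samples=None):
    """Leave-one-out influence of each training item on a (possibly
    non-differentiable) metric: metric(full) - metric(without item)."""
    base = metric_fn(train_fn(data))
    idxs = range(len(data)) if n_samples is None else \
        random.sample(range(len(data)), n_samples)
    return {i: base - metric_fn(train_fn(data[:i] + data[i + 1:]))
            for i in idxs}

# Toy usage: the "model" is the mean of the data; the metric rewards
# closeness of that mean to a target and is non-differentiable.
data = [1.0, 2.0, 3.0, 100.0]
train = lambda d: sum(d) / len(d)
metric = lambda m: -abs(m - 2.0)
print(loo_influence(data, train, metric))  # the outlier scores lowest
```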
arXiv Detail & Related papers (2025-02-02T23:20:16Z)
- Federated Testing (FedTest): A New Scheme to Enhance Convergence and Mitigate Adversarial Attacks in Federated Learning [35.14491996649841]
We introduce a novel federated learning framework, which we call federated testing for federated learning (FedTest).
In FedTest, the local data of a specific user is used to train the model of that user and test the models of the other users.
Our numerical results reveal that the proposed method not only accelerates convergence rates but also diminishes the potential influence of malicious users.
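A minimal sketch of the cross-testing idea, assuming each user exposes a scoring function over models and that peer scores are turned into aggregation weights (our simplification of the scheme):

```python
import numpy as np

def fedtest_weights(models, local_test_fns):
    """Score every user's model on all *other* users' local data and
    turn average peer-test scores into aggregation weights."""
    n = len(models)
    scores = np.zeros(n)
    for i, model in enumerate(models):
        scores[i] = np.mean([local_test_fns[j](model)
                             for j in range(n) if j != i])
    weights = np.clip(scores, 0.0, None)   # failing models get ~0 weight
    return weights / weights.sum()

# Toy usage: "models" are scalar predictions; each user's local test
# rewards closeness to its own target.
models = [1.0, 1.1, 5.0]                   # the third user looks malicious
targets = [1.0, 1.05, 0.95]
local_tests = [lambda m, t=t: max(0.0, 1.0 - abs(m - t)) for t in targets]
print(fedtest_weights(models, local_tests))
```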
arXiv Detail & Related papers (2025-01-19T21:01:13Z)
- Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation [62.2436697657307]
Prediction-powered inference (PPI) is a method that improves statistical estimates based on limited human-labeled data.
We propose a method called Stratified Prediction-Powered Inference (StratPPI)
We show that the basic PPI estimates can be considerably improved by employing simple data stratification strategies.
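To make the estimators concrete: classic PPI corrects a cheap model-based average with a small labeled sample, and a stratified variant applies that correction within strata and recombines by stratum proportions. The sketch below is a simplified reading; the paper's estimator may differ in details:

```python
import numpy as np

def ppi_mean(preds_unlabeled, preds_labeled, labels):
    """Prediction-powered estimate of a mean: the model's average on the
    large unlabeled set plus a bias correction from the labeled set."""
    return preds_unlabeled.mean() + (labels - preds_labeled).mean()

def strat_ppi_mean(strata):
    """Stratified PPI: apply the correction within each stratum and
    recombine weighted by stratum size in the unlabeled population."""
    sizes = np.array([len(pu) for pu, _, _ in strata], dtype=float)
    weights = sizes / sizes.sum()
    per_stratum = [ppi_mean(pu, pl, y) for pu, pl, y in strata]
    return float(weights @ per_stratum)

rng = np.random.default_rng(1)
strata = []
for bias, size in [(0.5, 2000), (-0.2, 500)]:   # model bias differs by stratum
    truth = rng.normal(1.0, 1.0, size)
    labels = rng.normal(1.0, 1.0, 50)           # small labeled sample
    strata.append((truth + bias, labels + bias, labels))
print(strat_ppi_mean(strata))                    # close to the true mean 1.0
```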
arXiv Detail & Related papers (2024-06-06T17:37:39Z)
- Bayesian Prediction-Powered Inference [62.2436697657307]
Prediction-powered inference (PPI) is a method that improves statistical estimates based on limited human-labeled data.
We propose a framework for PPI based on Bayesian inference that allows researchers to develop new task-appropriate PPI methods easily.
arXiv Detail & Related papers (2024-05-09T18:08:58Z)
- Fair Streaming Principal Component Analysis: Statistical and Algorithmic Viewpoint [38.86637435197192]
We present a new theoretical and practical approach to fair Principal Component Analysis (PCA) using a new notion called probably approximately fair and optimal (PAFO) learnability.
We then provide a statistical guarantee in terms of PAFO-learnability, which is the first of its kind in the fair PCA literature.
Lastly, we verify the efficacy and memory efficiency of our algorithm on real-world datasets.
arXiv Detail & Related papers (2023-10-28T05:09:30Z)
- Sample Complexity of Preference-Based Nonparametric Off-Policy Evaluation with Deep Networks [58.469818546042696]
We study the sample efficiency of OPE with human preference and establish a statistical guarantee for it.
By appropriately selecting the size of a ReLU network, we show that one can leverage any low-dimensional manifold structure in the Markov decision process.
arXiv Detail & Related papers (2023-10-16T16:27:06Z)
- Uncertainty-Aware Instance Reweighting for Off-Policy Learning [63.31923483172859]
We propose an Uncertainty-aware Inverse Propensity Score estimator (UIPS) for improved off-policy learning.
Experiment results on synthetic and three real-world recommendation datasets demonstrate the advantageous sample efficiency of the proposed UIPS estimator.
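Vanilla IPS divides each reward by the logged propensity; an uncertainty-aware variant shrinks weights whose propensities are poorly estimated. The shrinkage form below is our own illustrative choice, not the UIPS estimator itself:

```python
import numpy as np

def ips(rewards, target_probs, logged_probs):
    """Vanilla inverse propensity scoring estimate of policy value."""
    return np.mean(rewards * target_probs / logged_probs)

def uncertainty_aware_ips(rewards, target_probs, logged_probs,
                          prop_var, lam=1.0):
    """Shrink each importance weight toward zero in proportion to the
    variance of its estimated propensity (illustrative shrinkage form)."""
    w = target_probs / logged_probs
    w_shrunk = w / (1.0 + lam * prop_var * w)
    return np.mean(rewards * w_shrunk)

rng = np.random.default_rng(2)
n = 10_000
logged = rng.uniform(0.05, 0.9, n)      # logging policy propensities
target = rng.uniform(0.05, 0.9, n)      # target policy probabilities
rewards = rng.binomial(1, 0.3, n).astype(float)
prop_var = rng.uniform(0.0, 0.05, n)    # uncertainty in each propensity
print(ips(rewards, target, logged),
      uncertainty_aware_ips(rewards, target, logged, prop_var))
```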
arXiv Detail & Related papers (2023-03-11T11:42:26Z)
- Efficient fair PCA for fair representation learning [21.990310743597174]
We propose a conceptually simple approach that allows for an analytic solution similar to standard PCA and can be kernelized.
Our methods have the same complexity as standard PCA, or kernel PCA, and run much faster than existing methods for fair PCA based on semidefinite programming or manifold optimization.
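One analytic construction in this spirit (our illustration, not necessarily the paper's method) is to project out the direction separating group means and then run standard PCA:

```python
import numpy as np

def fair_pca(X, groups, n_components):
    """PCA after projecting out the direction separating group means
    (one analytic construction in the spirit of fair PCA)."""
    X = X - X.mean(axis=0)
    mu_diff = X[groups == 0].mean(axis=0) - X[groups == 1].mean(axis=0)
    d = mu_diff / np.linalg.norm(mu_diff)
    X_fair = X - np.outer(X @ d, d)      # remove group-revealing direction
    # Standard PCA on the projected data.
    _, _, vt = np.linalg.svd(X_fair, full_matrices=False)
    return X_fair @ vt[:n_components].T

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 5))
groups = rng.integers(0, 2, size=200)
X[groups == 1] += 2.0                    # group signal to be removed
Z = fair_pca(X, groups, n_components=2)
# Group means of the fair representation are now nearly identical.
print(Z.shape, np.abs(Z[groups == 0].mean(0) - Z[groups == 1].mean(0)))
```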
arXiv Detail & Related papers (2023-02-26T13:34:43Z)
- D-BIAS: A Causality-Based Human-in-the-Loop System for Tackling Algorithmic Bias [57.87117733071416]
We propose D-BIAS, a visual interactive tool that embodies a human-in-the-loop AI approach for auditing and mitigating social biases.
A user can detect the presence of bias against a group by identifying unfair causal relationships in the causal network.
For each interaction, say weakening/deleting a biased causal edge, the system uses a novel method to simulate a new (debiased) dataset.
arXiv Detail & Related papers (2022-08-10T03:41:48Z)
- FAST-PCA: A Fast and Exact Algorithm for Distributed Principal Component Analysis [12.91948651812873]
Principal Component Analysis (PCA) is a fundamental data preprocessing tool in the world of machine learning.
This paper proposes a distributed PCA algorithm called FAST-PCA (Fast and exAct diSTributed PCA).
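As a toy illustration of distributed PCA (not the FAST-PCA algorithm itself), the sketch below runs power iteration where each node applies its local covariance and a plain average stands in for the network consensus step:

```python
import numpy as np

def distributed_top_component(local_data, n_iter=100):
    """Estimate the top principal component when data is split across
    nodes: each node applies its local covariance to the current vector
    and the results are averaged (a stand-in for a consensus step)."""
    dim = local_data[0].shape[1]
    covs = [X.T @ X / len(X) for X in local_data]   # local covariances
    v = np.random.default_rng(4).normal(size=dim)
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        v = np.mean([C @ v for C in covs], axis=0)  # "consensus" average
        v /= np.linalg.norm(v)
    return v

rng = np.random.default_rng(5)
full = rng.normal(size=(300, 4)) * np.array([3.0, 1.0, 0.5, 0.1])
nodes = np.array_split(full, 3)          # data split across 3 nodes
print(distributed_top_component(nodes))  # approximately +/- e_1
```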
arXiv Detail & Related papers (2021-08-27T16:10:59Z)
- Federated Robustness Propagation: Sharing Adversarial Robustness in Federated Learning [98.05061014090913]
Federated learning (FL) has emerged as a popular distributed learning scheme that learns from a set of participating users without requiring raw data to be shared.
While adversarial training (AT) provides a sound solution for centralized learning, extending its usage to FL users poses significant challenges.
We show that existing FL techniques cannot effectively propagate adversarial robustness among non-iid users.
We propose a simple yet effective propagation approach that transfers robustness through carefully designed batch-normalization statistics.
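A minimal PyTorch sketch of one simple reading of "transferring robustness through batch-normalization statistics": copy the BN running statistics and affine parameters from an adversarially trained model into a peer's model of the same architecture. This is our simplification, not the paper's exact procedure:

```python
import torch.nn as nn

def copy_bn_statistics(src_model, dst_model):
    """Copy running BatchNorm statistics (and affine parameters) from a
    robust, adversarially trained model into another user's model.
    Assumes both models share the same architecture."""
    for src_m, dst_m in zip(src_model.modules(), dst_model.modules()):
        if isinstance(src_m, nn.BatchNorm2d) and isinstance(dst_m, nn.BatchNorm2d):
            dst_m.running_mean.copy_(src_m.running_mean)
            dst_m.running_var.copy_(src_m.running_var)
            dst_m.weight.data.copy_(src_m.weight.data)
            dst_m.bias.data.copy_(src_m.bias.data)

# Usage sketch: identical architectures on both users.
robust = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU())
plain = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU())
copy_bn_statistics(robust, plain)
```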
arXiv Detail & Related papers (2021-06-18T15:52:33Z)
- Estimation of Individual Device Contributions for Incentivizing Federated Learning [8.426678774799859]
Federated learning (FL) is an emerging technique used to train a machine-learning model collaboratively using the data and computation resources of mobile devices.
This paper proposes a computation- and communication-efficient method of estimating a participating device's contribution level.
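One cheap proxy in this spirit (our illustrative choice, not necessarily the paper's estimator) scores each device's update by its alignment with the aggregate update, requiring no test data and no retraining:

```python
import numpy as np

def alignment_contributions(updates):
    """Score each device by the cosine similarity of its update with
    the average update: no test set and no extra communication rounds."""
    avg = np.mean(updates, axis=0)
    return [float(u @ avg / (np.linalg.norm(u) * np.linalg.norm(avg)))
            for u in updates]

rng = np.random.default_rng(6)
good = rng.normal(size=50)
updates = [good + 0.1 * rng.normal(size=50) for _ in range(4)]
updates.append(rng.normal(size=50))      # one uninformative device
print(np.round(alignment_contributions(updates), 2))
```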
arXiv Detail & Related papers (2020-09-20T07:03:27Z)
- Privacy Preserving PCA for Multiparty Modeling [21.33430578478244]
PPPCA enables multiple parties to cooperatively perform PCA while keeping their plaintext data local.
The output of PPPCA can be sent directly to the data consumer to build any machine learning model.
Results show that the accuracy of the model built upon PPPCA is the same as the model with PCA that is built based on centralized data.
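The arithmetic such a protocol protects is the aggregation of per-party sufficient statistics; the sketch below shows that aggregation in plaintext, whereas a real privacy-preserving protocol would compute the sums under encryption or secret sharing:

```python
import numpy as np

def multiparty_pca(parties, n_components):
    """PCA from per-party sufficient statistics only (sums, second
    moments, counts); raw rows never leave a party. In a real
    privacy-preserving protocol the sums below would be aggregated
    under encryption or secret sharing rather than in plaintext."""
    n = sum(len(X) for X in parties)
    total_sum = sum(X.sum(axis=0) for X in parties)
    total_sq = sum(X.T @ X for X in parties)
    mean = total_sum / n
    cov = total_sq / n - np.outer(mean, mean)   # global covariance
    eigvals, eigvecs = np.linalg.eigh(cov)
    return eigvecs[:, ::-1][:, :n_components]   # top components

rng = np.random.default_rng(7)
parties = [rng.normal(size=(100, 4)) @ np.diag([3, 1, 1, 0.2])
           for _ in range(3)]
print(multiparty_pca(parties, n_components=2).shape)
```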
arXiv Detail & Related papers (2020-02-06T04:16:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.