A Survey for Federated Learning Evaluations: Goals and Measures
- URL: http://arxiv.org/abs/2308.11841v2
- Date: Sat, 23 Mar 2024 08:45:03 GMT
- Title: A Survey for Federated Learning Evaluations: Goals and Measures
- Authors: Di Chai, Leye Wang, Liu Yang, Junxue Zhang, Kai Chen, Qiang Yang
- Abstract summary: Federated learning (FL) is a novel paradigm for privacy-preserving machine learning.
Evaluating FL is challenging due to its interdisciplinary nature and diverse goals, such as utility, efficiency, and security.
We introduce FedEval, an open-source platform that provides a standardized and comprehensive evaluation framework for FL algorithms.
- Score: 26.120949005265345
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Evaluation is a systematic approach to assessing how well a system achieves its intended purpose. Federated learning (FL) is a novel paradigm for privacy-preserving machine learning that allows multiple parties to collaboratively train models without sharing sensitive data. However, evaluating FL is challenging due to its interdisciplinary nature and diverse goals, such as utility, efficiency, and security. In this survey, we first review the major evaluation goals adopted in the existing studies and then explore the evaluation metrics used for each goal. We also introduce FedEval, an open-source platform that provides a standardized and comprehensive evaluation framework for FL algorithms in terms of their utility, efficiency, and security. Finally, we discuss several challenges and future research directions for FL evaluation.
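The abstract describes FL as collaborative training without sharing raw data: clients send model updates, and the server aggregates them. A minimal sketch of the standard sample-weighted aggregation step (FedAvg-style) is shown below; the function name and flat-list weight representation are illustrative assumptions, not part of FedEval's API.

```python
def fedavg_aggregate(client_updates):
    """Sample-weighted average of client model weights (FedAvg-style).

    client_updates: list of (num_samples, weights) pairs, where weights
    is a flat list of floats. The server only ever sees these updates,
    never the clients' raw data -- this is the privacy-preserving step
    that FL evaluation frameworks then assess for utility, efficiency,
    and security.
    """
    total = sum(n for n, _ in client_updates)
    dim = len(client_updates[0][1])
    agg = [0.0] * dim
    for n, weights in client_updates:
        for i, w in enumerate(weights):
            agg[i] += (n / total) * w  # clients with more data weigh more
    return agg

# Example: two clients with different data volumes.
updates = [(100, [1.0, 2.0]), (300, [3.0, 4.0])]
print(fedavg_aggregate(updates))  # [2.5, 3.5]
```

Note how the client with 300 samples pulls the average three times as hard as the one with 100; evaluation goals such as fairness and robustness hinge on exactly this kind of weighting choice.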
Related papers
- A Survey on Contribution Evaluation in Vertical Federated Learning [26.32678862011122]
Vertical Federated Learning (VFL) has emerged as a critical approach in machine learning to address privacy concerns.
This paper provides a review of contribution evaluation in VFL.
We explore various tasks in VFL that involve contribution evaluation and analyze their required evaluation properties.
arXiv Detail & Related papers (2024-05-03T06:32:07Z)
- Overcoming Pitfalls in Graph Contrastive Learning Evaluation: Toward Comprehensive Benchmarks [60.82579717007963]
We introduce an enhanced evaluation framework designed to more accurately gauge the effectiveness, consistency, and overall capability of Graph Contrastive Learning (GCL) methods.
arXiv Detail & Related papers (2024-02-24T01:47:56Z)
- F-Eval: Assessing Fundamental Abilities with Refined Evaluation Methods [111.46455901113976]
We propose F-Eval, a bilingual evaluation benchmark to evaluate the fundamental abilities, including expression, commonsense and logic.
For reference-free subjective tasks, we devise new evaluation methods, serving as alternatives to scoring by API models.
arXiv Detail & Related papers (2024-01-26T13:55:32Z)
- A Survey of Federated Unlearning: A Taxonomy, Challenges and Future Directions [71.16718184611673]
The evolution of privacy-preserving Federated Learning (FL) has led to an increasing demand for implementing the right to be forgotten.
The implementation of selective forgetting is particularly challenging in FL due to its decentralized nature.
Federated Unlearning (FU) emerges as a strategic solution to address the increasing need for data privacy.
arXiv Detail & Related papers (2023-10-30T01:34:33Z)
- Trustworthy Federated Learning: A Survey [0.5089078998562185]
Federated Learning (FL) has emerged as a significant advancement in the field of Artificial Intelligence (AI)
We provide an extensive overview of the current state of Trustworthy FL, exploring existing solutions and the well-defined pillars relevant to trustworthiness.
We propose a taxonomy that encompasses three main pillars: Interpretability, Fairness, and Security & Privacy.
arXiv Detail & Related papers (2023-05-19T09:11:26Z)
- A Survey of Federated Evaluation in Federated Learning [30.56651008584592]
In traditional machine learning, model evaluation is trivial to conduct since all data samples are managed centrally by a server. In federated learning, however, this is no longer the case, because clients do not expose their original data in order to preserve data privacy.
Federated evaluation plays a vital role in client selection, incentive mechanism design, malicious attack detection, etc.
arXiv Detail & Related papers (2023-05-14T04:55:13Z)
- A Domain-Agnostic Approach for Characterization of Lifelong Learning Systems [128.63953314853327]
"Lifelong Learning" systems are capable of 1) Continuous Learning, 2) Transfer and Adaptation, and 3) Scalability.
We show that this suite of metrics can inform the development of varied and complex Lifelong Learning systems.
arXiv Detail & Related papers (2023-01-18T21:58:54Z)
- FedScale: Benchmarking Model and System Performance of Federated Learning [4.1617240682257925]
FedScale is a set of challenging and realistic benchmark datasets for federated learning (FL) research.
FedScale is open-source with permissive licenses and actively maintained.
arXiv Detail & Related papers (2021-05-24T15:55:27Z)
- Towards Automatic Evaluation of Dialog Systems: A Model-Free Off-Policy Evaluation Approach [84.02388020258141]
We propose a new framework named ENIGMA for estimating human evaluation scores based on off-policy evaluation in reinforcement learning.
ENIGMA only requires a handful of pre-collected experience data, and therefore does not involve human interaction with the target policy during the evaluation.
Our experiments show that ENIGMA significantly outperforms existing methods in terms of correlation with human evaluation scores.
arXiv Detail & Related papers (2021-02-20T03:29:20Z)
- FedEval: A Benchmark System with a Comprehensive Evaluation Model for Federated Learning [17.680627081257246]
In this paper, we propose a comprehensive evaluation framework for federated learning (FL) systems.
We first introduce the ACTPR model, which defines five metrics that cannot be excluded in FL evaluation, including Accuracy, Communication, Time efficiency, Privacy, and Robustness.
We then provide an in-depth benchmarking study between the two most widely-used FL mechanisms, FedSGD and FedAvg.
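The key difference between the two benchmarked mechanisms is what gets averaged and how often: FedSGD averages a single gradient step from each client every round, while FedAvg lets clients run several local steps and averages the resulting weights, trading communication for local computation. A toy sketch on a scalar quadratic objective (an assumption for illustration, not FedEval code) makes the contrast concrete:

```python
def grad(w, x):
    """Gradient of the toy per-client loss 0.5 * (w - x)^2 at weight w."""
    return w - x

def fedsgd_round(w, client_data, lr=0.5):
    """One FedSGD round: average the clients' single-step gradients,
    then take one global step. One communication per gradient step."""
    g = sum(grad(w, x) for x in client_data) / len(client_data)
    return w - lr * g

def fedavg_round(w, client_data, lr=0.5, local_steps=5):
    """One FedAvg round: each client takes several local SGD steps,
    then the server averages the resulting weights. Fewer
    communication rounds, at the cost of client-side drift."""
    local_weights = []
    for x in client_data:
        w_local = w
        for _ in range(local_steps):
            w_local -= lr * grad(w_local, x)
        local_weights.append(w_local)
    return sum(local_weights) / len(local_weights)

# Two clients whose local optima are 0.0 and 2.0; both mechanisms
# move the global weight toward the mean optimum 1.0, but FedAvg
# covers more ground per communication round.
print(fedsgd_round(10.0, [0.0, 2.0]))   # 5.5
print(fedavg_round(10.0, [0.0, 2.0]))   # 1.28125
```

This communication/computation trade-off is exactly what efficiency-oriented metrics such as ACTPR's Communication and Time efficiency are designed to capture.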
arXiv Detail & Related papers (2020-11-19T04:59:51Z) - Interpretable Off-Policy Evaluation in Reinforcement Learning by
Highlighting Influential Transitions [48.91284724066349]
Off-policy evaluation in reinforcement learning offers the chance of using observational data to improve future outcomes in domains such as healthcare and education.
Traditional measures such as confidence intervals may be insufficient due to noise, limited data and confounding.
We develop a method that could serve as a hybrid human-AI system, to enable human experts to analyze the validity of policy evaluation estimates.
arXiv Detail & Related papers (2020-02-10T00:26:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.