A Survey for Federated Learning Evaluations: Goals and Measures
- URL: http://arxiv.org/abs/2308.11841v2
- Date: Sat, 23 Mar 2024 08:45:03 GMT
- Title: A Survey for Federated Learning Evaluations: Goals and Measures
- Authors: Di Chai, Leye Wang, Liu Yang, Junxue Zhang, Kai Chen, Qiang Yang
- Abstract summary: Federated learning (FL) is a novel paradigm for privacy-preserving machine learning.
However, evaluating FL is challenging due to its interdisciplinary nature and diverse goals, such as utility, efficiency, and security.
We introduce FedEval, an open-source platform that provides a standardized and comprehensive evaluation framework for FL algorithms.
- Score: 26.120949005265345
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Evaluation is a systematic approach to assessing how well a system achieves its intended purpose. Federated learning (FL) is a novel paradigm for privacy-preserving machine learning that allows multiple parties to collaboratively train models without sharing sensitive data. However, evaluating FL is challenging due to its interdisciplinary nature and diverse goals, such as utility, efficiency, and security. In this survey, we first review the major evaluation goals adopted in the existing studies and then explore the evaluation metrics used for each goal. We also introduce FedEval, an open-source platform that provides a standardized and comprehensive evaluation framework for FL algorithms in terms of their utility, efficiency, and security. Finally, we discuss several challenges and future research directions for FL evaluation.
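The abstract frames FL evaluation around three goals: utility, efficiency, and security. The sketch below is a minimal, illustrative way to record per-round measurements for those three goals in a simulated FL run; it is not FedEval's actual API, and all names (FLEvaluationReport, evaluate_round, the stubbed callables) are assumptions introduced here for illustration.

```python
# Minimal sketch (illustrative, not FedEval's API): collecting measurements for
# the three evaluation goals named in the abstract -- utility, efficiency, security.
from dataclasses import dataclass, field
import time


@dataclass
class FLEvaluationReport:
    utility: dict = field(default_factory=dict)      # e.g. test accuracy, loss
    efficiency: dict = field(default_factory=dict)   # e.g. wall-clock time, bytes sent
    security: dict = field(default_factory=dict)     # e.g. attack success rate


def evaluate_round(global_test_fn, comm_bytes, attack_fn=None):
    """Collect per-round measurements for the three goals.

    global_test_fn: callable returning (accuracy, loss) on held-out data.
    comm_bytes: total bytes exchanged between clients and server this round.
    attack_fn: optional callable returning an attack success rate (security probe).
    """
    report = FLEvaluationReport()

    start = time.perf_counter()
    acc, loss = global_test_fn()
    report.utility = {"accuracy": acc, "loss": loss}

    report.efficiency = {
        "eval_seconds": time.perf_counter() - start,
        "communication_bytes": comm_bytes,
    }

    if attack_fn is not None:
        report.security = {"attack_success_rate": attack_fn()}

    return report


# Usage with stubbed callables standing in for a real FL test harness:
report = evaluate_round(lambda: (0.91, 0.27), comm_bytes=1_048_576,
                        attack_fn=lambda: 0.05)
print(report)
```

In a real evaluation the stubbed callables would be replaced by the global model's test routine, the measured communication volume per round, and a concrete attack (e.g. membership inference or gradient leakage) used as the security probe.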
Related papers
- OpenUnlearning: Accelerating LLM Unlearning via Unified Benchmarking of Methods and Metrics [101.78963920333342]
We introduce OpenUnlearning, a standardized framework for benchmarking large language model (LLM) unlearning methods and metrics. OpenUnlearning integrates 9 unlearning algorithms and 16 diverse evaluations across 3 leading benchmarks. We also benchmark diverse unlearning methods and provide a comparative analysis against an extensive evaluation suite.
arXiv Detail & Related papers (2025-06-14T20:16:37Z) - ATR-Bench: A Federated Learning Benchmark for Adaptation, Trust, and Reasoning [21.099779419619345]
We introduce a unified framework for analyzing federated learning through three foundational dimensions: Adaptation, Trust, and Reasoning. ATR-Bench lays the groundwork for a systematic and holistic evaluation of federated learning with real-world relevance.
arXiv Detail & Related papers (2025-05-22T16:11:38Z) - Revisiting LLM Evaluation through Mechanism Interpretability: a New Metric and Model Utility Law [99.56567010306807]
Large Language Models (LLMs) have become indispensable across academia, industry, and daily applications.
We propose a novel metric, the Model Utilization Index (MUI), which introduces mechanism interpretability techniques to complement traditional performance metrics.
arXiv Detail & Related papers (2025-04-10T04:09:47Z) - A Survey on Federated Fine-tuning of Large Language Models [17.79395946441051]
Federated Learning (FL) offers a promising approach that enables collaborative model adaptation while ensuring data privacy.
We first trace the historical evolution of both Large Language Models (LLMs) and FL, while summarizing relevant prior surveys.
Following this, we conduct an extensive study of existing parameter-efficient fine-tuning (PEFT) methods and explore their applicability in FL.
Finally, we identify critical open challenges and outline promising research directions to drive future advancements in FedLLM.
arXiv Detail & Related papers (2025-03-15T06:52:10Z) - FEDLAD: Federated Evaluation of Deep Leakage Attacks and Defenses [50.921333548391345]
Federated Learning is a privacy-preserving, decentralized machine learning paradigm.
Recent research has revealed that private ground truth data can be recovered through a gradient technique known as Deep Leakage.
This paper introduces the FEDLAD Framework (Federated Evaluation of Deep Leakage Attacks and Defenses), a comprehensive benchmark for evaluating Deep Leakage attacks and defenses.
arXiv Detail & Related papers (2024-11-05T11:42:26Z) - Pessimistic Evaluation [58.736490198613154]
We argue that the evaluation of information access systems assumes utilitarian values that are not aligned with traditions of information access based on equal access.
We advocate for pessimistic evaluation of information access systems focusing on worst case utility.
arXiv Detail & Related papers (2024-10-17T15:40:09Z) - Benchmarking Data Heterogeneity Evaluation Approaches for Personalized Federated Learning [31.52293772126033]
The proposed benchmarking framework includes six representative approaches.
The benchmark helps keep related research activities on the right track in terms of: (1) designing PFL schemes, (2) selecting appropriate data heterogeneity evaluation approaches for specific FL application scenarios, and (3) addressing fairness issues in collaborative model training.
arXiv Detail & Related papers (2024-10-09T13:16:02Z) - Towards Effective Evaluations and Comparisons for LLM Unlearning Methods [97.2995389188179]
This paper seeks to refine the evaluation of machine unlearning for large language models.
It addresses two key challenges -- the robustness of evaluation metrics and the trade-offs between competing goals.
arXiv Detail & Related papers (2024-06-13T14:41:00Z) - A Survey on Contribution Evaluation in Vertical Federated Learning [26.32678862011122]
Vertical Federated Learning (VFL) has emerged as a critical approach in machine learning to address privacy concerns.
This paper provides a review of contribution evaluation in VFL.
We explore various tasks in VFL that involve contribution evaluation and analyze their required evaluation properties.
arXiv Detail & Related papers (2024-05-03T06:32:07Z) - F-Eval: Assessing Fundamental Abilities with Refined Evaluation Methods [102.98899881389211]
We propose F-Eval, a bilingual evaluation benchmark for assessing fundamental abilities, including expression, commonsense, and logic.
For reference-free subjective tasks, we devise new evaluation methods, serving as alternatives to scoring by API models.
arXiv Detail & Related papers (2024-01-26T13:55:32Z) - A Survey of Federated Unlearning: A Taxonomy, Challenges and Future Directions [71.16718184611673]
The evolution of privacy-preserving Federated Learning (FL) has led to an increasing demand for implementing the right to be forgotten.
The implementation of selective forgetting is particularly challenging in FL due to its decentralized nature.
Federated Unlearning (FU) emerges as a strategic solution to address the increasing need for data privacy.
arXiv Detail & Related papers (2023-10-30T01:34:33Z) - A Survey of Federated Evaluation in Federated Learning [30.56651008584592]
In traditional machine learning, it is trivial to conduct model evaluation since all data samples are managed centrally by a server. Federated evaluation, by contrast, is non-trivial because clients do not expose their original data in order to preserve data privacy.
Federated evaluation plays a vital role in client selection, incentive mechanism design, malicious attack detection, etc.
arXiv Detail & Related papers (2023-05-14T04:55:13Z) - A Domain-Agnostic Approach for Characterization of Lifelong Learning Systems [128.63953314853327]
"Lifelong Learning" systems are capable of 1) Continuous Learning, 2) Transfer and Adaptation, and 3) Scalability.
We show that this suite of metrics can inform the development of varied and complex Lifelong Learning systems.
arXiv Detail & Related papers (2023-01-18T21:58:54Z) - FedScale: Benchmarking Model and System Performance of Federated Learning [4.1617240682257925]
FedScale is a set of challenging and realistic benchmark datasets for federated learning (FL) research.
FedScale is open-source with permissive licenses and actively maintained.
arXiv Detail & Related papers (2021-05-24T15:55:27Z) - Towards Automatic Evaluation of Dialog Systems: A Model-Free Off-Policy Evaluation Approach [84.02388020258141]
We propose a new framework named ENIGMA for estimating human evaluation scores based on off-policy evaluation in reinforcement learning.
ENIGMA only requires a handful of pre-collected experience data, and therefore does not involve human interaction with the target policy during the evaluation.
Our experiments show that ENIGMA significantly outperforms existing methods in terms of correlation with human evaluation scores.
arXiv Detail & Related papers (2021-02-20T03:29:20Z) - FedEval: A Benchmark System with a Comprehensive Evaluation Model for Federated Learning [17.680627081257246]
In this paper, we propose a comprehensive evaluation framework for federated learning (FL) systems.
We first introduce the ACTPR model, which defines five metrics that are indispensable in FL evaluation: Accuracy, Communication, Time efficiency, Privacy, and Robustness.
We then provide an in-depth benchmarking study of the two most widely used FL mechanisms, FedSGD and FedAvg (a toy sketch contrasting the two follows below).
arXiv Detail & Related papers (2020-11-19T04:59:51Z)
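The FedEval entry above benchmarks FedSGD against FedAvg. As a minimal sketch of the distinction (assumed, simplified implementations, not the paper's benchmark code), FedSGD averages one gradient step per client per round, whereas FedAvg lets each client take several local steps and then averages the resulting model weights:

```python
# Toy contrast of FedSGD vs. FedAvg on a linear regression task (illustrative only).
import numpy as np

def local_gradient(w, X, y):
    """Gradient of mean squared error for a linear model y ~ X @ w."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

def fedsgd_round(w, clients, lr=0.1):
    # Server averages one gradient per client, then takes a single step.
    grads = [local_gradient(w, X, y) for X, y in clients]
    return w - lr * np.mean(grads, axis=0)

def fedavg_round(w, clients, lr=0.1, local_steps=5):
    # Each client updates its own copy for several steps; server averages weights.
    updated = []
    for X, y in clients:
        w_local = w.copy()
        for _ in range(local_steps):
            w_local -= lr * local_gradient(w_local, X, y)
        updated.append(w_local)
    return np.mean(updated, axis=0)

# Toy usage: two clients holding synthetic linear data.
rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0])
clients = []
for _ in range(2):
    X = rng.normal(size=(50, 2))
    clients.append((X, X @ true_w + 0.01 * rng.normal(size=50)))

w = np.zeros(2)
for _ in range(20):
    w = fedavg_round(w, clients)
print("FedAvg estimate:", w)
```

The trade-off this illustrates is the one the ACTPR metrics are meant to capture: FedAvg's extra local computation reduces communication rounds, while FedSGD exchanges smaller, more frequent updates; which mechanism wins depends on the accuracy, communication, and time-efficiency budget of the deployment.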
This list is automatically generated from the titles and abstracts of the papers on this site.