Federated Computation of ROC and PR Curves
- URL: http://arxiv.org/abs/2510.04979v1
- Date: Mon, 06 Oct 2025 16:16:46 GMT
- Title: Federated Computation of ROC and PR Curves
- Authors: Xuefeng Xu, Graham Cormode
- Abstract summary: In Federated Learning (FL) scenarios, where data is distributed across multiple clients, computing ROC and PR curves is challenging due to privacy and communication constraints. We propose a novel method for approximating ROC and PR curves in a federated setting by estimating quantiles of the prediction score distribution under distributed differential privacy.
- Score: 8.64427265159929
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Receiver Operating Characteristic (ROC) and Precision-Recall (PR) curves are fundamental tools for evaluating machine learning classifiers, offering detailed insights into the trade-offs between true positive rate vs. false positive rate (ROC) or precision vs. recall (PR). However, in Federated Learning (FL) scenarios, where data is distributed across multiple clients, computing these curves is challenging due to privacy and communication constraints. Specifically, the server cannot access raw prediction scores and class labels, which are used to compute the ROC and PR curves in a centralized setting. In this paper, we propose a novel method for approximating ROC and PR curves in a federated setting by estimating quantiles of the prediction score distribution under distributed differential privacy. We provide theoretical bounds on the Area Error (AE) between the true and estimated curves, demonstrating the trade-offs between approximation accuracy, privacy, and communication cost. Empirical results on real-world datasets demonstrate that our method achieves high approximation accuracy with minimal communication and strong privacy guarantees, making it practical for privacy-preserving model evaluation in federated systems.
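The abstract's core idea, approximating a ROC curve from quantiles of the score distribution instead of from raw scores, can be illustrated with a small sketch. This is not the authors' algorithm: it omits differential privacy noise entirely, and it swaps in a simple histogram-sum scheme for quantile estimation. All function names, the bin count, the number of quantiles, and the synthetic Beta-distributed scores are assumptions made for illustration only.

```python
import random

def client_histograms(scores, labels, num_bins):
    """Per-client counts of positive and negative scores in equal-width bins on [0, 1]."""
    pos, neg = [0] * num_bins, [0] * num_bins
    for s, y in zip(scores, labels):
        b = min(int(s * num_bins), num_bins - 1)  # clamp s == 1.0 into the last bin
        (pos if y == 1 else neg)[b] += 1
    return pos, neg

def aggregate(histograms):
    """Element-wise sum of client histograms (the server never sees raw scores)."""
    return [sum(col) for col in zip(*histograms)]

def quantile_thresholds(hist, num_quantiles):
    """Approximate interior score quantiles as right edges of histogram bins."""
    total, num_bins = sum(hist), len(hist)
    thresholds, cum, target = [], 0, 1
    for b, count in enumerate(hist):
        cum += count
        while target < num_quantiles and cum >= target * total / num_quantiles:
            thresholds.append((b + 1) / num_bins)
            target += 1
    return sorted(set(thresholds))

def roc_points(pos_hist, neg_hist, thresholds):
    """(FPR, TPR) pairs for the rule 'predict positive if score >= threshold'."""
    num_bins = len(pos_hist)
    num_pos, num_neg = sum(pos_hist), sum(neg_hist)
    points = [(1.0, 1.0)]  # threshold at 0: everything predicted positive
    for t in thresholds:
        b = round(t * num_bins)  # thresholds are bin edges, so this is exact
        points.append((sum(neg_hist[b:]) / num_neg, sum(pos_hist[b:]) / num_pos))
    points.append((0.0, 0.0))  # threshold above 1: nothing predicted positive
    return points

def auc(points):
    """Trapezoidal area under the piecewise-linear curve through the points."""
    pts = sorted(points)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))

# Synthetic data for three hypothetical clients (assumed score distributions).
random.seed(0)
clients = []
for _ in range(3):
    labels = [random.randint(0, 1) for _ in range(500)]
    scores = [random.betavariate(4, 2) if y else random.betavariate(2, 4) for y in labels]
    clients.append((scores, labels))

reports = [client_histograms(s, y, num_bins=100) for s, y in clients]
pos_hist = aggregate([r[0] for r in reports])
neg_hist = aggregate([r[1] for r in reports])
all_hist = [p + n for p, n in zip(pos_hist, neg_hist)]
thresholds = quantile_thresholds(all_hist, num_quantiles=10)
points = roc_points(pos_hist, neg_hist, thresholds)
print(f"approximate AUC from {len(thresholds)} thresholds: {auc(points):.3f}")
```

Placing thresholds at score quantiles rather than on a uniform grid concentrates curve points where the scores actually lie. In a private deployment the summed histograms would additionally be protected, for example by noise added under a secure aggregation protocol, which this sketch leaves out.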
Related papers
- The Statistical Fairness-Accuracy Frontier [50.323024516295725]
Machine learning models must balance accuracy and fairness, but these goals often conflict. A useful tool for understanding this trade-off is the fairness-accuracy frontier, which characterizes the set of models that cannot be simultaneously improved in both fairness and accuracy. We study the FA frontier in the finite-sample regime, showing how it deviates from its population counterpart and quantifying the worst-case gap between them.
arXiv Detail & Related papers (2025-08-25T03:01:35Z)
- Federated PCA and Estimation for Spiked Covariance Matrices: Optimal Rates and Efficient Algorithm [19.673557166734977]
Federated Learning (FL) has gained significant recent attention in machine learning for its enhanced privacy and data security.
This paper investigates federated PCA and estimation for spiked covariance matrices under distributed differential privacy constraints.
We establish minimax rates of convergence, with a key finding that the central server's optimal rate is the harmonic mean of the local clients' minimax rates.
arXiv Detail & Related papers (2024-11-23T21:57:50Z)
- Conditional Prediction ROC Bands for Graph Classification [14.222892103838165]
Conditional Prediction ROC (CP-ROC) bands offer uncertainty quantification for ROC curves and robustness to distributional shifts in test data.
We establish statistically guaranteed coverage for CP-ROC under a local exchangeability condition.
This addresses uncertainty challenges for ROC curves in non-i.i.d. settings, ensuring reliability when test graph distributions differ from the training data.
arXiv Detail & Related papers (2024-10-20T00:44:59Z)
- FedCert: Federated Accuracy Certification [8.34167718121698]
Federated Learning (FL) has emerged as a powerful paradigm for training machine learning models in a decentralized manner.
Previous studies have assessed the effectiveness of models in centralized training based on certified accuracy.
This study proposes a method named FedCert to take the first step toward evaluating the robustness of FL systems.
arXiv Detail & Related papers (2024-10-04T01:19:09Z)
- Towards Robust and Interpretable EMG-based Hand Gesture Recognition using Deep Metric Meta Learning [37.21211404608413]
We propose a shift to deep metric-based meta-learning in EMG PR to supervise the creation of meaningful and interpretable representations.
We derive a robust class proximity-based confidence estimator that leads to a better rejection of incorrect decisions.
arXiv Detail & Related papers (2024-04-17T23:37:50Z)
- Learning for Transductive Threshold Calibration in Open-World Recognition [83.35320675679122]
We introduce OpenGCN, a Graph Neural Network-based transductive threshold calibration method with enhanced robustness and adaptability.
Experiments across open-world visual recognition benchmarks validate OpenGCN's superiority over existing posthoc calibration methods for open-world threshold calibration.
arXiv Detail & Related papers (2023-05-19T23:52:48Z)
- Personalized Federated Learning under Mixture of Distributions [98.25444470990107]
We propose a novel approach to Personalized Federated Learning (PFL), which utilizes Gaussian mixture models (GMM) to fit the input data distributions across diverse clients.
FedGMM possesses an additional advantage of adapting to new clients with minimal overhead, and it also enables uncertainty quantification.
Empirical evaluations on synthetic and benchmark datasets demonstrate the superior performance of our method in both PFL classification and novel sample detection.
arXiv Detail & Related papers (2023-05-01T20:04:46Z)
- Evaluating Probabilistic Classifiers: The Triptych [62.997667081978825]
We propose and study a triptych of diagnostic graphics that focus on distinct and complementary aspects of forecast performance.
The reliability diagram addresses calibration, the receiver operating characteristic (ROC) curve diagnoses discrimination ability, and the Murphy diagram visualizes overall predictive performance and value.
arXiv Detail & Related papers (2023-01-25T19:35:23Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples for which model confidence exceeds that threshold.
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning [78.83598532168256]
Marginal-likelihood based model-selection is rarely used in deep learning due to estimation difficulties.
Our work shows that marginal likelihoods can improve generalization and be useful when validation data is unavailable.
arXiv Detail & Related papers (2021-04-11T09:50:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.