CATCHFed: Efficient Unlabeled Data Utilization for Semi-Supervised Federated Learning in Limited Labels Environments
- URL: http://arxiv.org/abs/2511.11778v2
- Date: Wed, 19 Nov 2025 04:17:29 GMT
- Title: CATCHFed: Efficient Unlabeled Data Utilization for Semi-Supervised Federated Learning in Limited Labels Environments
- Authors: Byoungjun Park, Pedro Porto Buarque de Gusmão, Dongjin Ji, Minhoe Kim
- Abstract summary: Federated learning is a promising paradigm that utilizes distributed client resources while preserving data privacy.
Most existing FL approaches assume clients possess labeled data; however, in real-world scenarios, client-side labels are often unavailable.
We propose CATCHFed, which introduces client-aware adaptive thresholds that account for class difficulty and hybrid thresholds that improve pseudo-label quality, and uses un-pseudo-labeled data for consistency regularization.
- Score: 3.2805451821208234
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Federated learning is a promising paradigm that utilizes distributed client resources while preserving data privacy. Most existing FL approaches assume clients possess labeled data; in real-world scenarios, however, client-side labels are often unavailable. Semi-supervised federated learning, where only the server holds labeled data, addresses this issue, but its performance degrades significantly as the amount of labeled data decreases. To tackle this problem, we propose CATCHFed, which introduces client-aware adaptive thresholds that account for class difficulty, hybrid thresholds that improve pseudo-label quality, and consistency regularization on the remaining un-pseudo-labeled data. Extensive experiments across various datasets and configurations demonstrate that CATCHFed effectively leverages unlabeled client data, achieving superior performance even in extremely limited-label settings.
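To make the abstract's three ingredients concrete, here is a minimal PyTorch-style sketch of one plausible client-side objective. The function name, the confidence-plus-margin reading of "hybrid thresholds", and the mean-confidence proxy for class difficulty are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def catchfed_style_loss(model, weak_x, strong_x, base_tau=0.95, margin=0.5):
    """Illustrative loss combining (i) class-adaptive thresholds, (ii) a hybrid
    confidence-plus-margin filter, and (iii) consistency regularization on the
    samples left without pseudo-labels. Details are assumptions, not the paper's."""
    logits_w = model(weak_x)                       # weakly augmented view
    logits_s = model(strong_x)                     # strongly augmented view
    probs_w = torch.softmax(logits_w.detach(), dim=1)
    conf, pseudo = probs_w.max(dim=1)

    # (i) Class-adaptive threshold: relax the cutoff for classes the client
    # finds hard, using mean class confidence as a difficulty proxy.
    difficulty = probs_w.mean(dim=0)
    tau = base_tau * difficulty / difficulty.max()

    # (ii) Hybrid filter: require high confidence AND a clear top-2 margin.
    top2 = probs_w.topk(2, dim=1).values
    passed = (conf >= tau[pseudo]) & ((top2[:, 0] - top2[:, 1]) >= margin)

    zero = logits_s.sum() * 0.0                    # keeps the autograd graph alive
    pl_loss = (F.cross_entropy(logits_s[passed], pseudo[passed])
               if passed.any() else zero)
    # (iii) Consistency regularization on the un-pseudo-labeled remainder.
    rest = ~passed
    cons_loss = (F.kl_div(F.log_softmax(logits_s[rest], dim=1),
                          probs_w[rest], reduction='batchmean')
                 if rest.any() else zero)
    return pl_loss + cons_loss
```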
Related papers
- Robust Semi-Supervised Learning in Open Environments [51.741549825533816]
Semi-supervised learning (SSL) aims to improve performance by exploiting unlabeled data when labels are scarce.
It has been reported that exploiting inconsistent unlabeled data causes severe performance degradation.
This paper briefly introduces some advances in this line of research, focusing on techniques concerning label, feature, and data distribution inconsistency in SSL.
arXiv Detail & Related papers (2024-12-24T08:13:01Z)
- Boosting Federated Learning with FedEntOpt: Mitigating Label Skew by Entropy-Based Client Selection [13.851391819710367]
Deep learning domains typically require an extensive amount of data for optimal performance.
FedEntOpt is designed to mitigate performance issues caused by label distribution skew.
It exhibits robust and superior performance in scenarios with low participation rates and client dropout.
arXiv Detail & Related papers (2024-11-02T13:31:36Z)
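FedEntOpt's summary names the mechanism but not the procedure. Below is a hedged greedy sketch of entropy-based client selection, assuming each client can report a label histogram; the greedy formulation is our simplification, not necessarily the paper's exact algorithm.

```python
import numpy as np

def select_clients_by_entropy(label_hists, k):
    """Greedily pick k clients so the label distribution of the selected
    cohort has maximal entropy, countering label skew (sketch only)."""
    combined = np.zeros_like(label_hists[0], dtype=float)
    selected, candidates = [], set(range(len(label_hists)))

    def entropy_if_added(c):
        p = combined + label_hists[c]
        p = p / p.sum()
        nz = p[p > 0]
        return float(-(nz * np.log(nz)).sum())

    for _ in range(k):
        best = max(candidates, key=entropy_if_added)
        selected.append(best)
        combined += label_hists[best]
        candidates.remove(best)
    return selected

# Example: 4 clients with skewed 3-class label counts; complementary skews win.
hists = [np.array([90, 5, 5]), np.array([5, 90, 5]),
         np.array([5, 5, 90]), np.array([80, 10, 10])]
print(select_clients_by_entropy(hists, k=2))
```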
- (FL)$^2$: Overcoming Few Labels in Federated Semi-Supervised Learning [4.803231218533992]
Federated Learning (FL) is a distributed machine learning framework that trains accurate global models while preserving clients' privacy-sensitive data.
Most FL approaches assume that clients possess labeled data, which is often not the case in practice.
We propose $(FL)^2$, a robust training method for unlabeled clients using sharpness-aware consistency regularization.
arXiv Detail & Related papers (2024-10-30T17:15:02Z)
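"Sharpness-aware consistency regularization" suggests a SAM-style two-pass update on the unlabeled objective. The sketch below is a generic rendering of that idea; $(FL)^2$'s actual procedure may differ.

```python
import torch
import torch.nn.functional as F

def sharpness_aware_consistency_step(model, optimizer, weak_x, strong_x, rho=0.05):
    """One SAM-style step on an unlabeled batch: perturb weights toward
    higher consistency loss, take the gradient there, then update (sketch)."""
    def consistency_loss():
        with torch.no_grad():
            target = torch.softmax(model(weak_x), dim=1)
        return F.kl_div(F.log_softmax(model(strong_x), dim=1),
                        target, reduction='batchmean')

    consistency_loss().backward()                 # gradient at current weights
    grads = [p.grad.detach().clone() if p.grad is not None else None
             for p in model.parameters()]
    norm = torch.sqrt(sum((g ** 2).sum() for g in grads if g is not None))

    with torch.no_grad():                         # SAM ascent: w <- w + rho*g/||g||
        for p, g in zip(model.parameters(), grads):
            if g is not None:
                p.add_(g, alpha=rho / float(norm + 1e-12))

    optimizer.zero_grad()
    consistency_loss().backward()                 # gradient at perturbed weights
    with torch.no_grad():                         # undo the perturbation
        for p, g in zip(model.parameters(), grads):
            if g is not None:
                p.sub_(g, alpha=rho / float(norm + 1e-12))
    optimizer.step()
    optimizer.zero_grad()
```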
- Federated Learning with Only Positive Labels by Exploring Label Correlations [78.59613150221597]
Federated learning aims to collaboratively learn a model by using the data from multiple users under privacy constraints.
In this paper, we study the multi-label classification problem under the federated learning setting.
We propose a novel and generic method termed Federated Averaging by exploring Label Correlations (FedALC).
arXiv Detail & Related papers (2024-04-24T02:22:50Z)
- FedAnchor: Enhancing Federated Semi-Supervised Learning with Label Contrastive Loss for Unlabeled Clients [19.3885479917635]
Federated learning (FL) is a distributed learning paradigm that facilitates collaborative training of a shared global model across devices.
We propose FedAnchor, an innovative FSSL method that introduces a unique double-head structure, called anchor head, paired with the classification head trained exclusively on labeled anchor data on the server.
Our approach mitigates the confirmation bias and overfitting issues associated with pseudo-labeling techniques based on high-confidence model prediction samples.
arXiv Detail & Related papers (2024-02-15T18:48:21Z)
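FedAnchor's double-head layout is easy to picture in code. The module below is schematic: only the two heads (a classification head plus an anchor head producing embeddings for a label contrastive loss on the server's anchor data) come from the abstract; the backbone and layer sizes are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DoubleHeadNet(nn.Module):
    """Shared backbone with a classification head and an 'anchor head'
    emitting normalized embeddings for a label contrastive loss (sketch)."""
    def __init__(self, in_dim=3 * 32 * 32, feat_dim=512, emb_dim=128, num_classes=10):
        super().__init__()
        self.backbone = nn.Sequential(nn.Flatten(),            # stand-in encoder
                                      nn.Linear(in_dim, feat_dim), nn.ReLU())
        self.cls_head = nn.Linear(feat_dim, num_classes)
        self.anchor_head = nn.Linear(feat_dim, emb_dim)

    def forward(self, x):
        h = self.backbone(x)
        return self.cls_head(h), F.normalize(self.anchor_head(h), dim=1)

logits, emb = DoubleHeadNet()(torch.randn(4, 3, 32, 32))       # smoke test
```

On clients, pseudo-labels would then plausibly come from similarity between anchor-head embeddings and per-class anchors rather than from raw softmax confidence, consistent with the abstract's critique of high-confidence pseudo-labeling.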
- Soft Curriculum for Learning Conditional GANs with Noisy-Labeled and Uncurated Unlabeled Data [70.25049762295193]
We introduce a novel conditional image generation framework that accepts noisy-labeled and uncurated data during training.
We propose soft curriculum learning, which assigns instance-wise weights for adversarial training while assigning new labels to unlabeled data.
Our experiments show that our approach outperforms existing semi-supervised and label-noise robust methods in terms of both quantitative and qualitative performance.
arXiv Detail & Related papers (2023-07-17T08:31:59Z)
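The soft-curriculum entry above describes instance-wise weighting of the adversarial loss; a minimal sketch of that single ingredient follows. The hinge form and the example weights are our assumptions.

```python
import torch

def weighted_hinge_d_loss(real_logits, fake_logits, instance_weights):
    """Discriminator hinge loss where each real sample is scaled by a
    curriculum weight in [0, 1], so suspect (noisy-label) samples
    contribute less (illustrative sketch)."""
    real_term = (instance_weights * torch.relu(1.0 - real_logits)).mean()
    fake_term = torch.relu(1.0 + fake_logits).mean()
    return real_term + fake_term

# Example: down-weight the second sample, whose label looks noisy.
loss = weighted_hinge_d_loss(torch.tensor([1.2, -0.3]),
                             torch.tensor([-0.8, 0.1]),
                             torch.tensor([1.0, 0.2]))
```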
- FedIL: Federated Incremental Learning from Decentralized Unlabeled Data with Convergence Analysis [23.70951896315126]
This work considers a setting in which the server holds a small labeled dataset and aims to use the unlabeled data of multiple clients for semi-supervised learning.
We propose a new framework with a generalized model, Federated Incremental Learning (FedIL), to address how to separately utilize the labeled data on the server and the unlabeled data on clients.
arXiv Detail & Related papers (2023-02-23T07:12:12Z)
- Uncertainty Minimization for Personalized Federated Semi-Supervised Learning [15.123493340717303]
We propose a novel semi-supervised learning paradigm that allows partially labeled or unlabeled clients to seek labeling assistance from data-related clients (helper agents).
Experiments show that our proposed method obtains superior performance and more stable convergence than related works with partially labeled data.
arXiv Detail & Related papers (2022-05-05T04:41:27Z)
- Self-Tuning for Data-Efficient Deep Learning [75.34320911480008]
Self-Tuning is a novel approach to enable data-efficient deep learning.
It unifies the exploration of labeled and unlabeled data and the transfer of a pre-trained model.
It outperforms its SSL and TL counterparts on five tasks by sharp margins.
arXiv Detail & Related papers (2021-02-25T14:56:19Z)
- ORDisCo: Effective and Efficient Usage of Incremental Unlabeled Data for Semi-supervised Continual Learning [52.831894583501395]
Continual learning typically assumes the incoming data are fully labeled, which may not hold in real applications.
We propose deep Online Replay with Discriminator Consistency (ORDisCo) to interdependently learn a classifier with a conditional generative adversarial network (GAN).
We show ORDisCo achieves significant performance improvement on various semi-supervised learning benchmark datasets for SSCL.
arXiv Detail & Related papers (2021-01-02T09:04:14Z)
- Federated Semi-Supervised Learning with Inter-Client Consistency & Disjoint Learning [78.88007892742438]
We study two essential scenarios of Federated Semi-Supervised Learning (FSSL) based on the location of the labeled data.
We propose a novel method to tackle the problems, which we refer to as Federated Matching (FedMatch).
arXiv Detail & Related papers (2020-06-22T09:43:41Z)
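FedMatch's "inter-client consistency" lends itself to a short sketch: regularize the local model toward the predictions of frozen helper models received from the server on unlabeled data. The KL form and simple averaging are our assumed instantiation.

```python
import torch
import torch.nn.functional as F

def inter_client_consistency(local_model, helper_models, x_unlabeled):
    """Average KL divergence from frozen helper models' predictions to the
    local model's predictions on unlabeled data (sketch)."""
    local_log_probs = F.log_softmax(local_model(x_unlabeled), dim=1)
    loss = local_log_probs.sum() * 0.0            # zero tensor on the graph
    for helper in helper_models:
        with torch.no_grad():
            target = torch.softmax(helper(x_unlabeled), dim=1)
        loss = loss + F.kl_div(local_log_probs, target, reduction='batchmean')
    return loss / max(len(helper_models), 1)
```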
- On Leveraging Unlabeled Data for Concurrent Positive-Unlabeled Classification and Robust Generation [72.062661402124]
We present a novel training framework to jointly target PU classification and conditional generation when exposed to extra data.
We prove the optimal condition of CNI-CGAN and conduct extensive evaluations on diverse datasets.
arXiv Detail & Related papers (2020-06-14T08:27:40Z)