Federated Semi-Supervised Learning with Class Distribution Mismatch
- URL: http://arxiv.org/abs/2111.00010v1
- Date: Fri, 29 Oct 2021 14:18:20 GMT
- Title: Federated Semi-Supervised Learning with Class Distribution Mismatch
- Authors: Zhiguo Wang, Xintong Wang, Ruoyu Sun and Tsung-Hui Chang
- Abstract summary: Federated semi-supervised learning (Fed-SSL) is an attractive solution for fully utilizing both labeled and unlabeled data.
We introduce two regularization terms that effectively alleviate the class distribution mismatch problem in Fed-SSL.
We leverage variance reduction and normalized averaging techniques to develop a novel Fed-SSL algorithm.
- Score: 34.46190258291223
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many existing federated learning (FL) algorithms are designed for supervised
learning tasks, assuming that the local data owned by the clients are well
labeled. However, in many practical situations, it could be difficult and
expensive to acquire complete data labels. Federated semi-supervised learning
(Fed-SSL) is an attractive solution for fully utilizing both labeled and
unlabeled data. As in federated supervised learning, the class distributions
of the labeled/unlabeled data can be non-i.i.d. among clients. Moreover,
within each client, the class distribution of the labeled data may differ
from that of the unlabeled data. Both mismatches can severely degrade FL
performance. To address these challenges, we introduce two regularization
terms that effectively alleviate the class distribution mismatch problem in
Fed-SSL. In addition, to cope with non-i.i.d. data, we leverage variance
reduction and normalized averaging techniques to develop a novel Fed-SSL
algorithm. Theoretically, we prove that the proposed method achieves
a convergence rate of $\mathcal{O}(1/\sqrt{T})$, where $T$ is the number of
communication rounds, even when the data distributions are non-i.i.d. among
clients. To the best of our knowledge, this is the first formal convergence
result for Fed-SSL problems. Numerical experiments on the MNIST and CIFAR-10
datasets show that the proposed method greatly improves classification
accuracy over the baselines.
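
The abstract does not spell out the paper's two regularization terms, but a common way to penalize class distribution mismatch in SSL is to align the model's average predicted class distribution on unlabeled data with the empirical class prior of the labeled data. The sketch below illustrates that idea only; the function name, the KL direction, and the assumption that each client knows its labeled class prior are illustrative choices, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def distribution_alignment_loss(unlabeled_logits: torch.Tensor,
                                labeled_class_prior: torch.Tensor,
                                eps: float = 1e-8) -> torch.Tensor:
    """KL(prior || average prediction) over an unlabeled mini-batch.

    unlabeled_logits: (batch, num_classes) raw model outputs on unlabeled data.
    labeled_class_prior: (num_classes,) empirical class frequencies of the
        client's labeled data, assumed to sum to 1. Hypothetical regularizer,
        not the paper's exact term.
    """
    probs = F.softmax(unlabeled_logits, dim=1)      # per-sample class predictions
    avg_pred = probs.mean(dim=0).clamp_min(eps)     # batch-average class distribution
    prior = labeled_class_prior.clamp_min(eps)
    # Penalize drift of the unlabeled predictions away from the labeled prior.
    return torch.sum(prior * (prior.log() - avg_pred.log()))
```

In training, such a term would be added to the supervised cross-entropy loss with a tuning weight, e.g. `loss = ce_loss + lam * distribution_alignment_loss(logits_u, prior)`.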
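Likewise, "normalized averaging" is named but not specified in the abstract. The sketch below shows a FedNova-style variant, in which each client's accumulated update is normalized by its number of local steps before aggregation, so clients running more local iterations do not dominate the global update. All names are illustrative, and the paper's actual aggregation rule (and its variance-reduction component) may differ.

```python
from typing import List
import torch

def normalized_average(global_params: List[torch.Tensor],
                       client_params: List[List[torch.Tensor]],
                       local_steps: List[int],
                       client_weights: List[float]) -> List[torch.Tensor]:
    """FedNova-style aggregation sketch: normalize each client's update by
    its local step count, then apply an effective global step.

    client_weights are data-size fractions and should sum to 1.
    """
    # Effective number of local steps, averaged across clients.
    tau_eff = sum(w * tau for w, tau in zip(client_weights, local_steps))
    new_params = []
    for i, g in enumerate(global_params):
        direction = torch.zeros_like(g)
        for params, tau, w in zip(client_params, local_steps, client_weights):
            # Per-step (normalized) update direction contributed by this client.
            direction += w * (g - params[i]) / tau
        new_params.append(g - tau_eff * direction)
    return new_params
```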
Related papers
- (FL)$^2$: Overcoming Few Labels in Federated Semi-Supervised Learning [4.803231218533992]
Federated Learning (FL) is a distributed machine learning framework that trains accurate global models while preserving clients' privacy-sensitive data.
Most FL approaches assume that clients possess labeled data, which is often not the case in practice.
We propose $(FL)^2$, a robust training method for unlabeled clients using sharpness-aware consistency regularization.
arXiv Detail & Related papers (2024-10-30T17:15:02Z)
- Continuous Contrastive Learning for Long-Tailed Semi-Supervised Recognition [50.61991746981703]
Current state-of-the-art LTSSL approaches rely on high-quality pseudo-labels for large-scale unlabeled data.
This paper introduces a novel probabilistic framework that unifies various recent proposals in long-tail learning.
We introduce a continuous contrastive learning method, CCL, extending our framework to unlabeled data using reliable and smoothed pseudo-labels.
arXiv Detail & Related papers (2024-10-08T15:06:10Z)
- Three Heads Are Better Than One: Complementary Experts for Long-Tailed Semi-supervised Learning [74.44500692632778]
We propose a novel method named ComPlementary Experts (CPE) to model various class distributions.
CPE achieves state-of-the-art performance on the CIFAR-10-LT, CIFAR-100-LT, and STL-10-LT benchmarks.
arXiv Detail & Related papers (2023-12-25T11:54:07Z)
- FlatMatch: Bridging Labeled Data and Unlabeled Data with Cross-Sharpness for Semi-Supervised Learning [73.13448439554497]
Semi-Supervised Learning (SSL) has been an effective way to leverage abundant unlabeled data with extremely scarce labeled data.
Most SSL methods are commonly based on instance-wise consistency between different data transformations.
We propose FlatMatch which minimizes a cross-sharpness measure to ensure consistent learning performance between the two datasets.
arXiv Detail & Related papers (2023-10-25T06:57:59Z)
- JointMatch: A Unified Approach for Diverse and Collaborative Pseudo-Labeling to Semi-Supervised Text Classification [65.268245109828]
Semi-supervised text classification (SSTC) has gained increasing attention due to its ability to leverage unlabeled data.
Existing approaches based on pseudo-labeling suffer from the issues of pseudo-label bias and error accumulation.
We propose JointMatch, a holistic approach for SSTC that addresses these challenges by unifying ideas from recent semi-supervised learning methods.
arXiv Detail & Related papers (2023-10-23T05:43:35Z)
- Combating Data Imbalances in Federated Semi-supervised Learning with Dual Regulators [40.12377870379059]
Federated semi-supervised learning (FSSL) emerges to train models from a small fraction of labeled data.
We propose a novel FSSL framework with dual regulators, FedDure.
We show that FedDure is superior to the existing methods across a wide range of settings.
arXiv Detail & Related papers (2023-07-11T15:45:03Z)
- BiSTF: Bilateral-Branch Self-Training Framework for Semi-Supervised Large-scale Fine-Grained Recognition [28.06659482245647]
Semi-supervised fine-grained recognition is a challenging task due to data imbalance, high inter-class similarity, and domain mismatch.
We propose the Bilateral-Branch Self-Training Framework (BiSTF) to improve semi-supervised learning on class-imbalanced and domain-shifted fine-grained data.
We show BiSTF outperforms the existing state-of-the-art SSL on Semi-iNat dataset.
arXiv Detail & Related papers (2021-07-14T15:28:54Z)
- OpenMatch: Open-set Consistency Regularization for Semi-supervised Learning with Outliers [71.08167292329028]
We propose a novel Open-set Semi-Supervised Learning (OSSL) approach called OpenMatch.
OpenMatch unifies FixMatch with novelty detection based on one-vs-all (OVA) classifiers.
It achieves state-of-the-art performance on three datasets, and even outperforms a fully supervised model at detecting outliers unseen in the unlabeled data on CIFAR-10 (a minimal FixMatch-style pseudo-labeling sketch follows this list).
arXiv Detail & Related papers (2021-05-28T23:57:15Z)
- Matching Distributions via Optimal Transport for Semi-Supervised Learning [31.533832244923843]
Semi-Supervised Learning (SSL) has been an influential framework for leveraging unlabeled data.
We propose a new approach that adopts an Optimal Transport (OT) technique serving as a metric of similarity between discrete empirical probability measures.
We evaluate the proposed method against state-of-the-art SSL algorithms on standard datasets to demonstrate its effectiveness.
arXiv Detail & Related papers (2020-12-04T11:15:14Z)
- Federated Semi-Supervised Learning with Inter-Client Consistency & Disjoint Learning [78.88007892742438]
We study two essential scenarios of Federated Semi-Supervised Learning (FSSL) based on the location of the labeled data.
We propose a novel method, Federated Matching (FedMatch), to tackle these problems.
arXiv Detail & Related papers (2020-06-22T09:43:41Z)
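
Several of the related papers above share a confidence-thresholded pseudo-labeling step (OpenMatch builds on FixMatch; JointMatch and CCL refine pseudo-labeling). As a common reference point, here is a minimal FixMatch-style sketch; the function name and the 0.95 threshold are illustrative defaults, not settings taken from any one of the papers.

```python
import torch
import torch.nn.functional as F

def fixmatch_unlabeled_loss(model, weak_batch, strong_batch, threshold=0.95):
    """Pseudo-label confident predictions on weakly augmented views and
    train the model to reproduce them on strongly augmented views."""
    with torch.no_grad():
        probs = F.softmax(model(weak_batch), dim=1)   # predictions on weak views
        conf, pseudo = probs.max(dim=1)               # confidence and hard pseudo-label
        mask = (conf >= threshold).float()            # keep only confident samples
    logits_s = model(strong_batch)                    # predictions on strong views
    per_sample = F.cross_entropy(logits_s, pseudo, reduction="none")
    return (per_sample * mask).mean()                 # masked consistency loss
```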
This list is automatically generated from the titles and abstracts of the papers in this site.