Towards Utilizing Unlabeled Data in Federated Learning: A Survey and
Prospective
- URL: http://arxiv.org/abs/2002.11545v2
- Date: Mon, 11 May 2020 11:44:11 GMT
- Authors: Yilun Jin, Xiguang Wei, Yang Liu, Qiang Yang
- Abstract summary: Federated Learning (FL), proposed in recent years, has received significant attention from researchers.
In most applications of FL studied so far, such as keyboard prediction, labeling data requires virtually no additional effort, but this is not generally the case.
We identify the need to exploit unlabeled data in FL, and survey possible research fields that can contribute to the goal.
- Score: 18.40606952418594
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Federated Learning (FL), proposed in recent years, has received significant
attention from researchers because it can bring separate data sources together
and build machine learning models in a collaborative yet private manner. Yet,
in most applications of FL studied so far, such as keyboard prediction, labeling
data requires virtually no additional effort, which is not generally the case. In reality,
acquiring large-scale labeled datasets can be extremely costly, which motivates
research works that exploit unlabeled data to help build machine learning
models. However, to the best of our knowledge, few existing works aim to
utilize unlabeled data to enhance federated learning, which leaves a
potentially promising research topic. In this paper, we identify the need to
exploit unlabeled data in FL, and survey possible research fields that can
contribute to the goal.
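As a concrete illustration of the gap the abstract identifies, the sketch below combines federated averaging with self-training: clients that hold only unlabeled data pseudo-label it using the current global model before training locally. Everything here (the toy data, the `sgd_round` helper, and all hyperparameters) is an illustrative assumption, not a method from the paper.

```python
# Minimal sketch (illustrative, not from the surveyed paper): FedAvg where
# some clients hold only unlabeled data and train on pseudo-labels produced
# by the current global model.
import numpy as np

rng = np.random.default_rng(0)

def make_client(n, labeled):
    X = rng.normal(size=(n, 2))
    y = (X[:, 0] + X[:, 1] > 0).astype(float)  # true linear concept
    return {"X": X, "y": y if labeled else None}

def sgd_round(w, X, y, lr=0.1, epochs=5):
    """A few full-batch logistic-regression gradient steps."""
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w = w - lr * X.T @ (p - y) / len(y)
    return w

# one labeled client, three unlabeled clients
clients = [make_client(200, labeled=True)] + [
    make_client(200, labeled=False) for _ in range(3)
]

w = np.zeros(2)  # global model
for _ in range(20):
    local = []
    for c in clients:
        if c["y"] is not None:
            y = c["y"]
        else:
            # unlabeled client: pseudo-label with the global model
            y = (c["X"] @ w > 0).astype(float)
        local.append(sgd_round(w.copy(), c["X"], y))
    w = np.mean(local, axis=0)  # FedAvg: average the local models

X_test = rng.normal(size=(500, 2))
acc = np.mean((X_test @ w > 0) == (X_test.sum(axis=1) > 0))
print(acc)
```

In this toy setup the single labeled client anchors the decision boundary, and the pseudo-labeled clients reinforce it over rounds; real federated semi-supervised methods add confidence thresholds and consistency regularization on top of this basic loop.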
Related papers
- Non-IID data in Federated Learning: A Systematic Review with Taxonomy, Metrics, Methods, Frameworks and Future Directions [2.9434966603161072]
This systematic review aims to fill a gap by providing a detailed taxonomy for non-IID data, partition protocols, and metrics.
We describe popular solutions to address non-IID data and standardized frameworks employed in Federated Learning with heterogeneous data.
arXiv Detail & Related papers (2024-11-19T09:53:28Z) - Learning with Less: Knowledge Distillation from Large Language Models via Unlabeled Data [54.934578742209716]
In real-world NLP applications, Large Language Models (LLMs) offer promising solutions due to their extensive training on vast datasets.
LLKD is an adaptive sample selection method that incorporates signals from both the teacher and student.
Our comprehensive experiments show that LLKD achieves superior performance across various datasets with higher data efficiency.
arXiv Detail & Related papers (2024-11-12T18:57:59Z) - Data-Centric AI in the Age of Large Language Models [51.20451986068925]
This position paper proposes a data-centric viewpoint of AI research, focusing on large language models (LLMs).
We make the key observation that data is instrumental in the developmental (e.g., pretraining and fine-tuning) and inferential stages (e.g., in-context learning) of LLMs.
We identify four specific scenarios centered around data, covering data-centric benchmarks and data curation, data attribution, knowledge transfer, and inference contextualization.
arXiv Detail & Related papers (2024-06-20T16:34:07Z) - A Survey on Data Selection for Language Models [148.300726396877]
Data selection methods aim to determine which data points to include in a training dataset.
Deep learning is mostly driven by empirical evidence, and experimentation on large-scale data is expensive.
Few organizations have the resources for extensive data selection research.
arXiv Detail & Related papers (2024-02-26T18:54:35Z) - Federated Learning without Full Labels: A Survey [23.49131075675469]
We present a survey of methods that combine federated learning with semi-supervised learning, self-supervised learning, and transfer learning.
We also summarize the datasets used to evaluate FL methods without full labels.
arXiv Detail & Related papers (2023-03-25T12:13:31Z) - Understanding the World Through Action [91.3755431537592]
I will argue that a general, principled, and powerful framework for utilizing unlabeled data can be derived from reinforcement learning.
I will discuss how such a procedure is more closely aligned with potential downstream tasks.
arXiv Detail & Related papers (2021-10-24T22:33:52Z) - Federated Learning on Non-IID Data Silos: An Experimental Study [34.28108345251376]
Training data have been increasingly fragmented, forming distributed databases of multiple data silos.
In this paper, we propose comprehensive data partitioning strategies to cover the typical non-IID data cases.
We find that non-IID data does bring significant challenges to the learning accuracy of FL algorithms, and that no existing state-of-the-art FL algorithm outperforms the others in all cases.
arXiv Detail & Related papers (2021-02-03T14:29:09Z) - Adversarial Knowledge Transfer from Unlabeled Data [62.97253639100014]
We present a novel Adversarial Knowledge Transfer framework for transferring knowledge from internet-scale unlabeled data to improve the performance of a classifier.
An important novel aspect of our method is that the unlabeled source data can be of different classes from those of the labeled target data, and there is no need to define a separate pretext task.
arXiv Detail & Related papers (2020-08-13T08:04:27Z) - DeGAN : Data-Enriching GAN for Retrieving Representative Samples from a
Trained Classifier [58.979104709647295]
We bridge the gap between the abundance of available data and the lack of relevant data for the future learning tasks of a trained network.
We use the available data, that may be an imbalanced subset of the original training dataset, or a related domain dataset, to retrieve representative samples.
We demonstrate that data from a related domain can be leveraged to achieve state-of-the-art performance.
arXiv Detail & Related papers (2019-12-27T02:05:45Z)
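Several of the related papers above revolve around non-IID data partitions. A protocol commonly used in non-IID FL benchmarks to simulate label-distribution skew is a Dirichlet split over classes, sketched below; the function name, parameters, and toy data are illustrative assumptions, not code from any of the papers listed.

```python
# Illustrative sketch: simulating non-IID "label distribution skew" across
# FL clients with a Dirichlet prior. Smaller alpha -> more skewed clients.
import numpy as np

rng = np.random.default_rng(0)

def dirichlet_partition(labels, n_clients, alpha):
    """Split sample indices across clients, class by class."""
    parts = [[] for _ in range(n_clients)]
    for k in np.unique(labels):
        idx = np.where(labels == k)[0]
        rng.shuffle(idx)
        # proportion of class k that each client receives
        props = rng.dirichlet([alpha] * n_clients)
        cuts = (np.cumsum(props) * len(idx)).astype(int)[:-1]
        for part, chunk in zip(parts, np.split(idx, cuts)):
            part.extend(chunk.tolist())
    return parts

labels = rng.integers(0, 10, size=10_000)  # toy 10-class label array
iid_like = dirichlet_partition(labels, n_clients=5, alpha=100.0)
skewed = dirichlet_partition(labels, n_clients=5, alpha=0.1)
print(sorted(len(p) for p in skewed))
```

With a large alpha each client receives roughly the same class mixture; with a small alpha most of each class concentrates on a few clients, reproducing the heterogeneity that the experimental studies above identify as the main difficulty for FL algorithms.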
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.