Industry Scale Semi-Supervised Learning for Natural Language
Understanding
- URL: http://arxiv.org/abs/2103.15871v1
- Date: Mon, 29 Mar 2021 18:24:02 GMT
- Title: Industry Scale Semi-Supervised Learning for Natural Language
Understanding
- Authors: Luoxin Chen, Francisco Garcia, Varun Kumar, He Xie, Jianhua Lu
- Abstract summary: This paper presents a production Semi-Supervised Learning (SSL) pipeline based on the student-teacher framework.
We investigate two questions related to the use of unlabeled data in a production SSL context.
We compare four widely used SSL techniques: Pseudo-Label (PL), Knowledge Distillation (KD), Virtual Adversarial Training (VAT), and Cross-View Training (CVT).
- Score: 14.844450283047234
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents a production Semi-Supervised Learning (SSL) pipeline
based on the student-teacher framework, which leverages millions of unlabeled
examples to improve Natural Language Understanding (NLU) tasks. We investigate
two questions related to the use of unlabeled data in a production SSL context:
1) how to select samples from a huge unlabeled data pool that are beneficial
for SSL training, and 2) how the selected data affect the performance of
different state-of-the-art SSL techniques. We compare four widely used SSL
techniques, Pseudo-Label (PL), Knowledge Distillation (KD), Virtual Adversarial
Training (VAT) and Cross-View Training (CVT) in conjunction with two data
selection methods including committee-based selection and submodular
optimization based selection. We further examine the benefits and drawbacks of
these techniques when applied to intent classification (IC) and named entity
recognition (NER) tasks, and provide guidelines specifying when each of these
methods might be beneficial to improve large scale NLU systems.
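The ideas named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: ranking the pool by committee disagreement (vote entropy) is a common query-by-committee heuristic assumed here for illustration, and the function names, the 0.9 confidence threshold, and the toy keyword classifiers are all hypothetical. The sketch shows committee-based selection from an unlabeled pool, a hard Pseudo-Label target, and a soft Knowledge Distillation loss.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Entropy of the label votes cast by a committee for one example."""
    counts = Counter(votes)
    total = len(votes)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def select_by_committee(pool, committee, k):
    """Return the k unlabeled examples on which the committee disagrees most.

    Assumption: disagreement is the selection criterion; the paper may
    use a different rule.
    """
    scored = [(vote_entropy([model(x) for model in committee]), x) for x in pool]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [x for _, x in scored[:k]]


def pseudo_label(teacher_probs, threshold=0.9):
    """Hard target: the teacher's top class, or None if it is not confident."""
    label = max(teacher_probs, key=teacher_probs.get)
    return label if teacher_probs[label] >= threshold else None


def distillation_loss(teacher_probs, student_probs, eps=1e-12):
    """Soft target: cross-entropy of the student against the teacher's distribution."""
    return -sum(
        p * math.log(student_probs.get(c, 0.0) + eps)
        for c, p in teacher_probs.items()
    )


# Toy committee of keyword-based intent classifiers (hypothetical).
committee = [
    lambda u: "PlayMusic" if "play" in u else "Other",
    lambda u: "PlayMusic" if "song" in u else "Other",
    lambda u: "PlayMusic" if "music" in u else "Other",
]
pool = ["play a song with music", "what time is it", "play the news"]
selected = select_by_committee(pool, committee, k=1)
```

In this toy pool the committee is unanimous on the first two utterances and split on the third, so only the third is selected. Pseudo-Label keeps just the teacher's top class when it is confident, while Knowledge Distillation trains the student on the teacher's full probability distribution.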
Related papers
- Exploration and Exploitation of Unlabeled Data for Open-Set
Semi-Supervised Learning [130.56124475528475]
We address a complex but practical scenario in semi-supervised learning (SSL) named open-set SSL, where unlabeled data contain both in-distribution (ID) and out-of-distribution (OOD) samples.
Our proposed method achieves state-of-the-art in several challenging benchmarks, and improves upon existing SSL methods even when ID samples are totally absent in unlabeled data.
arXiv Detail & Related papers (2023-06-30T14:25:35Z)
- A Dual-branch Self-supervised Representation Learning Framework for
Tumour Segmentation in Whole Slide Images [12.961686610789416]
Self-supervised learning (SSL) has emerged as an alternative solution to reduce the annotation overheads in whole slide images.
These SSL approaches are not designed for handling multi-resolution WSIs, which limits their performance in learning discriminative image features.
We propose a Dual-branch SSL Framework for WSI tumour segmentation (DSF-WSI) that can effectively learn image features from multi-resolution WSIs.
arXiv Detail & Related papers (2023-03-20T10:57:28Z)
- Understanding and Improving the Role of Projection Head in
Self-Supervised Learning [77.59320917894043]
Self-supervised learning (SSL) aims to produce useful feature representations without access to human-labeled data annotations.
Current contrastive learning approaches append a parametrized projection head to the end of some backbone network to optimize the InfoNCE objective.
This raises a fundamental question: Why is a learnable projection head required if we are to discard it after training?
arXiv Detail & Related papers (2022-12-22T05:42:54Z)
- LASP: Text-to-Text Optimization for Language-Aware Soft Prompting of
Vision & Language Models [67.19124099815645]
We propose a novel Language-Aware Soft Prompting (LASP) learning method to alleviate base class overfitting.
LASP is inherently amenable to including, during training, virtual classes, i.e. class names for which no visual samples are available.
LASP matches and surpasses, for the first time, the accuracy on novel classes obtained by hand-crafted prompts and CLIP for 8 out of 11 test datasets.
arXiv Detail & Related papers (2022-10-03T17:56:35Z)
- Collaborative Intelligence Orchestration: Inconsistency-Based Fusion of
Semi-Supervised Learning and Active Learning [60.26659373318915]
Active learning (AL) and semi-supervised learning (SSL) are two effective, but often isolated, means to alleviate the data-hungry problem.
We propose an innovative Inconsistency-based virtual aDvErsarial algorithm to further investigate SSL-AL's potential superiority.
Two real-world case studies visualize the practical industrial value of applying and deploying the proposed data sampling algorithm.
arXiv Detail & Related papers (2022-06-07T13:28:43Z)
- Trash to Treasure: Harvesting OOD Data with Cross-Modal Matching for
Open-Set Semi-Supervised Learning [101.28281124670647]
Open-set semi-supervised learning (open-set SSL) investigates a challenging but practical scenario where out-of-distribution (OOD) samples are contained in the unlabeled data.
We propose a novel training mechanism that could effectively exploit the presence of OOD data for enhanced feature learning.
Our approach substantially lifts the performance on open-set SSL and outperforms the state-of-the-art by a large margin.
arXiv Detail & Related papers (2021-08-12T09:14:44Z)
- Self-Supervised Learning of Graph Neural Networks: A Unified Review [50.71341657322391]
Self-supervised learning is emerging as a new paradigm for making use of large amounts of unlabeled samples.
We provide a unified review of different ways of training graph neural networks (GNNs) using SSL.
Our treatment of SSL methods for GNNs sheds light on the similarities and differences of various methods, setting the stage for developing new methods and algorithms.
arXiv Detail & Related papers (2021-02-22T03:43:45Z)
- Matching Distributions via Optimal Transport for Semi-Supervised
Learning [31.533832244923843]
Semi-Supervised Learning (SSL) approaches have been an influential framework for the usage of unlabeled data.
We propose a new approach that adopts an Optimal Transport (OT) technique serving as a metric of similarity between discrete empirical probability measures.
We have evaluated our proposed method with state-of-the-art SSL algorithms on standard datasets to demonstrate the superiority and effectiveness of our SSL algorithm.
arXiv Detail & Related papers (2020-12-04T11:15:14Z)
- Knowledge Distillation and Data Selection for Semi-Supervised Learning
in CTC Acoustic Models [9.496916045581736]
Semi-supervised learning (SSL) is an active area of research which aims to utilize unlabelled data in order to improve the accuracy of speech recognition systems.
Our aim is to establish the importance of good criteria in selecting samples from a large pool of unlabelled data.
We perform empirical investigations of different data selection methods to answer this question and quantify the effect of different sampling strategies.
arXiv Detail & Related papers (2020-08-10T07:00:08Z)