Semantic Label Smoothing for Sequence to Sequence Problems
- URL: http://arxiv.org/abs/2010.07447v1
- Date: Thu, 15 Oct 2020 00:31:15 GMT
- Title: Semantic Label Smoothing for Sequence to Sequence Problems
- Authors: Michal Lukasik, Himanshu Jain, Aditya Krishna Menon, Seungyeon Kim,
Srinadh Bhojanapalli, Felix Yu, Sanjiv Kumar
- Abstract summary: We propose a technique that smooths over well-formed, relevant sequences that have sufficient n-gram overlap with the target sequence.
Our method shows a consistent and significant improvement over the state-of-the-art techniques on different datasets.
- Score: 54.758974840974425
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Label smoothing has been shown to be an effective regularization strategy in
classification that prevents overfitting and helps with label de-noising.
However, extending such methods directly to seq2seq settings, such as Machine
Translation, is challenging: the large target output space of such problems
makes it intractable to apply label smoothing over all possible outputs. Most
existing approaches for seq2seq settings either perform token-level smoothing, or
smooth over sequences generated by randomly substituting tokens in the target
sequence. Unlike these works, in this paper, we propose a technique that
smooths over \emph{well formed} relevant sequences that not only have
sufficient n-gram overlap with the target sequence, but are also
\emph{semantically similar}. Our method shows a consistent and significant
improvement over the state-of-the-art techniques on different datasets.
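To make the proposed idea concrete, below is a minimal sketch of how a smoothed sequence-level target distribution could be constructed. It is an illustration, not the authors' actual procedure: the candidate pool, the `semantic_sim` scorer, the BLEU-like overlap measure, and the similarity threshold are all assumptions introduced here for the example.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Multiset of n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def ngram_overlap(candidate, target, max_n=4):
    """Clipped n-gram precision of `candidate` against `target`,
    averaged over n = 1..max_n (a rough, BLEU-like overlap score)."""
    scores = []
    for n in range(1, max_n + 1):
        cand, ref = ngram_counts(candidate, n), ngram_counts(target, n)
        total = sum(cand.values())
        if total == 0:
            continue
        matched = sum(min(c, ref[g]) for g, c in cand.items())
        scores.append(matched / total)
    return sum(scores) / len(scores) if scores else 0.0

def smoothed_sequence_targets(target, candidates, semantic_sim, eps=0.1, sim_threshold=0.8):
    """Distribute the smoothing mass `eps` over candidate sequences that both
    overlap with the target (n-grams) and pass a semantic-similarity gate,
    instead of smoothing uniformly over tokens.

    `semantic_sim(a, b)` is a hypothetical scorer (e.g. from a sentence
    encoder); its choice is an assumption, not specified by the abstract.
    Returns a dict: sequence (tuple of tokens) -> probability mass."""
    weights = {}
    for cand in candidates:
        if tuple(cand) == tuple(target):
            continue
        if semantic_sim(cand, target) < sim_threshold:
            continue  # drop candidates that are not semantically similar
        w = ngram_overlap(cand, target)
        if w > 0:
            weights[tuple(cand)] = w
    z = sum(weights.values())
    dist = {tuple(target): 1.0 - eps if z > 0 else 1.0}
    for seq, w in weights.items():
        dist[seq] = eps * w / z
    return dist

# Toy usage with a trivial similarity scorer (an assumption for illustration).
target = "the cat sat on the mat".split()
candidates = ["a cat sat on the mat".split(), "the dog barked loudly".split()]
sim = lambda a, b: len(set(a) & set(b)) / len(set(a) | set(b))  # Jaccard stand-in
print(smoothed_sequence_targets(target, candidates, sim))
```

In contrast, standard token-level label smoothing would spread the mass eps uniformly over the vocabulary at every time step, without regard to whether the resulting sequences are well formed or semantically related to the target.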
Related papers
- Alternative Pseudo-Labeling for Semi-Supervised Automatic Speech Recognition [49.42732949233184]
When labeled data is insufficient, semi-supervised learning with the pseudo-labeling technique can significantly improve the performance of automatic speech recognition.
Taking noisy labels as ground-truth in the loss function results in suboptimal performance.
We propose a novel framework named alternative pseudo-labeling to tackle the issue of noisy pseudo-labels.
arXiv Detail & Related papers (2023-08-12T12:13:52Z)
- OTSeq2Set: An Optimal Transport Enhanced Sequence-to-Set Model for Extreme Multi-label Text Classification [9.990725102725916]
Extreme multi-label text classification (XMTC) is the task of finding the most relevant subset of labels from a large-scale label collection.
We propose an autoregressive sequence-to-set model for XMTC tasks named OTSeq2Set.
Our model generates predictions in a student-forcing scheme and is trained with a loss function based on bipartite matching.
arXiv Detail & Related papers (2022-10-26T07:25:18Z)
- Modeling sequential annotations for sequence labeling with crowds [8.239028141030621]
Crowd sequential annotations can be an efficient and cost-effective way to build large datasets for sequence labeling.
We propose Modeling Sequential Annotation for Sequence Labeling with Crowds (SA-SLC).
A valid label sequence inference (VLSE) method is proposed to derive the valid ground-truth label sequences from crowd sequential annotations.
arXiv Detail & Related papers (2022-09-20T02:51:23Z)
- Seq-UPS: Sequential Uncertainty-aware Pseudo-label Selection for Semi-Supervised Text Recognition [21.583569162994277]
One of the most popular semi-supervised learning (SSL) approaches is pseudo-labeling (PL).
PL methods are severely degraded by noise and are prone to over-fitting to noisy labels.
We propose a pseudo-label generation and an uncertainty-based data selection framework for text recognition.
arXiv Detail & Related papers (2022-08-31T02:21:02Z)
- Efficient and Flexible Sublabel-Accurate Energy Minimization [62.50191141358778]
We address the problem of minimizing a class of energy functions consisting of data and smoothness terms.
Existing continuous optimization methods can find sublabel-accurate solutions, but they are not efficient for large label spaces.
We propose an efficient sublabel-accurate method that utilizes the best properties of both continuous and discrete models.
arXiv Detail & Related papers (2022-06-20T06:58:55Z)
- Learning with Noisy Labels via Sparse Regularization [76.31104997491695]
Learning with noisy labels is an important task for training accurate deep neural networks.
Some commonly-used loss functions, such as Cross Entropy (CE), suffer from severe overfitting to noisy labels.
We introduce the sparse regularization strategy to approximate the one-hot constraint.
arXiv Detail & Related papers (2021-07-31T09:40:23Z)
- Don't Take It Literally: An Edit-Invariant Sequence Loss for Text Generation [109.46348908829697]
We propose a novel Edit-Invariant Sequence Loss (EISL), which computes the matching loss of a target n-gram with all n-grams in the generated sequence (see the sketch after this list).
We conduct experiments on three tasks: machine translation with noisy target sequences, unsupervised text style transfer, and non-autoregressive machine translation.
arXiv Detail & Related papers (2021-06-29T03:59:21Z)
- Accelerating BERT Inference for Sequence Labeling via Early-Exit [65.7292767360083]
We extend the recent successful early-exit mechanism to accelerate the inference of PTMs for sequence labeling tasks.
We also propose a token-level early-exit mechanism that allows some tokens to exit early at different layers.
Our approach can save up to 66%-75% of the inference cost with minimal performance degradation.
arXiv Detail & Related papers (2021-05-28T14:39:26Z)
- SeqMix: Augmenting Active Sequence Labeling via Sequence Mixup [11.606681893887604]
We propose a simple but effective data augmentation method to improve the label efficiency of active sequence labeling.
Our method, SeqMix, augments the queried samples by generating extra labeled sequences in each iteration.
It does so by performing mixup on both the sequences and the token-level labels of the queried samples.
arXiv Detail & Related papers (2020-10-05T20:27:14Z)
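As referenced in the EISL entry above, here is a minimal sketch of a position-invariant n-gram matching loss. It is a simplified formulation written for illustration, not the paper's exact definition: for every n-gram of the target, the model's log-probability of emitting that n-gram is aggregated over all start positions of the output via log-sum-exp, so insertions or shifts elsewhere in the generated sequence do not destroy the match.

```python
import numpy as np

def edit_invariant_ngram_loss(log_probs, target, n=2):
    """Position-invariant n-gram loss in the spirit of EISL (a simplified
    sketch under assumptions made here, not the paper's exact definition).

    log_probs: (T_out, V) array of per-position log-probabilities from the model.
    target:    list of token ids of the reference sequence.
    """
    T_out, _ = log_probs.shape
    ngrams = [target[i:i + n] for i in range(len(target) - n + 1)]
    if not ngrams or T_out < n:
        return 0.0
    scores = []
    for gram in ngrams:
        # log P(output[s:s+n] == gram) for every start position s
        pos_scores = [sum(log_probs[s + k, tok] for k, tok in enumerate(gram))
                      for s in range(T_out - n + 1)]
        # soft "best match anywhere" via log-sum-exp over start positions
        scores.append(np.logaddexp.reduce(pos_scores))
    return -float(np.mean(scores))

# Toy usage: vocabulary of 5 tokens, 4 output positions.
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 5))
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
print(edit_invariant_ngram_loss(log_probs, target=[1, 2, 3], n=2))
```

A standard per-position cross-entropy, by contrast, penalizes the model heavily as soon as the output is shifted by a single inserted token.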