Text Smoothing: Enhance Various Data Augmentation Methods on Text
Classification Tasks
- URL: http://arxiv.org/abs/2202.13840v1
- Date: Mon, 28 Feb 2022 14:54:08 GMT
- Title: Text Smoothing: Enhance Various Data Augmentation Methods on Text
Classification Tasks
- Authors: Xing Wu, Chaochen Gao, Meng Lin, Liangjun Zang, Zhongyuan Wang,
Songlin Hu
- Abstract summary: A smoothed representation is the probability distribution over candidate tokens obtained from a pre-trained masked language model.
We propose an efficient data augmentation method, termed text smoothing, which converts a sentence from its one-hot representation to a controllable smoothed representation.
- Score: 47.5423959822716
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Before entering the neural network, a token is generally converted to
its corresponding one-hot representation, a discrete distribution over the
vocabulary. A smoothed representation, by contrast, is the probability
distribution over candidate tokens obtained from a pre-trained masked language
model, and can be seen as a more informative substitute for the one-hot
representation. We propose an efficient data augmentation method, termed text
smoothing, which converts a sentence from its one-hot representation to a
controllable smoothed representation. We evaluate text smoothing on different
benchmarks in a low-resource regime. Experimental results show that text
smoothing outperforms various mainstream data augmentation methods by a
substantial margin. Moreover, text smoothing can be combined with those methods
to achieve even better performance.
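To make the mechanism concrete, here is a minimal sketch of text smoothing in PyTorch with the HuggingFace transformers library. The checkpoint, the mixing coefficient `lam`, and the helper name `text_smoothing` are illustrative assumptions; the paper's exact training setup may differ.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Pre-trained MLM; the specific checkpoint is an assumption for this sketch.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

def text_smoothing(sentence: str, lam: float = 0.5) -> torch.Tensor:
    """Interpolate one-hot token representations with the MLM's predicted
    distributions; `lam` (assumed value) controls how much one-hot mass
    is kept, which makes the smoothed representation controllable."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        logits = mlm(**enc).logits                 # (1, seq_len, vocab)
    smoothed = torch.softmax(logits, dim=-1)       # distribution per position
    one_hot = torch.nn.functional.one_hot(
        enc["input_ids"], num_classes=smoothed.size(-1)
    ).float()
    return lam * one_hot + (1.0 - lam) * smoothed  # controllable mixture

# Instead of an embedding lookup, the mixture can be multiplied with the
# embedding matrix to produce input vectors for a downstream classifier:
mix = text_smoothing("the movie was great")
inputs = mix @ mlm.get_input_embeddings().weight   # (1, seq_len, hidden)
```

Varying `lam` trades off fidelity to the original sentence against augmentation diversity, which is what makes the representation controllable.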
Related papers
- TexIm FAST: Text-to-Image Representation for Semantic Similarity Evaluation using Transformers [2.7651063843287718]
TexIm FAST is a novel methodology for generating fixed-length text representations through a self-supervised Variational Auto-Encoder (VAE) with transformers, aimed at semantic evaluation.
The pictorial representations allow oblivious inference while retaining linguistic intricacies, and are potent in cross-modal applications.
The efficacy of TexIm FAST has been extensively analyzed for the task of Semantic Textual Similarity (STS) on the MSRPC, CNN/Daily Mail, and XSum datasets.
arXiv Detail & Related papers (2024-06-06T18:28:50Z)
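As a loose illustration of the self-supervised VAE machinery that TexIm FAST builds on (a generic sketch, not the paper's architecture; all layer sizes are assumptions):

```python
import torch
import torch.nn as nn

class TextVAEEncoder(nn.Module):
    """Minimal VAE encoder mapping pooled text features to a fixed-length
    latent code; dimensions are illustrative, not from the paper."""
    def __init__(self, input_dim: int = 768, latent_dim: int = 128):
        super().__init__()
        self.mu = nn.Linear(input_dim, latent_dim)
        self.logvar = nn.Linear(input_dim, latent_dim)

    def forward(self, pooled: torch.Tensor):
        mu, logvar = self.mu(pooled), self.logvar(pooled)
        # Reparameterization trick: sample z while keeping gradients.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return z, mu, logvar

def kl_loss(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    # KL divergence to a standard normal, the usual VAE regularizer.
    return -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
```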
- Efficiently Leveraging Linguistic Priors for Scene Text Spotting [63.22351047545888]
This paper proposes a method that leverages linguistic knowledge from a large text corpus to replace the traditional one-hot encoding used in auto-regressive scene text spotting and recognition models.
We generate text distributions that align well with scene text datasets, removing the need for in-domain fine-tuning.
Experimental results show that our method not only improves recognition accuracy but also enables more accurate localization of words.
arXiv Detail & Related papers (2024-02-27T01:57:09Z)
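The core trick of replacing one-hot encodings with distributional targets can be illustrated generically (the names, `alpha`, and the prior construction below are assumptions, not the paper's procedure):

```python
import torch

def soft_targets(token_ids: torch.Tensor, prior: torch.Tensor,
                 alpha: float = 0.9) -> torch.Tensor:
    """Mix one-hot labels with a corpus-derived prior distribution.
    `prior` holds unigram probabilities of shape (vocab_size,);
    `alpha` (assumed) keeps most probability mass on the true token."""
    vocab_size = prior.numel()
    one_hot = torch.nn.functional.one_hot(token_ids, vocab_size).float()
    return alpha * one_hot + (1.0 - alpha) * prior

def soft_cross_entropy(logits: torch.Tensor,
                       targets: torch.Tensor) -> torch.Tensor:
    # Cross-entropy against a full distribution rather than a class index.
    return -(targets * torch.log_softmax(logits, dim=-1)).sum(-1).mean()
```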
- LRANet: Towards Accurate and Efficient Scene Text Detection with Low-Rank Approximation Network [63.554061288184165]
We propose a novel parameterized text shape method based on low-rank approximation.
By exploring the shape correlation among different text contours, our method achieves consistency, compactness, simplicity, and robustness in shape representation.
We implement an accurate and efficient arbitrary-shaped text detector named LRANet.
arXiv Detail & Related papers (2023-06-27T02:03:46Z)
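For background on the low-rank idea (a generic truncated-SVD illustration; LRANet learns its parameterization rather than computing an SVD at inference):

```python
import torch

def low_rank_approx(points: torch.Tensor, rank: int = 4) -> torch.Tensor:
    """Truncated-SVD approximation of a matrix of flattened text-contour
    coordinates, shape (num_contours, num_points * 2); rank is assumed."""
    U, S, Vh = torch.linalg.svd(points, full_matrices=False)
    return U[:, :rank] @ torch.diag(S[:rank]) @ Vh[:rank, :]

contours = torch.randn(100, 28)       # e.g. 100 contours of 14 (x, y) points
approx = low_rank_approx(contours)
print(torch.norm(contours - approx))  # reconstruction error of the rank-4 fit
```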
- Text Revision by On-the-Fly Representation Optimization [76.11035270753757]
Current state-of-the-art methods formulate text revision tasks as sequence-to-sequence learning problems.
We present an iterative in-place editing approach for text revision, which requires no parallel data.
It achieves performance competitive with, and sometimes better than, that of state-of-the-art supervised methods on text simplification.
arXiv Detail & Related papers (2022-04-15T07:38:08Z)
- HETFORMER: Heterogeneous Transformer with Sparse Attention for Long-Text Extractive Summarization [57.798070356553936]
HETFORMER is a Transformer-based pre-trained model with multi-granularity sparse attentions for extractive summarization.
Experiments on both single- and multi-document summarization tasks show that HETFORMER achieves state-of-the-art Rouge F1 performance.
arXiv Detail & Related papers (2021-10-12T22:42:31Z)
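To give a flavor of sparse attention (a simple sliding-window mask, not HETFORMER's multi-granularity scheme; the window size is assumed):

```python
import torch

def local_attention_mask(seq_len: int, window: int = 4) -> torch.Tensor:
    """Boolean mask where True marks allowed pairs: each token attends
    only to neighbors within `window` positions, so attention cost grows
    linearly with sequence length instead of quadratically."""
    idx = torch.arange(seq_len)
    return (idx[:, None] - idx[None, :]).abs() <= window

mask = local_attention_mask(8, window=2)
# Disallowed positions are set to -inf before the softmax:
scores = torch.randn(8, 8).masked_fill(~mask, float("-inf"))
attn = torch.softmax(scores, dim=-1)
```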
- Enhanced Modality Transition for Image Captioning [51.72997126838352]
We build a Modality Transition Module (MTM) to transfer visual features into semantic representations before forwarding them to the language model.
During the training phase, the modality transition network is optimised by the proposed modality loss.
Experiments on the MS-COCO dataset demonstrate the effectiveness of the proposed framework.
arXiv Detail & Related papers (2021-02-23T07:20:12Z)
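A modality-transition step of this kind can be sketched as a small projection from visual features into the language model's semantic space (the dimensions and the MSE stand-in for the paper's modality loss are assumptions):

```python
import torch
import torch.nn as nn

class ModalityTransition(nn.Module):
    """Illustrative module mapping visual features to semantic
    representations; all dimensions are assumed for the sketch."""
    def __init__(self, visual_dim: int = 2048, semantic_dim: int = 512):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(visual_dim, semantic_dim),
            nn.ReLU(),
            nn.Linear(semantic_dim, semantic_dim),
        )

    def forward(self, visual_feats: torch.Tensor) -> torch.Tensor:
        return self.proj(visual_feats)

# Align projected features with target sentence embeddings, here via MSE
# as a stand-in for the paper's modality loss:
mtm = ModalityTransition()
loss = nn.functional.mse_loss(mtm(torch.randn(16, 2048)),
                              torch.randn(16, 512))
```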
- An Intelligent CNN-VAE Text Representation Technology Based on Text Semantics for Comprehensive Big Data [15.680918844684454]
A text feature representation model based on a convolutional neural network (CNN) and a variational autoencoder (VAE) is proposed.
The resulting representations outperform baselines under k-nearest neighbor (KNN), random forest (RF), and support vector machine (SVM) classification algorithms.
arXiv Detail & Related papers (2020-08-28T07:39:45Z)
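As a generic sketch of the CNN side of such a pipeline (sizes and kernel widths are assumptions; the VAE stage, as sketched earlier, would consume these features):

```python
import torch
import torch.nn as nn

class CNNTextEncoder(nn.Module):
    """Illustrative CNN feature extractor over token embeddings;
    all sizes are assumptions for the sketch."""
    def __init__(self, vocab_size=30000, emb_dim=128, channels=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim, channels, kernel_size=k) for k in (3, 4, 5)
        )

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        x = self.emb(token_ids).transpose(1, 2)  # (B, emb_dim, seq_len)
        # Max-pool each convolution's output over time, then concatenate.
        feats = [conv(x).relu().max(dim=-1).values for conv in self.convs]
        return torch.cat(feats, dim=-1)          # (B, 3 * channels)

out = CNNTextEncoder()(torch.randint(0, 30000, (8, 32)))
```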
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.