Text Simplification by Tagging
- URL: http://arxiv.org/abs/2103.05070v1
- Date: Mon, 8 Mar 2021 20:57:55 GMT
- Title: Text Simplification by Tagging
- Authors: Kostiantyn Omelianchuk, Vipul Raheja, Oleksandr Skurzhanskyi
- Abstract summary: We present TST, a simple and efficient Text Simplification system based on sequence Tagging.
Our system applies simple data augmentations and tweaks in training and inference on top of a pre-existing system.
It achieves inference speeds more than 11 times faster than the current state-of-the-art text simplification system.
- Score: 21.952293614293392
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Edit-based approaches have recently shown promising results on multiple
monolingual sequence transduction tasks. In contrast to conventional
sequence-to-sequence (Seq2Seq) models, which learn to generate text from
scratch as they are trained on parallel corpora, these methods have proven to
be much more effective since they are able to learn to make fast and accurate
transformations while leveraging powerful pre-trained language models. Inspired
by these ideas, we present TST, a simple and efficient Text Simplification
system based on sequence Tagging, leveraging pre-trained Transformer-based
encoders. Our system makes simplistic data augmentations and tweaks in training
and inference on a pre-existing system, which makes it less reliant on large
amounts of parallel training data, provides more control over the outputs and
enables faster inference speeds. Our best model achieves near state-of-the-art
performance on benchmark test datasets for the task. Since it is fully
non-autoregressive, it achieves inference speeds over 11 times faster than
the current state-of-the-art text simplification system.
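The core idea of simplification as non-autoregressive sequence tagging over a pre-trained encoder can be illustrated with a minimal sketch. The toy KEEP/DELETE tag set, the small untrained encoder, and the tag-application step below are assumptions made for illustration; the actual TST system builds on pre-trained Transformer-based encoders and a richer edit-tag vocabulary.

```python
# Minimal sketch of simplification-as-tagging (illustrative only; TST itself uses
# pre-trained Transformer encoders and a larger edit-tag inventory).
import torch
import torch.nn as nn

TAGS = ["KEEP", "DELETE"]  # assumed toy tag set; real systems also have replace/append-style tags

class TaggingSimplifier(nn.Module):
    def __init__(self, vocab_size: int, d_model: int = 128, num_tags: int = len(TAGS)):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)  # stand-in for a pre-trained encoder
        self.tag_head = nn.Linear(d_model, num_tags)               # one tag prediction per token

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        hidden = self.encoder(self.embed(token_ids))
        return self.tag_head(hidden)  # (batch, seq_len, num_tags), predicted in one pass

def apply_tags(tokens: list[str], tag_ids: list[int]) -> list[str]:
    """Non-autoregressive post-processing: apply all predicted edits at once."""
    return [tok for tok, t in zip(tokens, tag_ids) if TAGS[t] == "KEEP"]

# Toy usage: every token is tagged in a single forward pass (no left-to-right decoding).
model = TaggingSimplifier(vocab_size=1000)
ids = torch.randint(0, 1000, (1, 6))
tags = model(ids).argmax(-1)[0].tolist()
print(apply_tags([f"tok{i}" for i in range(6)], tags))
```

Because every tag is predicted independently in one encoder pass, the output can be produced without sequential decoding, which is where the inference-speed advantage comes from.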
Related papers
- Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration [54.897493351694195]
We propose a novel parallel decoding approach, namely hidden transfer, which decodes multiple successive tokens simultaneously in a single forward pass.
In terms of acceleration metrics, we outperform all the single-model acceleration techniques, including Medusa and Self-Speculative decoding.
arXiv Detail & Related papers (2024-04-18T09:17:06Z)
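Emitting several successive tokens from a single forward pass can be sketched with a toy multi-head predictor. This illustrates the general idea of decoding k tokens per pass, not the paper's hidden-transfer mechanism; the class name, head structure, and dimensions below are invented for the example.

```python
# Toy illustration of predicting k successive tokens per forward pass
# (not the paper's hidden-transfer method; names and sizes are made up).
import torch
import torch.nn as nn

class MultiTokenHead(nn.Module):
    """Predicts the next k tokens jointly from the last hidden state."""
    def __init__(self, d_model: int, vocab_size: int, k: int = 3):
        super().__init__()
        self.heads = nn.ModuleList(nn.Linear(d_model, vocab_size) for _ in range(k))

    def forward(self, last_hidden: torch.Tensor) -> torch.Tensor:
        # last_hidden: (batch, d_model) -> (batch, k) draft token ids in one pass
        return torch.stack([h(last_hidden).argmax(-1) for h in self.heads], dim=1)

head = MultiTokenHead(d_model=64, vocab_size=100, k=3)
hidden = torch.randn(2, 64)   # stand-in for an LLM's final hidden state
draft = head(hidden)          # (2, 3): three successive draft tokens at once
print(draft.shape)
```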
- LRANet: Towards Accurate and Efficient Scene Text Detection with Low-Rank Approximation Network [63.554061288184165]
We propose a novel parameterized text shape method based on low-rank approximation.
By exploring the shape correlation among different text contours, our method achieves consistency, compactness, simplicity, and robustness in shape representation.
We implement an accurate and efficient arbitrary-shaped text detector named LRANet.
arXiv Detail & Related papers (2023-06-27T02:03:46Z)
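Low-rank approximation of text-contour representations can be illustrated with a truncated SVD over a matrix of stacked contour coordinates. This is a generic sketch of the underlying linear-algebra idea, not LRANet's actual parameterization; the contour data below is synthetic.

```python
# Generic truncated-SVD sketch of representing text contours with a few
# low-rank components (synthetic data; not LRANet's parameterization).
import numpy as np

rng = np.random.default_rng(0)
num_contours, num_points = 50, 32
# Each row flattens one text contour's (x, y) points into a length-64 vector.
contours = rng.normal(size=(num_contours, num_points * 2))

U, S, Vt = np.linalg.svd(contours, full_matrices=False)
rank = 8                            # keep only a few shared shape components
basis = Vt[:rank]                   # shared low-rank shape basis
coeffs = contours @ basis.T         # per-contour coefficients (50 x 8)
reconstructed = coeffs @ basis      # approximate contours from a few numbers

err = np.linalg.norm(contours - reconstructed) / np.linalg.norm(contours)
print(f"relative reconstruction error at rank {rank}: {err:.3f}")
```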
- KEST: Kernel Distance Based Efficient Self-Training for Improving Controllable Text Generation [24.47531522553703]
We propose KEST, a novel and efficient self-training framework to handle these problems.
KEST utilizes a kernel-based loss, rather than standard cross entropy, to learn from the soft pseudo text produced by a shared non-autoregressive generator.
Experiments on three controllable generation tasks demonstrate that KEST significantly improves control accuracy while maintaining comparable text fluency and generation diversity against several strong baselines.
arXiv Detail & Related papers (2023-06-17T19:40:57Z)
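A kernel-based distance between model outputs and pseudo-text can be sketched as a Gaussian-kernel maximum mean discrepancy (MMD) between two sets of pooled embeddings. This is a standard MMD estimator used for illustration, not KEST's exact objective; the embeddings below are random placeholders.

```python
# Gaussian-kernel MMD between two sets of sequence embeddings, as a stand-in
# for a kernel-based training loss (illustrative; not KEST's exact objective).
import torch

def gaussian_kernel(x: torch.Tensor, y: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    # x: (n, d), y: (m, d) -> (n, m) kernel matrix
    dists = torch.cdist(x, y) ** 2
    return torch.exp(-dists / (2 * sigma ** 2))

def mmd_loss(real: torch.Tensor, pseudo: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    k_rr = gaussian_kernel(real, real, sigma).mean()
    k_pp = gaussian_kernel(pseudo, pseudo, sigma).mean()
    k_rp = gaussian_kernel(real, pseudo, sigma).mean()
    return k_rr + k_pp - 2 * k_rp  # small when the two distributions match

real_emb = torch.randn(16, 128)    # e.g. pooled embeddings of labelled text
pseudo_emb = torch.randn(16, 128)  # e.g. pooled embeddings of soft pseudo text
print(mmd_loss(real_emb, pseudo_emb))
```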
- HLATR: Enhance Multi-stage Text Retrieval with Hybrid List Aware Transformer Reranking [16.592276887533714]
Hybrid List Aware Transformer Reranking (HLATR) is a subsequent reranking module that incorporates both retrieval-stage and reranking-stage features.
HLATR is lightweight and can be easily parallelized with existing text retrieval systems.
Empirical experiments on two large-scale text retrieval datasets show that HLATR can efficiently improve the ranking performance of existing multi-stage text retrieval methods.
arXiv Detail & Related papers (2022-05-21T11:38:33Z)
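Fusing retrieval-stage and reranking-stage signals with a lightweight, list-aware Transformer can be sketched as below. The feature choices, dimensions, and the way stage scores are projected are assumptions for illustration rather than HLATR's actual architecture.

```python
# Toy list-aware reranker that fuses retrieval-stage and reranking-stage
# scores for all candidates jointly (illustrative; not HLATR's architecture).
import torch
import torch.nn as nn

class ListAwareReranker(nn.Module):
    def __init__(self, d_model: int = 32):
        super().__init__()
        self.proj = nn.Linear(2, d_model)  # input: [retrieval score, reranker score]
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)
        self.score = nn.Linear(d_model, 1)

    def forward(self, stage_scores: torch.Tensor) -> torch.Tensor:
        # stage_scores: (batch, num_candidates, 2) -> refined scores (batch, num_candidates)
        hidden = self.encoder(self.proj(stage_scores))  # candidates attend to each other
        return self.score(hidden).squeeze(-1)

reranker = ListAwareReranker()
features = torch.randn(1, 100, 2)   # 100 candidates with two stage scores each
print(reranker(features).shape)     # torch.Size([1, 100])
```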
- Text Revision by On-the-Fly Representation Optimization [76.11035270753757]
Current state-of-the-art methods formulate these tasks as sequence-to-sequence learning problems.
We present an iterative in-place editing approach for text revision, which requires no parallel data.
It achieves performance competitive with, and in some cases better than, state-of-the-art supervised methods on text simplification.
arXiv Detail & Related papers (2022-04-15T07:38:08Z)
- On Adversarial Robustness of Synthetic Code Generation [1.2559148369195197]
This paper showcases the existence of significant dataset bias through different classes of adversarial examples.
We propose several dataset augmentation techniques to reduce bias and showcase their efficacy.
arXiv Detail & Related papers (2021-06-22T09:37:48Z)
- SDA: Improving Text Generation with Self Data Augmentation [88.24594090105899]
We propose to improve the standard maximum likelihood estimation (MLE) paradigm by incorporating a self-imitation-learning phase for automatic data augmentation.
Unlike most existing sentence-level augmentation strategies, our method is more general and could be easily adapted to any MLE-based training procedure.
arXiv Detail & Related papers (2021-01-02T01:15:57Z)
- Improving Text Generation with Student-Forcing Optimal Transport [122.11881937642401]
We propose using optimal transport (OT) to match the sequences generated in training and testing modes.
An extension is also proposed to improve the OT learning, based on the structural and contextual information of the text sequences.
The effectiveness of the proposed method is validated on machine translation, text summarization, and text generation tasks.
arXiv Detail & Related papers (2020-10-12T19:42:25Z)
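Matching the token-embedding sequences produced in training (teacher-forced) and generation (free-running) modes with optimal transport can be sketched with a small entropy-regularized Sinkhorn distance. This is a generic OT estimator given for illustration, not the paper's exact training objective; the embeddings below are random placeholders.

```python
# Entropy-regularized Sinkhorn OT distance between two token-embedding
# sequences (generic illustration; not the paper's exact objective).
import torch

def sinkhorn_distance(x: torch.Tensor, y: torch.Tensor, eps: float = 0.1, iters: int = 50) -> torch.Tensor:
    # x: (n, d) teacher-forced states, y: (m, d) free-running states
    cost = torch.cdist(x, y)                   # (n, m) pairwise transport cost
    n, m = cost.shape
    a, b = torch.full((n,), 1.0 / n), torch.full((m,), 1.0 / m)
    K = torch.exp(-cost / eps)                 # Gibbs kernel
    u = torch.ones(n)
    for _ in range(iters):                     # Sinkhorn fixed-point iterations
        v = b / (K.t() @ u)
        u = a / (K @ v)
    plan = torch.diag(u) @ K @ torch.diag(v)   # approximate transport plan
    return (plan * cost).sum()

train_states = torch.randn(12, 64)   # embeddings of the ground-truth sequence
gen_states = torch.randn(10, 64)     # embeddings of the model-generated sequence
print(sinkhorn_distance(train_states, gen_states))
```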
- POINTER: Constrained Progressive Text Generation via Insertion-based Generative Pre-training [93.79766670391618]
We present POINTER, a novel insertion-based approach for hard-constrained text generation.
The proposed method operates by progressively inserting new tokens between existing tokens in a parallel manner.
The resulting coarse-to-fine hierarchy makes the generation process intuitive and interpretable.
arXiv Detail & Related papers (2020-05-01T18:11:54Z)
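The progressive, insertion-based generation scheme can be sketched with a toy loop that repeatedly inserts new tokens between existing ones in parallel. The propose_insertion function below is a hypothetical stand-in for the trained insertion model (a pre-trained Transformer in POINTER), and the lookup table inside it exists only to make the example run.

```python
# Toy coarse-to-fine insertion loop (illustrative only; propose_insertion is a
# hypothetical stand-in for POINTER's trained insertion Transformer).

def propose_insertion(left: str, right: str) -> str | None:
    """Hypothetical model call: return a token to insert between two tokens, or None."""
    candidates = {("quick", "fox"): "brown", ("fox", "over"): "jumps"}
    return candidates.get((left, right))

def progressive_generate(keywords: list[str], max_rounds: int = 4) -> list[str]:
    tokens = list(keywords)                      # start from the hard lexical constraints
    for _ in range(max_rounds):
        insertions = [propose_insertion(tokens[i], tokens[i + 1])
                      for i in range(len(tokens) - 1)]   # decided in parallel for every gap
        if all(tok is None for tok in insertions):
            break                                # no gap wants a new token: generation done
        refined = []
        for i, tok in enumerate(tokens[:-1]):    # interleave old tokens with new insertions
            refined.append(tok)
            if insertions[i] is not None:
                refined.append(insertions[i])
        refined.append(tokens[-1])
        tokens = refined
    return tokens

print(progressive_generate(["quick", "fox", "over"]))
# -> ['quick', 'brown', 'fox', 'jumps', 'over']
```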
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.