Language Models not just for Pre-training: Fast Online Neural Noisy Channel Modeling
- URL: http://arxiv.org/abs/2011.07164v1
- Date: Fri, 13 Nov 2020 23:22:28 GMT
- Title: Language Models not just for Pre-training: Fast Online Neural Noisy Channel Modeling
- Authors: Shruti Bhosale, Kyra Yee, Sergey Edunov, Michael Auli
- Abstract summary: We introduce efficient approximations to make inference with the noisy channel approach as fast as strong ensembles.
We also show that the noisy channel approach can outperform strong pre-training results by achieving a new state of the art on WMT Romanian-English translation.
- Score: 35.43382144290393
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-training models on vast quantities of unlabeled data has emerged as an
effective approach to improving accuracy on many NLP tasks. On the other hand,
traditional machine translation has a long history of leveraging unlabeled data
through noisy channel modeling. The same idea has recently been shown to
achieve strong improvements for neural machine translation. Unfortunately,
na\"{i}ve noisy channel modeling with modern sequence to sequence models is up
to an order of magnitude slower than alternatives. We address this issue by
introducing efficient approximations to make inference with the noisy channel
approach as fast as strong ensembles while increasing accuracy. We also show
that the noisy channel approach can outperform strong pre-training results by
achieving a new state of the art on WMT Romanian-English translation.
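The noisy channel approach referenced in the abstract rests on Bayes' rule: rather than scoring a translation y with the direct model p(y|x) alone, candidates are also scored with a channel (reverse translation) model p(x|y) and a target-side language model p(y). The sketch below shows that basic log-linear combination as a re-ranking step; the scorer callables and interpolation weights are illustrative placeholders, and the paper's contribution is making this kind of scoring fast during online decoding rather than the naive combination shown here.

```python
def noisy_channel_rerank(source, candidates, direct_lp, channel_lp, lm_lp,
                         ch_weight=1.0, lm_weight=1.0):
    """Re-rank candidate translations with a noisy channel objective.

    direct_lp(source, hyp)  -> log p(hyp | source)   (direct model)
    channel_lp(hyp, source) -> log p(source | hyp)   (channel / reverse model)
    lm_lp(hyp)              -> log p(hyp)            (target-side language model)
    All three scorers and both weights are hypothetical placeholders.
    """
    def score(hyp):
        # Bayes-rule combination: direct score plus weighted channel and LM scores.
        return (direct_lp(source, hyp)
                + ch_weight * channel_lp(hyp, source)
                + lm_weight * lm_lp(hyp))

    return max(candidates, key=score)
```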
Related papers
- Make Some Noise: Unlocking Language Model Parallel Inference Capability through Noisy Training [54.581599828392854]
We propose the Make Some Noise (MSN) training framework as a replacement for the supervised fine-tuning stage of large language models.
The training method simply introduces noise at the input so that the model learns a denoising task.
Experiments in both the general and code domains show that MSN can improve inference speed by 2.3-2.7x without compromising model performance.
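A minimal sketch of the kind of input corruption the MSN summary describes, in which a fraction of input tokens is replaced so the model must learn to denoise; the noise rate and random-substitution scheme here are assumptions for illustration, not the paper's exact recipe.

```python
import random

def add_token_noise(token_ids, vocab_size, noise_rate=0.15, rng=None):
    """Replace a random fraction of input tokens with random vocabulary items.

    The corrupted sequence is fed to the model, which is trained to recover
    the original tokens (a denoising objective). Rate and scheme are assumptions.
    """
    rng = rng or random.Random()
    noisy = list(token_ids)
    for i in range(len(noisy)):
        if rng.random() < noise_rate:
            noisy[i] = rng.randrange(vocab_size)
    return noisy
```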
arXiv Detail & Related papers (2024-06-25T09:25:39Z) - TransNormerLLM: A Faster and Better Large Language Model with Improved
TransNormer [34.790081960470964]
We present TransNormerLLM, the first linear attention-based Large Language Model (LLM).
We make advanced modifications that include positional embedding, linear attention acceleration, gating mechanisms, tensor normalization, and inference acceleration and stabilization.
We validate our model design through a series of ablations and train models with sizes of 385M, 1B, and 7B on our self-collected corpus.
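For context on the linear attention mentioned above: attention can be made linear in sequence length by replacing the softmax with a kernel feature map and using associativity to avoid forming the full attention matrix. A generic NumPy sketch of that idea follows; the elu+1 feature map and normalization are common generic choices, not TransNormerLLM's specific kernel, gating, or normalization scheme.

```python
import numpy as np

def linear_attention(Q, K, V, eps=1e-6):
    """Kernelized attention in O(n): softmax(QK^T)V is replaced by
    phi(Q) (phi(K)^T V), normalized by phi(Q) (phi(K)^T 1).

    Q, K have shape (n, d); V has shape (n, d_v).
    """
    def phi(x):
        return np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1 keeps features positive

    Qf, Kf = phi(Q), phi(K)
    kv = Kf.T @ V                      # (d, d_v), computed once for all queries
    z = Qf @ Kf.sum(axis=0) + eps      # (n,) per-query normalizer
    return (Qf @ kv) / z[:, None]
```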
arXiv Detail & Related papers (2023-07-27T16:45:33Z) - A Cheaper and Better Diffusion Language Model with Soft-Masked Noise [62.719656543880596]
Masked-Diffuse LM is a novel diffusion model for language modeling, inspired by linguistic features of natural language.
Specifically, we design a linguistically informed forward process that corrupts the text through strategic soft-masking to better noise the textual data.
We demonstrate that our Masked-Diffuse LM achieves better generation quality than state-of-the-art diffusion models while being more efficient.
arXiv Detail & Related papers (2023-04-10T17:58:42Z) - DiffusionBERT: Improving Generative Masked Language Models with
Diffusion Models [81.84866217721361]
DiffusionBERT is a new generative masked language model based on discrete diffusion models.
We propose a new noise schedule for the forward diffusion process that controls the degree of noise added at each step.
Experiments on unconditional text generation demonstrate that DiffusionBERT achieves significant improvement over existing diffusion models for text.
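A minimal sketch of an absorbing-state (mask-based) forward process in which a schedule controls how much noise has been added by each step; the plain linear schedule here is only a placeholder, since DiffusionBERT's point is precisely to design a better schedule than this.

```python
import random

def mask_probability(t, total_steps):
    """Fraction of tokens corrupted by forward step t (placeholder linear schedule)."""
    return t / total_steps

def forward_diffuse(token_ids, t, total_steps, mask_id, rng=None):
    """Corrupt a token sequence to diffusion step t by masking tokens."""
    rng = rng or random.Random()
    p = mask_probability(t, total_steps)
    return [mask_id if rng.random() < p else tok for tok in token_ids]
```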
arXiv Detail & Related papers (2022-11-28T03:25:49Z) - Improving Pre-trained Language Model Fine-tuning with Noise Stability
Regularization [94.4409074435894]
We propose a novel and effective fine-tuning framework, named Layerwise Noise Stability Regularization (LNSR).
Specifically, we propose to inject standard Gaussian noise and regularize the hidden representations of the fine-tuned model.
We demonstrate the advantages of the proposed method over other state-of-the-art algorithms including L2-SP, Mixout and SMART.
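A minimal sketch of the regularization idea as summarized above: Gaussian noise is injected into a hidden representation and the model is penalized if its output changes. The noise scale and mean-squared penalty are illustrative assumptions rather than LNSR's exact layerwise formulation.

```python
import torch

def noise_stability_penalty(layer_fn, hidden, noise_std=0.01):
    """Penalize sensitivity of a model's output to Gaussian noise in a hidden state.

    layer_fn : callable mapping a hidden representation to an output
               (e.g. the remaining layers of a fine-tuned encoder); hypothetical interface.
    hidden   : tensor of hidden states, shape (batch, seq_len, dim).
    """
    clean_out = layer_fn(hidden)
    noisy_out = layer_fn(hidden + noise_std * torch.randn_like(hidden))
    # Encourage representations to be stable under small Gaussian perturbations.
    return ((clean_out - noisy_out) ** 2).mean()

# During fine-tuning: loss = task_loss + reg_weight * noise_stability_penalty(...)
```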
arXiv Detail & Related papers (2022-06-12T04:42:49Z) - Teaching BERT to Wait: Balancing Accuracy and Latency for Streaming
Disfluency Detection [3.884530687475798]
A streaming BERT-based sequence tagging model is capable of detecting disfluencies in real time.
The model attains state-of-the-art latency and stability scores when compared with recent work on incremental disfluency detection.
arXiv Detail & Related papers (2022-05-02T02:13:24Z) - Noisy Channel Language Model Prompting for Few-Shot Text Classification [87.23056864536613]
We introduce a noisy channel approach for language model prompting in few-shot text classification.
Instead of computing the likelihood of the label given the input, channel models compute the conditional probability of the input given the label.
We use channel models for recently proposed few-shot learning methods with no or very limited updates to the language model parameters.
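A minimal sketch of the channel scoring described above: each label's verbalization is placed first and a language model scores the input conditioned on it, so the prediction is the label under which the input is most probable. The `lm_log_prob` scorer and the verbalizers are hypothetical stand-ins for a real language model interface.

```python
def channel_classify(input_text, label_verbalizers, lm_log_prob):
    """Pick the label whose verbalization best 'generates' the input.

    lm_log_prob(context, continuation) -> log p(continuation | context), a
    hypothetical wrapper around a language model.
    Direct prompting would instead score log p(label | input).
    """
    scores = {
        label: lm_log_prob(verbalizer, input_text)   # log p(input | label)
        for label, verbalizer in label_verbalizers.items()
    }
    return max(scores, key=scores.get)

# Example usage with hypothetical verbalizers:
# channel_classify("the movie was great",
#                  {"positive": "It was good.", "negative": "It was bad."},
#                  lm_log_prob)
```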
arXiv Detail & Related papers (2021-08-09T15:06:26Z) - BERT Loses Patience: Fast and Robust Inference with Early Exit [91.26199404912019]
We propose Patience-based Early Exit as a plug-and-play technique to improve the efficiency and robustness of a pretrained language model.
Our approach improves inference efficiency as it allows the model to make a prediction with fewer layers.
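A minimal sketch of patience-based early exiting consistent with the summary above, assuming each layer has an internal classifier: inference stops once `patience` consecutive layers agree on the same prediction. The layer and classifier interfaces are hypothetical.

```python
def patience_early_exit(layer_classifiers, hidden, forward_layer, patience=3):
    """Run layers one by one and exit once `patience` consecutive internal
    classifiers produce the same prediction.

    forward_layer(i, hidden)      -> hidden state after layer i   (hypothetical)
    layer_classifiers[i](hidden)  -> predicted class id            (hypothetical)
    """
    prev_pred, streak = None, 0
    for i, classify in enumerate(layer_classifiers):
        hidden = forward_layer(i, hidden)
        pred = classify(hidden)
        streak = streak + 1 if pred == prev_pred else 1
        prev_pred = pred
        if streak >= patience:
            return pred, i + 1   # prediction and number of layers actually used
    return prev_pred, len(layer_classifiers)
```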
arXiv Detail & Related papers (2020-06-07T13:38:32Z) - Single Channel Speech Enhancement Using Temporal Convolutional Recurrent
Neural Networks [23.88788382262305]
The temporal convolutional recurrent network (TCRN) is an end-to-end model that directly maps noisy waveforms to clean waveforms.
We show that our model is able to improve performance compared with existing convolutional recurrent networks.
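A generic sketch of a convolutional-recurrent model that maps a noisy waveform directly to an enhanced waveform, in the spirit of the TCRN described above; the layer sizes, strides, and overall architecture are assumptions, not the paper's.

```python
import torch
import torch.nn as nn

class ConvRecurrentEnhancer(nn.Module):
    """Toy waveform-to-waveform enhancer: conv encoder -> LSTM -> transposed-conv decoder.

    Layer sizes and strides are illustrative; the input length should be a
    multiple of 4 for the shapes below to line up.
    """
    def __init__(self, channels=16):
        super().__init__()
        self.encoder = nn.Conv1d(1, channels, kernel_size=4, stride=4)
        self.rnn = nn.LSTM(channels, channels, batch_first=True)
        self.decoder = nn.ConvTranspose1d(channels, 1, kernel_size=4, stride=4)

    def forward(self, noisy):                          # noisy: (batch, 1, samples)
        feats = self.encoder(noisy)                    # (batch, channels, frames)
        feats, _ = self.rnn(feats.transpose(1, 2))     # recurrent modeling over frames
        return self.decoder(feats.transpose(1, 2))     # (batch, 1, samples)
```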
arXiv Detail & Related papers (2020-02-02T04:26:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.