Language Models not just for Pre-training: Fast Online Neural Noisy Channel Modeling
- URL: http://arxiv.org/abs/2011.07164v1
- Date: Fri, 13 Nov 2020 23:22:28 GMT
- Title: Language Models not just for Pre-training: Fast Online Neural Noisy Channel Modeling
- Authors: Shruti Bhosale, Kyra Yee, Sergey Edunov, Michael Auli
- Abstract summary: We introduce efficient approximations to make inference with the noisy channel approach as fast as strong ensembles.
We also show that the noisy channel approach can outperform strong pre-training results by achieving a new state of the art on WMT Romanian-English translation.
- Score: 35.43382144290393
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-training models on vast quantities of unlabeled data has emerged as an
effective approach to improving accuracy on many NLP tasks. On the other hand,
traditional machine translation has a long history of leveraging unlabeled data
through noisy channel modeling. The same idea has recently been shown to
achieve strong improvements for neural machine translation. Unfortunately,
na\"{i}ve noisy channel modeling with modern sequence to sequence models is up
to an order of magnitude slower than alternatives. We address this issue by
introducing efficient approximations to make inference with the noisy channel
approach as fast as strong ensembles while increasing accuracy. We also show
that the noisy channel approach can outperform strong pre-training results by
achieving a new state of the art on WMT Romanian-English translation.
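The noisy channel approach referenced in the abstract rests on Bayes' rule: rather than scoring a translation y with the direct model p(y|x) alone, candidates are also scored with a channel (reverse translation) model p(x|y) and a target-side language model p(y). The sketch below shows that basic log-linear combination as a re-ranking step; the scorer callables and interpolation weights are illustrative placeholders, and the paper's contribution is making this kind of scoring fast during online decoding rather than the naive combination shown here.

```python
def noisy_channel_rerank(source, candidates, direct_lp, channel_lp, lm_lp,
                         ch_weight=1.0, lm_weight=1.0):
    """Re-rank candidate translations with a noisy channel objective.

    direct_lp(source, hyp)  -> log p(hyp | source)   (direct model)
    channel_lp(hyp, source) -> log p(source | hyp)   (channel / reverse model)
    lm_lp(hyp)              -> log p(hyp)            (target-side language model)
    All three scorers and both weights are hypothetical placeholders.
    """
    def score(hyp):
        # Bayes-rule combination: direct score plus weighted channel and LM scores.
        return (direct_lp(source, hyp)
                + ch_weight * channel_lp(hyp, source)
                + lm_weight * lm_lp(hyp))

    return max(candidates, key=score)
```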
Related papers
- Make Some Noise: Unlocking Language Model Parallel Inference Capability through Noisy Training [54.581599828392854]
We propose the Make Some Noise (MSN) training framework as a replacement for the supervised fine-tuning stage of large language models.
The training method simply introduces noise at the input so that the model learns a denoising task.
Experiments in both the general and code domains show that MSN can improve inference speed by 2.3-2.7x without compromising model performance.
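A minimal sketch of the kind of input corruption the MSN summary describes, in which a fraction of input tokens is replaced so the model must learn to denoise; the noise rate and random-substitution scheme here are assumptions for illustration, not the paper's exact recipe.

```python
import random

def add_token_noise(token_ids, vocab_size, noise_rate=0.15, rng=None):
    """Replace a random fraction of input tokens with random vocabulary items.

    The corrupted sequence is fed to the model, which is trained to recover
    the original tokens (a denoising objective). Rate and scheme are assumptions.
    """
    rng = rng or random.Random()
    noisy = list(token_ids)
    for i in range(len(noisy)):
        if rng.random() < noise_rate:
            noisy[i] = rng.randrange(vocab_size)
    return noisy
```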
arXiv Detail & Related papers (2024-06-25T09:25:39Z) - TransNormerLLM: A Faster and Better Large Language Model with Improved
TransNormer [34.790081960470964]
We present TransNormerLLM, the first linear attention-based Large Language Model (LLM).
We make advanced modifications that include positional embedding, linear attention acceleration, gating mechanisms, tensor normalization, and inference acceleration and stabilization.
We validate our model design through a series of ablations and train models with sizes of 385M, 1B, and 7B on our self-collected corpus.
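For context on the linear attention mentioned above: attention can be made linear in sequence length by replacing the softmax with a kernel feature map and using associativity to avoid forming the full attention matrix. A generic NumPy sketch of that idea follows; the elu+1 feature map and normalization are common generic choices, not TransNormerLLM's specific kernel, gating, or normalization scheme.

```python
import numpy as np

def linear_attention(Q, K, V, eps=1e-6):
    """Kernelized attention in O(n): softmax(QK^T)V is replaced by
    phi(Q) (phi(K)^T V), normalized by phi(Q) (phi(K)^T 1).

    Q, K have shape (n, d); V has shape (n, d_v).
    """
    def phi(x):
        return np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1 keeps features positive

    Qf, Kf = phi(Q), phi(K)
    kv = Kf.T @ V                      # (d, d_v), computed once for all queries
    z = Qf @ Kf.sum(axis=0) + eps      # (n,) per-query normalizer
    return (Qf @ kv) / z[:, None]
```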
arXiv Detail & Related papers (2023-07-27T16:45:33Z) - A Cheaper and Better Diffusion Language Model with Soft-Masked Noise [62.719656543880596]
Masked-Diffuse LM is a novel diffusion model for language modeling, inspired by linguistic features of natural language.
Specifically, we design a linguistically informed forward process that corrupts the text through strategic soft-masking to better noise the textual data.
We demonstrate that our Masked-Diffuse LM achieves better generation quality than state-of-the-art diffusion models while being more efficient.
arXiv Detail & Related papers (2023-04-10T17:58:42Z) - DiffusionBERT: Improving Generative Masked Language Models with
Diffusion Models [81.84866217721361]
DiffusionBERT is a new generative masked language model based on discrete diffusion models.
We propose a new noise schedule for the forward diffusion process that controls the degree of noise added at each step.
Experiments on unconditional text generation demonstrate that DiffusionBERT achieves significant improvement over existing diffusion models for text.
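A minimal sketch of an absorbing-state (mask-based) forward process in which a schedule controls how much noise has been added by each step; the plain linear schedule here is only a placeholder, since DiffusionBERT's point is precisely to design a better schedule than this.

```python
import random

def mask_probability(t, total_steps):
    """Fraction of tokens corrupted by forward step t (placeholder linear schedule)."""
    return t / total_steps

def forward_diffuse(token_ids, t, total_steps, mask_id, rng=None):
    """Corrupt a token sequence to diffusion step t by masking tokens."""
    rng = rng or random.Random()
    p = mask_probability(t, total_steps)
    return [mask_id if rng.random() < p else tok for tok in token_ids]
```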
arXiv Detail & Related papers (2022-11-28T03:25:49Z) - Improving Pre-trained Language Model Fine-tuning with Noise Stability
Regularization [94.4409074435894]
We propose a novel and effective fine-tuning framework, named Layerwise Noise Stability Regularization (LNSR).
Specifically, we propose to inject standard Gaussian noise and regularize the hidden representations of the fine-tuned model.
We demonstrate the advantages of the proposed method over other state-of-the-art algorithms including L2-SP, Mixout and SMART.
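A minimal sketch of the regularization idea as summarized above: Gaussian noise is injected into a hidden representation and the model is penalized if its output changes. The noise scale and mean-squared penalty are illustrative assumptions rather than LNSR's exact layerwise formulation.

```python
import torch

def noise_stability_penalty(layer_fn, hidden, noise_std=0.01):
    """Penalize sensitivity of a model's output to Gaussian noise in a hidden state.

    layer_fn : callable mapping a hidden representation to an output
               (e.g. the remaining layers of a fine-tuned encoder); hypothetical interface.
    hidden   : tensor of hidden states, shape (batch, seq_len, dim).
    """
    clean_out = layer_fn(hidden)
    noisy_out = layer_fn(hidden + noise_std * torch.randn_like(hidden))
    # Encourage representations to be stable under small Gaussian perturbations.
    return ((clean_out - noisy_out) ** 2).mean()

# During fine-tuning: loss = task_loss + reg_weight * noise_stability_penalty(...)
```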
arXiv Detail & Related papers (2022-06-12T04:42:49Z) - Teaching BERT to Wait: Balancing Accuracy and Latency for Streaming
Disfluency Detection [3.884530687475798]
A streaming BERT-based sequence tagging model is capable of detecting disfluencies in real time.
The model attains state-of-the-art latency and stability scores when compared with recent work on incremental disfluency detection.
arXiv Detail & Related papers (2022-05-02T02:13:24Z) - Noisy Channel Language Model Prompting for Few-Shot Text Classification [87.23056864536613]
We introduce a noisy channel approach for language model prompting in few-shot text classification.
Instead of computing the likelihood of the label given the input, channel models compute the conditional probability of the input given the label.
We use channel models for recently proposed few-shot learning methods with no or very limited updates to the language model parameters.
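A minimal sketch of the channel scoring described above: each label's verbalization is placed first and a language model scores the input conditioned on it, so the prediction is the label under which the input is most probable. The `lm_log_prob` scorer and the verbalizers are hypothetical stand-ins for a real language model interface.

```python
def channel_classify(input_text, label_verbalizers, lm_log_prob):
    """Pick the label whose verbalization best 'generates' the input.

    lm_log_prob(context, continuation) -> log p(continuation | context), a
    hypothetical wrapper around a language model.
    Direct prompting would instead score log p(label | input).
    """
    scores = {
        label: lm_log_prob(verbalizer, input_text)   # log p(input | label)
        for label, verbalizer in label_verbalizers.items()
    }
    return max(scores, key=scores.get)

# Example usage with hypothetical verbalizers:
# channel_classify("the movie was great",
#                  {"positive": "It was good.", "negative": "It was bad."},
#                  lm_log_prob)
```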
arXiv Detail & Related papers (2021-08-09T15:06:26Z) - BERT Loses Patience: Fast and Robust Inference with Early Exit [91.26199404912019]
We propose Patience-based Early Exit as a plug-and-play technique to improve the efficiency and robustness of a pretrained language model.
Our approach improves inference efficiency as it allows the model to make a prediction with fewer layers.
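A minimal sketch of patience-based early exiting consistent with the summary above, assuming each layer has an internal classifier: inference stops once `patience` consecutive layers agree on the same prediction. The layer and classifier interfaces are hypothetical.

```python
def patience_early_exit(layer_classifiers, hidden, forward_layer, patience=3):
    """Run layers one by one and exit once `patience` consecutive internal
    classifiers produce the same prediction.

    forward_layer(i, hidden)      -> hidden state after layer i   (hypothetical)
    layer_classifiers[i](hidden)  -> predicted class id            (hypothetical)
    """
    prev_pred, streak = None, 0
    for i, classify in enumerate(layer_classifiers):
        hidden = forward_layer(i, hidden)
        pred = classify(hidden)
        streak = streak + 1 if pred == prev_pred else 1
        prev_pred = pred
        if streak >= patience:
            return pred, i + 1   # prediction and number of layers actually used
    return prev_pred, len(layer_classifiers)
```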
arXiv Detail & Related papers (2020-06-07T13:38:32Z) - Single Channel Speech Enhancement Using Temporal Convolutional Recurrent
Neural Networks [23.88788382262305]
The temporal convolutional recurrent network (TCRN) is an end-to-end model that directly maps noisy waveforms to clean waveforms.
We show that our model is able to improve performance compared with existing convolutional recurrent networks.
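A generic sketch of a convolutional-recurrent model that maps a noisy waveform directly to an enhanced waveform, in the spirit of the TCRN described above; the layer sizes, strides, and overall architecture are assumptions, not the paper's.

```python
import torch
import torch.nn as nn

class ConvRecurrentEnhancer(nn.Module):
    """Toy waveform-to-waveform enhancer: conv encoder -> LSTM -> transposed-conv decoder.

    Layer sizes and strides are illustrative; the input length should be a
    multiple of 4 for the shapes below to line up.
    """
    def __init__(self, channels=16):
        super().__init__()
        self.encoder = nn.Conv1d(1, channels, kernel_size=4, stride=4)
        self.rnn = nn.LSTM(channels, channels, batch_first=True)
        self.decoder = nn.ConvTranspose1d(channels, 1, kernel_size=4, stride=4)

    def forward(self, noisy):                          # noisy: (batch, 1, samples)
        feats = self.encoder(noisy)                    # (batch, channels, frames)
        feats, _ = self.rnn(feats.transpose(1, 2))     # recurrent modeling over frames
        return self.decoder(feats.transpose(1, 2))     # (batch, 1, samples)
```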
arXiv Detail & Related papers (2020-02-02T04:26:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.