FastLR: Non-Autoregressive Lipreading Model with Integrate-and-Fire
- URL: http://arxiv.org/abs/2008.02516v4
- Date: Mon, 15 Mar 2021 07:23:19 GMT
- Title: FastLR: Non-Autoregressive Lipreading Model with Integrate-and-Fire
- Authors: Jinglin Liu, Yi Ren, Zhou Zhao, Chen Zhang, Baoxing Huai, Nicholas
  Jing Yuan
- Abstract summary: We propose FastLR, a non-autoregressive (NAR) lipreading model which generates all target tokens simultaneously.
FastLR achieves a speedup of up to 10.97$\times$ over the state-of-the-art lipreading model.
- Score: 74.04394069262108
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Lipreading is an impressive technique, and its accuracy has clearly
improved in recent years. However, existing lipreading methods are built
mainly on autoregressive (AR) models, which generate target tokens one by one
and suffer from high inference latency. To break through this constraint, we
propose FastLR, a non-autoregressive (NAR) lipreading model that generates
all target tokens simultaneously. NAR lipreading is a
challenging task that has many difficulties: 1) the discrepancy of sequence
lengths between source and target makes it difficult to estimate the length of
the output sequence; 2) the conditionally independent behavior of NAR
generation lacks the correlation across time which leads to a poor
approximation of target distribution; 3) the feature representation ability of
encoder can be weak due to lack of effective alignment mechanism; and 4) the
removal of AR language model exacerbates the inherent ambiguity problem of
lipreading. Thus, in this paper, we introduce three methods to reduce the gap
between FastLR and the AR model: 1) to address challenges 1 and 2, we use an
integrate-and-fire (I&F) module to model the correspondence between source
video frames and the output text sequence; 2) to tackle challenge 3, we add an
auxiliary connectionist temporal classification (CTC) decoder on top of the
encoder and optimize it with an extra CTC loss, and we also add an auxiliary
autoregressive decoder to help the encoder's feature extraction; 3) to
overcome challenge 4, we propose a novel Noisy Parallel Decoding (NPD) for I&F
and bring Byte-Pair Encoding (BPE) into lipreading. Our experiments show
that FastLR achieves a speedup of up to 10.97$\times$ over the
state-of-the-art lipreading model, with a slight absolute WER increase of 1.5%
and 5.5% on the GRID and LRS2 lipreading datasets respectively, which demonstrates
the effectiveness of our proposed method.
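
The abstract does not spell out how the integrate-and-fire module works, so the following is only a minimal Python sketch of a generic integrate-and-fire style aggregation, not the authors' implementation. It assumes each video frame comes with a scalar weight between 0 and 1, that a token-level feature "fires" whenever the accumulated weight reaches a threshold of 1.0, and that leftover weight carries over to the next token. The function name `integrate_and_fire` and all parameter names are hypothetical.

```python
from typing import List


def integrate_and_fire(
    frames: List[List[float]],
    weights: List[float],
    threshold: float = 1.0,
) -> List[List[float]]:
    """Aggregate frame-level features into token-level features.

    Assumption (not from the abstract): every weight lies in [0, threshold],
    so a single frame can trigger at most one firing.
    """
    dim = len(frames[0]) if frames else 0
    fired: List[List[float]] = []
    acc_weight = 0.0          # weight integrated since the last firing
    acc_state = [0.0] * dim   # weighted sum of features since the last firing

    for feat, w in zip(frames, weights):
        if acc_weight + w < threshold:
            # Keep integrating: the threshold has not been reached yet.
            acc_weight += w
            acc_state = [s + w * f for s, f in zip(acc_state, feat)]
        else:
            # Fire: spend just enough of this frame's weight to hit the threshold.
            used = threshold - acc_weight
            fired.append([s + used * f for s, f in zip(acc_state, feat)])
            # The remaining weight starts the next token's accumulation.
            rest = w - used
            acc_weight = rest
            acc_state = [rest * f for f in feat]

    return fired


if __name__ == "__main__":
    frames = [[1.0], [2.0], [3.0], [4.0]]
    weights = [0.5, 0.5, 0.75, 0.25]
    print(integrate_and_fire(frames, weights))
```

In this sketch the number of fired vectors tracks the sum of the weights, which is one way a model could estimate the output length directly from the source frames (challenge 1 in the abstract); any leftover weight below the threshold at the end is simply dropped.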

Related papers
- R-Stitch: Dynamic Trajectory Stitching for Efficient Reasoning [60.37610817226533]
 Chain-of-thought (CoT) reasoning encourages step-by-step intermediate reasoning during inference. CoT introduces substantial computational overhead due to its reliance on autoregressive decoding over long token sequences. We present R-Stitch, a token-level, confidence-based hybrid decoding framework that accelerates CoT inference.
 arXiv  Detail & Related papers  (2025-07-23T08:14:36Z)
- DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation [68.19756761027351]
 Diffusion large language models (dLLMs) are compelling alternatives to autoregressive (AR) models. We investigate their denoising processes and reinforcement learning methods. Our work provides deeper insight into the machinery of dLLM generation and offers an effective, diffusion-native RL training framework.
 arXiv  Detail & Related papers  (2025-06-25T17:35:47Z)
- Accelerating Diffusion LLMs via Adaptive Parallel Decoding [50.9948753314669]
 We introduce adaptive parallel decoding (APD), a novel method that dynamically adjusts the number of tokens sampled in parallel. APD provides markedly higher throughput with minimal quality degradations on downstream benchmarks.
 arXiv  Detail & Related papers  (2025-05-31T06:10:10Z)
- Learn to Reason Efficiently with Adaptive Length-based Reward Shaping [23.626013831589212]
 Large Reasoning Models (LRMs) have shown remarkable capabilities in solving complex problems through reinforcement learning (RL). We present a unified framework that formulates various efficient reasoning methods through the lens of length-based reward shaping. Experiments on DeepSeek-R1-Distill-Qwen-1.5B, DeepSeek-R1-Distill-Qwen-7B, and DeepSeek-R1-Distill-Qwen-32B show that our approach significantly enhances both reasoning performance and response length efficiency.
 arXiv  Detail & Related papers  (2025-05-21T15:03:26Z)
- Accelerating LLM Inference with Lossless Speculative Decoding Algorithms for Heterogeneous Vocabularies [7.14946066475415]
 Speculative decoding (SD) methods offer substantial efficiency gains by generating multiple tokens using a single target forward pass. Existing SD approaches require the drafter and target models to share the same vocabulary, thus limiting the pool of possible drafters. We present three new SD methods that remove this shared-vocabulary constraint. Our algorithms demonstrate significant speedups of up to 2.8x over standard autoregressive decoding.
 arXiv  Detail & Related papers  (2025-01-31T19:13:58Z)
- Falcon: Faster and Parallel Inference of Large Language Models through Enhanced Semi-Autoregressive Drafting and Custom-Designed Decoding Tree [7.438117410146904]
 Falcon is an innovative speculative decoding framework fashioned to augment both the drafter's parallelism and output quality.
Falcon incorporates the Coupled Sequential Glancing Distillation technique, which fortifies inter-token dependencies within the same block, leading to increased speculation accuracy.
 arXiv  Detail & Related papers  (2024-12-17T08:02:08Z)
- LANTERN: Accelerating Visual Autoregressive Models with Relaxed Speculative Decoding [30.630803933771865]
 Experimental results demonstrate the efficacy of our method in providing a substantial speed-up over speculative decoding.
 LANTERN increases speed-ups by $\mathbf{1.75\times}$ and $\mathbf{1.76\times}$, as compared to greedy decoding and random sampling.
 arXiv  Detail & Related papers  (2024-10-04T12:21:03Z)
- Towards Effective and Efficient Non-autoregressive Decoding Using Block-based Attention Mask [74.64216073678617]
 AMD performs parallel NAR inference within contiguous blocks of output labels concealed using attention masks.
A beam search algorithm is designed to leverage a dynamic fusion of CTC, AR Decoder, and AMD probabilities.
Experiments on the LibriSpeech-100hr corpus suggest the tripartite Decoder incorporating the AMD module produces a maximum decoding speed-up ratio of 1.73x.
 arXiv  Detail & Related papers  (2024-06-14T13:42:38Z)
- Chimera: A Lossless Decoding Method for Accelerating Large Language Models Inference by Fusing all Tokens [15.566726645722657]
 We propose a novel framework specifically designed for speculative sampling.
Within this framework, we introduce a lightweight draft model that effectively utilizes previously generated tokens to predict subsequent words.
We demonstrate impressive results, achieving an average latency speedup ratio of 2.7x compared to the vanilla auto-regressive decoding approach.
 arXiv  Detail & Related papers  (2024-02-24T08:10:39Z)
- Complexity Matters: Rethinking the Latent Space for Generative Modeling [65.64763873078114]
 In generative modeling, numerous successful approaches leverage a low-dimensional latent space, e.g., Stable Diffusion.
In this study, we aim to shed light on this under-explored topic by rethinking the latent space from the perspective of model complexity.
 arXiv  Detail & Related papers  (2023-07-17T07:12:29Z)
- Improving Dual-Encoder Training through Dynamic Indexes for Negative
  Mining [61.09807522366773]
 We introduce an algorithm that approximates the softmax with provable bounds and that dynamically maintains the tree.
In our study on datasets with over twenty million targets, our approach cuts error by half in relation to oracle brute-force negative mining.
 arXiv  Detail & Related papers  (2023-03-27T15:18:32Z)
- Paraformer: Fast and Accurate Parallel Transformer for
  Non-autoregressive End-to-End Speech Recognition [62.83832841523525]
 We propose a fast and accurate parallel transformer, termed Paraformer.
It accurately predicts the number of output tokens and extracts hidden variables.
It can attain comparable performance to the state-of-the-art AR transformer, with more than 10x speedup.
 arXiv  Detail & Related papers  (2022-06-16T17:24:14Z)
- Highly Parallel Autoregressive Entity Linking with Discriminative
  Correction [51.947280241185]
 We propose a very efficient approach that parallelizes autoregressive linking across all potential mentions.
Our model is >70 times faster and more accurate than the previous generative method.
 arXiv  Detail & Related papers  (2021-09-08T17:28:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.