PIMNet: A Parallel, Iterative and Mimicking Network for Scene Text
Recognition
- URL: http://arxiv.org/abs/2109.04145v1
- Date: Thu, 9 Sep 2021 10:11:07 GMT
- Title: PIMNet: A Parallel, Iterative and Mimicking Network for Scene Text
Recognition
- Authors: Zhi Qiao, Yu Zhou, Jin Wei, Wei Wang, Yuan Zhang, Ning Jiang, Hongbin
Wang, Weiping Wang
- Abstract summary: We propose a Parallel, Iterative and Mimicking Network (PIMNet) to balance accuracy and efficiency.
PIMNet adopts a parallel attention mechanism to predict the text faster and an iterative generation mechanism to make the predictions more accurate.
- Score: 16.976881696357275
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scene text recognition has attracted increasing attention due to
its wide range of applications. Most state-of-the-art methods adopt an
encoder-decoder framework with an attention mechanism, generating text
autoregressively from left to right. Despite their convincing performance,
these models are slow because of the one-by-one decoding strategy.
Non-autoregressive models, in contrast, predict all characters in parallel
with a much shorter inference time, but their accuracy falls considerably
behind that of their autoregressive counterparts. In this paper, we propose a
Parallel, Iterative and Mimicking Network (PIMNet) to balance accuracy and
efficiency. Specifically, PIMNet adopts a parallel attention mechanism to
predict the text faster and an iterative generation mechanism to make the
predictions more accurate, fully exploiting context information in each
iteration. To improve learning of the hidden layer, we use mimicking learning
in the training phase: an additional autoregressive decoder is adopted, and
the parallel decoder mimics it by fitting the outputs of its hidden layer.
With a backbone shared between the two decoders, PIMNet can be trained
end-to-end without pre-training. During inference, the autoregressive branch
is removed for faster decoding. Extensive experiments on public benchmarks
demonstrate the effectiveness and efficiency of PIMNet. Our code will be
available at https://github.com/Pay20Y/PIMNet.
Related papers
- FIRP: Faster LLM inference via future intermediate representation prediction [54.897493351694195]
FIRP generates multiple tokens instead of one at each decoding step.
We conduct extensive experiments, showing a speedup ratio of 1.9x-3x across several models and datasets.
arXiv Detail & Related papers (2024-10-27T15:53:49Z)
- Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration [54.897493351694195]
We propose a novel parallel decoding approach, namely hidden transfer, which decodes multiple successive tokens simultaneously in a single forward pass.
In terms of acceleration metrics, we outperform all the single-model acceleration techniques, including Medusa and Self-Speculative decoding.
arXiv Detail & Related papers (2024-04-18T09:17:06Z)
- T4P: Test-Time Training of Trajectory Prediction via Masked Autoencoder and Actor-specific Token Memory [39.021321011792786]
Trajectory prediction is a challenging problem that requires considering interactions among multiple actors.
Data-driven approaches have been used to address this complex problem, but they suffer from unreliable predictions under distribution shifts during test time.
We propose online learning methods that use a regression loss against the ground truth of the observed data.
Our method surpasses the performance of existing state-of-the-art online learning methods in terms of both prediction accuracy and computational efficiency.
arXiv Detail & Related papers (2024-03-15T06:47:14Z)
- Chimera: A Lossless Decoding Method for Accelerating Large Language Models Inference by Fusing all Tokens [15.566726645722657]
We propose a novel framework specifically designed for speculative sampling.
Within this framework, we introduce a lightweight draft model that effectively utilizes previously generated tokens to predict subsequent words (a toy sketch of this draft-and-verify idea appears after this list).
We demonstrate impressive results, achieving an average latency speedup ratio of 2.7x compared to the vanilla auto-regressive decoding approach.
arXiv Detail & Related papers (2024-02-24T08:10:39Z)
- IPAD: Iterative, Parallel, and Diffusion-based Network for Scene Text Recognition [5.525052547053668]
Scene text recognition has attracted increasing attention due to its diverse applications.
Most state-of-the-art methods adopt an encoder-decoder framework with the attention mechanism, autoregressively generating text from left to right.
We propose an alternative solution, using a parallel and iterative decoder that adopts an easy-first decoding strategy.
arXiv Detail & Related papers (2023-12-19T08:03:19Z)
- ReBotNet: Fast Real-time Video Enhancement [59.08038313427057]
Most restoration networks are slow, have high computational bottlenecks, and cannot be used for real-time video enhancement.
In this work, we design an efficient and fast framework to perform real-time enhancement for practical use-cases like live video calls and video streams.
To evaluate our method, we curate two new datasets that emulate real-world video call and streaming scenarios, and show extensive results on multiple datasets where ReBotNet outperforms existing approaches with lower computation, reduced memory requirements, and faster inference time.
arXiv Detail & Related papers (2023-03-23T17:58:05Z)
- LegoNet: A Fast and Exact Unlearning Architecture [59.49058450583149]
Machine unlearning aims to erase the impact of specific training samples from a trained model upon deletion requests.
We present a novel network, namely LegoNet, which adopts the framework of "fixed encoder + multiple adapters".
We show that LegoNet accomplishes fast and exact unlearning while maintaining acceptable performance, outperforming unlearning baselines overall.
arXiv Detail & Related papers (2022-10-28T09:53:05Z)
- Streaming End-to-End ASR based on Blockwise Non-Autoregressive Models [57.20432226304683]
Non-autoregressive (NAR) modeling has gained increasing attention in speech processing.
We propose a novel end-to-end streaming NAR speech recognition system.
We show that the proposed method improves online ASR recognition in low-latency conditions.
arXiv Detail & Related papers (2021-07-20T11:42:26Z)
- Deep Encoder, Shallow Decoder: Reevaluating Non-autoregressive Machine Translation [78.51887060865273]
We show that a single-layer autoregressive decoder can substantially outperform strong non-autoregressive models with comparable inference speed.
Our results establish a new protocol for future research toward fast, accurate machine translation.
arXiv Detail & Related papers (2020-06-18T09:06:49Z)
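Several of the entries above (FIRP, hidden transfer, Chimera) revolve around drafting several tokens cheaply and verifying them with the full model. As a rough, self-contained Python sketch of that draft-and-verify loop under a greedy acceptance rule, where the stand-in draft and target functions and the per-token verification are simplifying assumptions, not any paper's actual method:

```python
# Toy sketch of greedy speculative decoding: a cheap draft model proposes k
# tokens, a target model verifies them. All names here are illustrative.
from typing import Callable, List

def speculative_step(draft: Callable[[List[int]], List[int]],
                     target: Callable[[List[int]], int],
                     prefix: List[int], k: int = 4) -> List[int]:
    drafted = draft(prefix)[:k]                      # k cheap guesses
    accepted: List[int] = []
    for tok in drafted:
        # In a real system all k positions are verified in ONE batched
        # forward pass; the per-token call here is only for clarity.
        expected = target(prefix + accepted)
        if expected != tok:
            accepted.append(expected)                # fix the first mismatch, stop
            break
        accepted.append(tok)                         # agreement: token accepted for free
    else:
        accepted.append(target(prefix + accepted))   # bonus token on full acceptance
    return prefix + accepted

# Toy usage: the draft counts by 2, matching the target, so one step
# yields k + 1 = 5 new tokens instead of 1.
draft = lambda p: [p[-1] + 2 * (i + 1) for i in range(4)]
target = lambda p: p[-1] + 2
print(speculative_step(draft, target, [0]))          # [0, 2, 4, 6, 8, 10]
```

When the draft disagrees early, the step degrades gracefully to roughly one target-model token, which is why acceptance rate drives the reported speedup ratios.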