Towards Practical Lipreading with Distilled and Efficient Models
- URL: http://arxiv.org/abs/2007.06504v3
- Date: Wed, 2 Jun 2021 09:02:09 GMT
- Title: Towards Practical Lipreading with Distilled and Efficient Models
- Authors: Pingchuan Ma, Brais Martinez, Stavros Petridis, Maja Pantic
- Abstract summary: Lipreading has witnessed a lot of progress due to the resurgence of neural networks.
Recent works have placed emphasis on aspects such as improving performance by finding the optimal architecture or improving generalization.
There is still a significant gap between the current methodologies and the requirements for an effective deployment of lipreading in practical scenarios.
We propose a series of innovations that significantly bridge that gap: first, we raise the state-of-the-art performance by a wide margin on LRW and LRW-1000 to 88.5% and 46.6%, respectively, using self-distillation.
- Score: 57.41253104365274
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Lipreading has witnessed a lot of progress due to the resurgence of neural
networks. Recent works have placed emphasis on aspects such as improving
performance by finding the optimal architecture or improving generalization.
However, there is still a significant gap between the current methodologies and
the requirements for an effective deployment of lipreading in practical
scenarios. In this work, we propose a series of innovations that significantly
bridge that gap: first, we raise the state-of-the-art performance by a wide
margin on LRW and LRW-1000 to 88.5% and 46.6%, respectively, using
self-distillation. Second, we propose a series of architectural changes,
including a novel Depthwise Separable Temporal Convolutional Network (DS-TCN)
head, that slashes the computational cost to a fraction of that of the
(already quite efficient) original model. Third, we show that knowledge
distillation is a very effective tool for recovering the performance of the
lightweight models. This results in a range of models with different
accuracy-efficiency trade-offs.
However, our most promising lightweight models are on par with the current
state-of-the-art while showing a reduction of 8.2x and 3.9x in terms of
computational cost and number of parameters, respectively, which we hope will
enable the deployment of lipreading models in practical applications.
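To make the DS-TCN idea concrete, the sketch below shows a depthwise separable temporal convolution block in PyTorch: a per-channel (depthwise) 1D convolution followed by a 1x1 (pointwise) convolution, which is the factorisation that yields the computational savings. The class name, the BatchNorm/PReLU choices and the tensor shapes are illustrative assumptions, not the authors' exact DS-TCN head.

```python
import torch
import torch.nn as nn


class DepthwiseSeparableTemporalConv(nn.Module):
    """Depthwise separable 1D (temporal) convolution: a per-channel depthwise
    convolution followed by a 1x1 pointwise convolution. This factorisation
    replaces the roughly C_in * C_out * k multiply-adds of a dense temporal
    convolution with about C_in * k + C_in * C_out per output step."""

    def __init__(self, in_channels, out_channels, kernel_size=3, dilation=1):
        super().__init__()
        padding = (kernel_size - 1) // 2 * dilation  # keep the temporal length
        self.depthwise = nn.Conv1d(in_channels, in_channels, kernel_size,
                                   padding=padding, dilation=dilation,
                                   groups=in_channels, bias=False)
        self.pointwise = nn.Conv1d(in_channels, out_channels, 1, bias=False)
        self.norm = nn.BatchNorm1d(out_channels)
        self.act = nn.PReLU()

    def forward(self, x):
        # x: (batch, channels, time) frame-level features from the visual front-end
        return self.act(self.norm(self.pointwise(self.depthwise(x))))


if __name__ == "__main__":
    feats = torch.randn(2, 512, 29)  # assumed shape: a 29-frame LRW-style clip
    block = DepthwiseSeparableTemporalConv(512, 512, kernel_size=3, dilation=2)
    print(block(feats).shape)  # torch.Size([2, 512, 29])
```

Stacking such blocks with different dilations is the usual way a TCN head covers multiple temporal scales while staying lightweight.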
Related papers
- HyCubE: Efficient Knowledge Hypergraph 3D Circular Convolutional Embedding [21.479738859698344]
It is desirable and challenging for knowledge hypergraph embedding to reach a trade-off between model effectiveness and efficiency.
We propose an end-to-end efficient knowledge hypergraph embedding model, HyCubE, which designs a novel 3D circular convolutional neural network.
Our proposed model consistently outperforms state-of-the-art baselines, with an average improvement of 8.22% and a maximum improvement of 33.82%.
arXiv Detail & Related papers (2024-02-14T06:05:37Z)
- A-SDM: Accelerating Stable Diffusion through Redundancy Removal and Performance Optimization [54.113083217869516]
In this work, we first explore the computational redundancy part of the network.
We then prune the redundancy blocks of the model and maintain the network performance.
Thirdly, we propose a global-regional interactive (GRI) attention to speed up the computationally intensive attention part.
arXiv Detail & Related papers (2023-12-24T15:37:47Z)
- Towards Compute-Optimal Transfer Learning [82.88829463290041]
We argue that zero-shot structured pruning of pretrained models allows them to increase compute efficiency with minimal reduction in performance.
Our results show that pruning convolutional filters of pretrained models can lead to more than 20% performance improvement in low computational regimes.
arXiv Detail & Related papers (2023-04-25T21:49:09Z)
- Powerpropagation: A sparsity inducing weight reparameterisation [65.85142037667065]
We introduce Powerpropagation, a new weight-parameterisation for neural networks that leads to inherently sparse models.
Models trained in this manner exhibit similar performance, but have a distribution with markedly higher density at zero, allowing more parameters to be pruned safely.
Here, we combine Powerpropagation with a traditional weight-pruning technique as well as recent state-of-the-art sparse-to-sparse algorithms, showing superior performance on the ImageNet benchmark.
arXiv Detail & Related papers (2021-10-01T10:03:57Z)
- Knowledge distillation: A good teacher is patient and consistent [71.14922743774864]
There is a growing discrepancy in computer vision between large-scale models that achieve state-of-the-art performance and models that are affordable in practical applications.
We identify certain implicit design choices, which may drastically affect the effectiveness of distillation.
We obtain a state-of-the-art ResNet-50 model for ImageNet, which achieves 82.8% top-1 accuracy.
arXiv Detail & Related papers (2021-06-09T17:20:40Z)
- Top-KAST: Top-K Always Sparse Training [50.05611544535801]
We propose Top-KAST, a method that preserves constant sparsity throughout training.
We show that it performs comparably to or better than previous works when training models on the established ImageNet benchmark.
In addition to our ImageNet results, we also demonstrate our approach in the domain of language modeling.
arXiv Detail & Related papers (2021-06-07T11:13:05Z)
- AttentionLite: Towards Efficient Self-Attention Models for Vision [9.957033392865982]
We propose a novel framework for producing a class of parameter- and compute-efficient models, called AttentionLite, suitable for resource-constrained applications.
We can simultaneously distill knowledge from a compute-heavy teacher while also pruning the student model in a single pass of training.
arXiv Detail & Related papers (2020-12-21T17:54:09Z)
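Knowledge distillation, which the main abstract credits with recovering the accuracy of the lightweight lipreading models (and which also drives several of the related papers above), is typically implemented as a soft-target loss. The function below is a minimal, generic sketch of that Hinton-style loss; the temperature, weighting and any additional terms used by the authors are assumptions here, not taken from the paper.

```python
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, targets,
                      temperature=2.0, alpha=0.5):
    """Blend hard-label cross-entropy with a KL term that pulls the student's
    temperature-softened predictions towards the teacher's."""
    ce = F.cross_entropy(student_logits, targets)
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)  # rescale so the KD gradient magnitude is roughly temperature-invariant
    return alpha * ce + (1.0 - alpha) * kd
```

In the self-distillation setting mentioned in the abstract, the teacher is a previously trained model of the same architecture, so the same loss applies unchanged.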