Towards Practical Lipreading with Distilled and Efficient Models
- URL: http://arxiv.org/abs/2007.06504v3
- Date: Wed, 2 Jun 2021 09:02:09 GMT
- Title: Towards Practical Lipreading with Distilled and Efficient Models
- Authors: Pingchuan Ma, Brais Martinez, Stavros Petridis, Maja Pantic
- Abstract summary: Lipreading has witnessed a lot of progress due to the resurgence of neural networks.
Recent works have placed emphasis on aspects such as improving performance by finding the optimal architecture or improving generalization.
There is still a significant gap between the current methodologies and the requirements for an effective deployment of lipreading in practical scenarios.
We propose a series of innovations that significantly bridge that gap: first, we raise the state-of-the-art performance by a wide margin on LRW and LRW-1000 to 88.5% and 46.6%, respectively, using self-distillation.
- Score: 57.41253104365274
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Lipreading has witnessed a lot of progress due to the resurgence of neural
networks. Recent works have placed emphasis on aspects such as improving
performance by finding the optimal architecture or improving generalization.
However, there is still a significant gap between the current methodologies and
the requirements for an effective deployment of lipreading in practical
scenarios. In this work, we propose a series of innovations that significantly
bridge that gap: first, we raise the state-of-the-art performance by a wide
margin on LRW and LRW-1000 to 88.5% and 46.6%, respectively, using
self-distillation. Secondly, we propose a series of architectural changes,
including a novel Depthwise Separable Temporal Convolutional Network (DS-TCN)
head, that slashes the computational cost to a fraction of the (already quite
efficient) original model. Thirdly, we show that knowledge distillation is a
very effective tool for recovering performance of the lightweight models. This
results in a range of models with different accuracy-efficiency trade-offs.
However, our most promising lightweight models are on par with the current
state-of-the-art while showing a reduction of 8.2x and 3.9x in terms of
computational cost and number of parameters, respectively, which we hope will
enable the deployment of lipreading models in practical applications.
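As a rough illustration of the depthwise separable idea behind the DS-TCN head, the following PyTorch sketch factorizes a temporal convolution into a per-channel (depthwise) convolution followed by a 1x1 (pointwise) convolution. Channel count, kernel size, normalization, and the residual connection are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableTemporalConv(nn.Module):
    """A depthwise temporal convolution followed by a pointwise (1x1) convolution.

    This factorization is what makes depthwise separable temporal blocks much
    cheaper than standard temporal convolutions; the exact DS-TCN head in the
    paper may differ (dilations, multi-branch kernels, dropout, etc.).
    """
    def __init__(self, channels: int, kernel_size: int = 3, dilation: int = 1):
        super().__init__()
        padding = (kernel_size - 1) // 2 * dilation
        # Depthwise: one temporal filter per channel (groups=channels).
        self.depthwise = nn.Conv1d(channels, channels, kernel_size,
                                   padding=padding, dilation=dilation,
                                   groups=channels)
        # Pointwise: 1x1 convolution mixes information across channels.
        self.pointwise = nn.Conv1d(channels, channels, kernel_size=1)
        self.norm = nn.BatchNorm1d(channels)
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time); residual connection keeps the shape.
        return self.act(self.norm(self.pointwise(self.depthwise(x)))) + x


if __name__ == "__main__":
    block = DepthwiseSeparableTemporalConv(channels=256)
    features = torch.randn(2, 256, 29)  # e.g. features for 29 video frames
    print(block(features).shape)        # torch.Size([2, 256, 29])
```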
Related papers
- Scalable Model Merging with Progressive Layer-wise Distillation [17.521794641817642]
We introduce a novel few-shot merging algorithm, ProDistill (Progressive Layer-wise Distillation).
We show that ProDistill achieves state-of-the-art performance, with up to 6.14% and 6.61% improvements in vision and NLU tasks.
arXiv Detail & Related papers (2025-02-18T10:15:18Z)
- Numerical Pruning for Efficient Autoregressive Models [87.56342118369123]
This paper focuses on compressing decoder-only transformer-based autoregressive models through structural weight pruning.
Specifically, we propose a training-free pruning method that calculates a numerical score with Newton's method for the Attention and MLP modules, respectively.
To verify the effectiveness of our method, we provide both theoretical support and extensive experiments.
arXiv Detail & Related papers (2024-12-17T01:09:23Z)
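The entry above only states that a numerical importance score is computed with Newton's method; since the exact formula is not given here, the sketch below shows a generic second-order (Taylor/Newton-style) importance estimate under a diagonal-Hessian assumption. The function name, the 80% keep ratio, and the aggregation over output units are hypothetical, not the paper's method.

```python
import torch

def second_order_importance(weight: torch.Tensor,
                            grad: torch.Tensor,
                            hess_diag: torch.Tensor) -> torch.Tensor:
    """Generic second-order importance of removing each output unit.

    Approximates the loss change of zeroing a unit's weights via a truncated
    Taylor expansion, dL ~ -g*w + 0.5*h*w^2, with a diagonal Hessian.
    This is only an illustration of Newton-style scoring, not the paper's
    exact criterion.
    """
    per_element = -grad * weight + 0.5 * hess_diag * weight.pow(2)
    return per_element.sum(dim=1).abs()  # one score per output unit (row)

# Hypothetical usage: rank output units of a linear layer and keep the top 80%.
w = torch.randn(64, 128)
g = torch.randn_like(w)   # gradient of the loss w.r.t. w
h = torch.rand_like(w)    # estimated diagonal Hessian (e.g. squared gradients)
scores = second_order_importance(w, g, h)
keep = scores.topk(int(0.8 * w.shape[0])).indices
```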
- Effortless Efficiency: Low-Cost Pruning of Diffusion Models [29.821803522137913]
We propose a model-agnostic structural pruning framework for diffusion models that learns a differentiable mask to sparsify the model.
To ensure effective pruning that preserves the quality of the final denoised latent, we design a novel end-to-end pruning objective that spans the entire diffusion process.
Results on state-of-the-art U-Net diffusion models SDXL and diffusion transformers (FLUX) demonstrate that our method can effectively prune up to 20% parameters with minimal perceptible performance degradation.
arXiv Detail & Related papers (2024-12-03T21:37:50Z)
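The entry above describes learning a differentiable mask that sparsifies the model under an end-to-end objective. The sketch below shows one common way such a mask can be realized: a sigmoid-gated scale on a layer's outputs, trained with a task loss plus a sparsity penalty. The gating scheme, the MSE stand-in for the denoising objective, and the penalty weight are assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn as nn

class MaskedLinear(nn.Module):
    """Wraps a linear layer with a learnable, differentiable channel mask."""
    def __init__(self, layer: nn.Linear):
        super().__init__()
        self.layer = layer
        # One logit per output channel; sigmoid keeps the mask in (0, 1).
        self.mask_logits = nn.Parameter(torch.zeros(layer.out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layer(x) * torch.sigmoid(self.mask_logits)

    def sparsity_penalty(self) -> torch.Tensor:
        # Pushes mask values toward zero so channels can be pruned afterwards.
        return torch.sigmoid(self.mask_logits).sum()

# Hypothetical training step: task loss on the output plus a sparsity term.
layer = MaskedLinear(nn.Linear(128, 128))
x, target = torch.randn(4, 128), torch.randn(4, 128)
loss = nn.functional.mse_loss(layer(x), target) + 1e-3 * layer.sparsity_penalty()
loss.backward()
```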
- HyCubE: Efficient Knowledge Hypergraph 3D Circular Convolutional Embedding [21.479738859698344]
Reaching a good trade-off between model effectiveness and efficiency is both desirable and challenging for knowledge hypergraph embedding.
We propose an end-to-end efficient knowledge hypergraph embedding model, HyCubE, which designs a novel 3D circular convolutional neural network.
Our proposed model consistently outperforms state-of-the-art baselines, with an average improvement of 8.22% and a maximum improvement of 33.82%.
arXiv Detail & Related papers (2024-02-14T06:05:37Z)
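The building block named above is a 3D circular convolution; in PyTorch the wrap-around behaviour can be obtained with circular padding on a standard Conv3d. Channel counts, kernel size, and the shape of the embedding "cube" below are placeholders, not HyCubE's actual layout.

```python
import torch
import torch.nn as nn

# A 3D convolution with circular (wrap-around) padding, the basic operation
# behind a "3D circular convolutional" embedding layer. Channel counts and
# kernel size are illustrative only.
circular_conv3d = nn.Conv3d(in_channels=1, out_channels=8, kernel_size=3,
                            padding=1, padding_mode="circular")

# Hypothetical input: a batch of stacked entity/relation embedding "cubes".
x = torch.randn(16, 1, 8, 8, 8)
out = circular_conv3d(x)
print(out.shape)  # torch.Size([16, 8, 8, 8, 8])
```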
- Towards Compute-Optimal Transfer Learning [82.88829463290041]
We argue that zero-shot structured pruning of pretrained models allows them to increase compute efficiency with minimal reduction in performance.
Our results show that pruning convolutional filters of pretrained models can lead to more than 20% performance improvement in low computational regimes.
arXiv Detail & Related papers (2023-04-25T21:49:09Z)
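The entry above argues for zero-shot structured pruning of pretrained convolutional filters. Below is a minimal sketch using the common L1-magnitude criterion; the criterion, the keep ratio, and the helper name are illustrative assumptions and may differ from the paper's approach.

```python
import torch
import torch.nn as nn

def prune_conv_filters(conv: nn.Conv2d, keep_ratio: float) -> nn.Conv2d:
    """Keep the filters with the largest L1 norm, without any retraining.

    L1 magnitude is a standard zero-shot choice and is used here only for
    illustration; the cited paper's criterion may differ.
    """
    n_keep = max(1, int(conv.out_channels * keep_ratio))
    scores = conv.weight.detach().abs().sum(dim=(1, 2, 3))  # one score per filter
    keep = scores.topk(n_keep).indices.sort().values

    pruned = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                       stride=conv.stride, padding=conv.padding,
                       bias=conv.bias is not None)
    with torch.no_grad():
        pruned.weight.copy_(conv.weight[keep])
        if conv.bias is not None:
            pruned.bias.copy_(conv.bias[keep])
    return pruned

# Hypothetical usage: keep 50% of the filters of a pretrained layer.
layer = nn.Conv2d(64, 128, kernel_size=3, padding=1)
smaller = prune_conv_filters(layer, keep_ratio=0.5)
```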
- Powerpropagation: A sparsity inducing weight reparameterisation [65.85142037667065]
We introduce Powerpropagation, a new weight-parameterisation for neural networks that leads to inherently sparse models.
Models trained in this manner exhibit similar performance, but have a weight distribution with markedly higher density at zero, allowing more parameters to be pruned safely.
Here, we combine Powerpropagation with a traditional weight-pruning technique as well as recent state-of-the-art sparse-to-sparse algorithms, showing superior performance on the ImageNet benchmark.
arXiv Detail & Related papers (2021-10-01T10:03:57Z)
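Powerpropagation reparameterises each parameter so that the effective weight has the form theta * |theta|^(alpha - 1) with alpha > 1, which concentrates the learned weights near zero. The sketch below applies this to a single linear layer; alpha, the initialisation, and the layer sizes are illustrative choices, not the paper's training setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PowerpropLinear(nn.Module):
    """Linear layer whose effective weights are theta * |theta|**(alpha - 1).

    Training the underlying parameters theta under this reparameterisation
    biases the learned weight distribution toward zero, so more weights can
    later be pruned with little loss in accuracy.
    """
    def __init__(self, in_features: int, out_features: int, alpha: float = 2.0):
        super().__init__()
        self.alpha = alpha
        self.theta = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def effective_weight(self) -> torch.Tensor:
        return self.theta * self.theta.abs().pow(self.alpha - 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(x, self.effective_weight(), self.bias)

layer = PowerpropLinear(128, 64)
out = layer(torch.randn(4, 128))
```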
- Knowledge distillation: A good teacher is patient and consistent [71.14922743774864]
There is a growing discrepancy in computer vision between large-scale models that achieve state-of-the-art performance and models that are affordable in practical applications.
We identify certain implicit design choices, which may drastically affect the effectiveness of distillation.
We obtain a state-of-the-art ResNet-50 model for ImageNet, which achieves 82.8% top-1 accuracy.
arXiv Detail & Related papers (2021-06-09T17:20:40Z)
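For context, a minimal sketch of the kind of distillation objective this line of work builds on: the student matches the teacher's temperature-softened predictions via KL divergence, with teacher and student fed the same augmented view (the "consistent" ingredient), trained over a long schedule (the "patient" ingredient). The temperature and scaling convention are common choices, not necessarily the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between temperature-softened teacher and student predictions."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2

# Stand-ins for student(x) and teacher(x) evaluated on the *same* augmented batch x.
student_logits = torch.randn(8, 1000, requires_grad=True)
teacher_logits = torch.randn(8, 1000)
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
```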
- Top-KAST: Top-K Always Sparse Training [50.05611544535801]
We propose Top-KAST, a method that preserves constant sparsity throughout training.
We show that it performs comparably to or better than previous works when training models on the established ImageNet benchmark.
In addition to our ImageNet results, we also demonstrate our approach in the domain of language modeling.
arXiv Detail & Related papers (2021-06-07T11:13:05Z)
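A simplified sketch of the "always sparse" idea above: at every step only the largest-magnitude fraction of each layer's weights is active in the forward pass. The real Top-KAST additionally maintains a larger backward set and an exploration/regularisation scheme, which this illustration omits; the layer sizes and density are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKLinear(nn.Module):
    """Linear layer that uses only its top-K largest-magnitude weights.

    A simplified illustration of always-sparse training; it omits Top-KAST's
    separate (denser) backward set and exploration schedule.
    """
    def __init__(self, in_features: int, out_features: int, density: float = 0.1):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))
        self.density = density

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Keep only the top-K weights by magnitude for this step's forward pass.
        k = max(1, int(self.weight.numel() * self.density))
        threshold = self.weight.abs().flatten().topk(k).values.min()
        mask = (self.weight.abs() >= threshold).float()
        return F.linear(x, self.weight * mask, self.bias)

layer = TopKLinear(256, 128, density=0.1)
out = layer(torch.randn(4, 256))
```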
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality or accuracy of this information and is not responsible for any consequences arising from its use.