Training Strategies for Improved Lip-reading
- URL: http://arxiv.org/abs/2209.01383v1
- Date: Sat, 3 Sep 2022 09:38:11 GMT
- Title: Training Strategies for Improved Lip-reading
- Authors: Pingchuan Ma, Yujiang Wang, Stavros Petridis, Jie Shen, Maja Pantic
- Abstract summary: We investigate the performance of state-of-the-art data augmentation approaches, temporal models and other training strategies.
A combination of all the methods results in a classification accuracy of 93.4%, which is an absolute improvement of 4.6% over the current state-of-the-art performance.
An error analysis of the various training strategies reveals that the performance improves by increasing the classification accuracy of hard-to-recognise words.
- Score: 61.661446956793604
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Several training strategies and temporal models have been recently proposed
for isolated word lip-reading in a series of independent works. However, the
potential of combining the best strategies and investigating the impact of each
of them has not been explored. In this paper, we systematically investigate the
performance of state-of-the-art data augmentation approaches, temporal models
and other training strategies, like self-distillation and using word boundary
indicators. Our results show that Time Masking (TM) is the most important
augmentation, followed by mixup, and that Densely-Connected Temporal
Convolutional Networks (DC-TCN) are the best temporal model for lip-reading of
isolated words. Using self-distillation and word boundary indicators is also beneficial
but to a lesser extent. A combination of all the above methods results in a
classification accuracy of 93.4%, which is an absolute improvement of 4.6% over
the current state-of-the-art performance on the LRW dataset. The performance
can be further improved to 94.1% by pre-training on additional datasets. An
error analysis of the various training strategies reveals that the performance
improves by increasing the classification accuracy of hard-to-recognise words.
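
As a rough illustration of the two augmentations the abstract highlights, the sketch below applies time masking and mixup to fixed-length lip video clips. This is a minimal PyTorch-style sketch under stated assumptions: the function names and parameter values are illustrative, and replacing the masked span with the mean frame is only one common variant; the abstract does not specify the exact formulation used in the paper.

```python
import torch
import torch.nn.functional as F

def time_mask(frames, max_mask_len=10):
    # Time Masking (TM): hide one random contiguous span of frames so the
    # temporal model must rely on the surrounding context.
    # frames: float tensor of shape (T, ...) holding one video clip.
    T = frames.shape[0]
    mask_len = int(torch.randint(0, max_mask_len + 1, (1,)))
    if mask_len == 0 or mask_len >= T:
        return frames
    start = int(torch.randint(0, T - mask_len + 1, (1,)))
    out = frames.clone()
    out[start:start + mask_len] = frames.mean(dim=0, keepdim=True)  # assumed variant: fill with the mean frame
    return out

def mixup(clips, labels, num_classes, alpha=0.4):
    # mixup: convex combination of randomly paired clips and their one-hot labels.
    # clips: (B, T, ...) float tensor; labels: (B,) int64 word indices.
    lam = float(torch.distributions.Beta(alpha, alpha).sample())
    perm = torch.randperm(clips.size(0))
    targets = F.one_hot(labels, num_classes).float()
    mixed_clips = lam * clips + (1.0 - lam) * clips[perm]
    mixed_targets = lam * targets + (1.0 - lam) * targets[perm]
    return mixed_clips, mixed_targets
```

Both operations act only on the input pipeline, so they can be combined with any temporal backbone (e.g. a DC-TCN head) and trained with a soft-label cross-entropy loss on the mixed targets.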
Related papers
- Guidelines for Augmentation Selection in Contrastive Learning for Time Series Classification [7.712601563682029]
We establish a principled framework for selecting augmentations based on dataset characteristics such as trend and seasonality.
We then evaluate the effectiveness of 8 different augmentations across 12 synthetic datasets and 6 real-world datasets.
Our proposed trend-seasonality-based augmentation recommendation algorithm can accurately identify the effective augmentations for a given time series dataset.
arXiv Detail & Related papers (2024-07-12T15:13:16Z)
- Contrastive-Adversarial and Diffusion: Exploring pre-training and fine-tuning strategies for sulcal identification [3.0398616939692777]
Techniques like adversarial learning, contrastive learning, diffusion denoising learning, and ordinary reconstruction learning have become standard.
The study aims to elucidate the advantages of pre-training techniques and fine-tuning strategies to enhance the learning process of neural networks.
arXiv Detail & Related papers (2024-05-29T15:44:51Z)
- Efficient Ensembles Improve Training Data Attribution [12.180392191924758]
Training data attribution methods aim to quantify the influence of individual data points on model predictions, with broad applications in data-centric AI.
Existing methods in this field, which can be categorized as retraining-based and gradient-based methods, have struggled with the trade-off between computational efficiency and attribution efficacy.
Recent research has shown that augmenting gradient-based methods with ensembles of multiple independently trained models can achieve significantly better attribution.
arXiv Detail & Related papers (2024-05-27T15:58:34Z)
- Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning [55.96599486604344]
We introduce an approach aimed at enhancing the reasoning capabilities of Large Language Models (LLMs) through an iterative preference learning process.
We use Monte Carlo Tree Search (MCTS) to iteratively collect preference data, utilizing its look-ahead ability to break down instance-level rewards into more granular step-level signals.
The proposed algorithm employs Direct Preference Optimization (DPO) to update the LLM policy using this newly generated step-level preference data.
arXiv Detail & Related papers (2024-05-01T11:10:24Z)
- Robust Learning with Progressive Data Expansion Against Spurious Correlation [65.83104529677234]
We study the learning process of a two-layer nonlinear convolutional neural network in the presence of spurious features.
Our analysis suggests that imbalanced data groups and easily learnable spurious features can lead to the dominance of spurious features during the learning process.
We propose a new training algorithm called PDE that efficiently enhances the model's robustness for a better worst-group performance.
arXiv Detail & Related papers (2023-06-08T05:44:06Z)
- Boosting Visual-Language Models by Exploiting Hard Samples [126.35125029639168]
HELIP is a cost-effective strategy tailored to enhance the performance of existing CLIP models.
Our method allows for effortless integration with existing models' training pipelines.
On comprehensive benchmarks, HELIP consistently boosts existing models to achieve leading performance.
arXiv Detail & Related papers (2023-05-09T07:00:17Z)
- Boost AI Power: Data Augmentation Strategies with unlabelled Data and Conformal Prediction, a Case in Alternative Herbal Medicine Discrimination with Electronic Nose [12.31253329379136]
The electronic nose has proved effective for alternative herbal medicine classification, but because of the labelling cost inherent in supervised learning, previous research has relied on labelled training data.
This study aims to improve classification accuracy via data augmentation strategies.
arXiv Detail & Related papers (2021-02-05T10:25:36Z)
- Learn an Effective Lip Reading Model without Pains [96.21025771586159]
Lip reading, also known as visual speech recognition, aims to recognize the speech content from videos by analyzing the lip dynamics.
Most existing methods obtained high performance by constructing a complex neural network.
We find that making proper use of several simple training strategies can consistently bring improvements without changing much of the model.
arXiv Detail & Related papers (2020-11-15T15:29:19Z)
- A Simple but Tough-to-Beat Data Augmentation Approach for Natural Language Understanding and Generation [53.8171136907856]
We introduce a set of simple yet effective data augmentation strategies dubbed cutoff.
cutoff relies on sampling consistency and thus adds little computational overhead.
cutoff consistently outperforms adversarial training and achieves state-of-the-art results on the IWSLT2014 German-English dataset.
arXiv Detail & Related papers (2020-09-29T07:08:35Z)
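
For the cutoff entry above, a minimal sketch of one variant (span cutoff, i.e. zeroing a contiguous span of the input embedding matrix) is given below. It assumes PyTorch and a (batch, length, dim) embedding layout; the function name and cutoff ratio are illustrative, and the paper itself should be consulted for the exact variants and for how consistency across the cut views is enforced.

```python
import torch

def span_cutoff(token_embeddings, cutoff_ratio=0.1):
    # Span cutoff: erase (zero out) one contiguous span of token embeddings
    # per example before feeding the batch to the model.
    # token_embeddings: (B, L, D) float tensor.
    B, L, _ = token_embeddings.shape
    span = max(1, int(L * cutoff_ratio))
    out = token_embeddings.clone()
    for b in range(B):
        start = int(torch.randint(0, L - span + 1, (1,)))
        out[b, start:start + span] = 0.0
    return out
```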
This list is automatically generated from the titles and abstracts of the papers on this site.