Pseudo-Convolutional Policy Gradient for Sequence-to-Sequence
Lip-Reading
- URL: http://arxiv.org/abs/2003.03983v1
- Date: Mon, 9 Mar 2020 09:12:26 GMT
- Title: Pseudo-Convolutional Policy Gradient for Sequence-to-Sequence
Lip-Reading
- Authors: Mingshuang Luo, Shuang Yang, Shiguang Shan, Xilin Chen
- Abstract summary: Lip-reading aims to infer the speech content from the lip movement sequence.
The traditional learning process of seq2seq models suffers from two problems.
We propose a novel pseudo-convolutional policy gradient (PCPG) based method to address these two problems.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Lip-reading aims to infer the speech content from the lip movement sequence
and can be seen as a typical sequence-to-sequence (seq2seq) problem which
translates the input image sequence of lip movements to the text sequence of
the speech content. However, the traditional learning process of seq2seq models
always suffers from two problems: the exposure bias resulting from the
"teacher-forcing" strategy, and the inconsistency between the discriminative
optimization target (usually the cross-entropy loss) and the final evaluation
metric (usually the character/word error rate). In this paper, we propose a
novel pseudo-convolutional policy gradient (PCPG) based method to address these
two problems. On the one hand, we introduce the evaluation metric (refers to
the character error rate in this paper) as a form of reward to optimize the
model together with the original discriminative target. On the other hand,
inspired by the local perception property of convolutional operation, we
perform a pseudo-convolutional operation on the reward and loss dimension, so
as to take more context around each time step into account to generate a robust
reward and loss for the whole optimization. Finally, we perform a thorough
comparison and evaluation on both the word-level and sentence-level benchmarks.
The results show a significant improvement over other related methods,
achieving either new state-of-the-art performance or competitive accuracy on
all these challenging benchmarks, which clearly demonstrates the advantages of
our approach.
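The two ingredients above (the CER used as a reward, and a local averaging over the reward dimension) can be sketched in a few lines. This is a minimal illustration of the idea, not the authors' code: the function names, the uniform window kernel, and the per-step reward assignment via partial-hypothesis CER are all assumptions made for the example.

```python
def cer(hyp: str, ref: str) -> float:
    """Character error rate = Levenshtein edit distance / reference length."""
    m, n = len(hyp), len(ref)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,       # deletion
                          d[i][j - 1] + 1,       # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n] / max(n, 1)

def pseudo_conv(rewards, k=3):
    """'Pseudo-convolution' over the reward dimension: average each step's
    reward over a window of up to 2k+1 neighbouring steps (uniform kernel),
    so each time step is scored with local context rather than alone."""
    T = len(rewards)
    out = []
    for t in range(T):
        lo, hi = max(0, t - k), min(T, t + k + 1)
        window = rewards[lo:hi]
        out.append(sum(window) / len(window))
    return out

# One (assumed) way to assign step-wise credit: reward each step by
# 1 - CER of the partial hypothesis against the partial reference.
ref, hyp = "hello world", "hallo world"
step_rewards = [1.0 - cer(hyp[: t + 1], ref[: t + 1]) for t in range(len(hyp))]
smoothed = pseudo_conv(step_rewards, k=2)
```

The smoothed rewards would then weight the per-step log-probabilities in a standard policy-gradient (REINFORCE-style) update, alongside the usual cross-entropy loss.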
Related papers
- Generalization bounds for regression and classification on adaptive covering input domains
We focus on the generalization bound, which serves as an upper limit for the generalization error.
In the case of classification tasks, we treat the target function as a one-hot, a piece-wise constant function, and employ 0/1 loss for error measurement.
arXiv Detail & Related papers (2024-07-29T05:40:08Z)
- Non-Autoregressive Sentence Ordering
We propose a novel Non-Autoregressive Ordering Network, dubbed NAON, which explores bilateral dependencies between sentences and predicts the sentence for each position in parallel.
We conduct extensive experiments on several commonly used datasets, and the experimental results show that our method outperforms all the autoregressive approaches.
arXiv Detail & Related papers (2023-10-19T10:57:51Z)
- Mutual Exclusivity Training and Primitive Augmentation to Induce Compositionality
Recent datasets expose the lack of systematic generalization ability in standard sequence-to-sequence models.
We analyze this behavior of seq2seq models and identify two contributing factors: a lack of mutual exclusivity bias and the tendency to memorize whole examples.
We show substantial empirical improvements using standard sequence-to-sequence models on two widely-used compositionality datasets.
arXiv Detail & Related papers (2022-11-28T17:36:41Z)
- Hierarchical Phrase-based Sequence-to-Sequence Learning
We describe a neural transducer that maintains the flexibility of standard sequence-to-sequence (seq2seq) models while incorporating hierarchical phrases as a source of inductive bias during training and as explicit constraints during inference.
Our approach trains two models: a discriminative derivation based on a bracketing grammar whose tree hierarchically aligns source and target phrases, and a neural seq2seq model that learns to translate the aligned phrases one-by-one.
arXiv Detail & Related papers (2022-11-15T05:22:40Z)
- Deep Probabilistic Graph Matching
We propose a deep learning-based graph matching framework that works for the original QAP without compromising on the matching constraints.
The proposed method is evaluated on three widely used benchmarks (Pascal VOC, Willow Object and SPair-71k) and outperforms all previous state-of-the-art methods on all of them.
arXiv Detail & Related papers (2022-01-05T13:37:27Z)
- Meta-Regularization: An Approach to Adaptive Choice of the Learning Rate in Gradient Descent
We propose Meta-Regularization, a novel approach for the adaptive choice of the learning rate in first-order descent methods.
Our approach modifies the objective function by adding a regularization term, and casts the parameter update and the learning-rate choice as a joint process.
arXiv Detail & Related papers (2021-04-12T13:13:34Z)
- Simple and optimal methods for stochastic variational inequalities, II: Markovian noise and policy evaluation in reinforcement learning
This paper focuses on stochastic variational inequalities (VI) under Markovian noise.
A prominent application of our algorithmic developments is the policy evaluation problem in reinforcement learning.
arXiv Detail & Related papers (2020-11-15T04:05:22Z)
- Fact-aware Sentence Split and Rephrase with Permutation Invariant Training
Sentence Split and Rephrase aims to break down a complex sentence into several simple sentences with its meaning preserved.
Previous studies tend to address the issue by seq2seq learning from parallel sentence pairs.
We introduce Permutation Invariant Training to address the effects of order variance in seq2seq learning for this task.
arXiv Detail & Related papers (2020-01-16T07:30:19Z)
- Adaptive Correlated Monte Carlo for Contextual Categorical Sequence Generation
We adapt contextual generation of categorical sequences to a policy gradient estimator, which evaluates a set of correlated Monte Carlo (MC) rollouts for variance control.
We also demonstrate the use of correlated MC rollouts for binary-tree softmax models, which reduce the high generation cost in large vocabulary scenarios.
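The variance-control idea behind correlated rollouts can be illustrated with a small sketch. This is a generic leave-one-out baseline over several rollouts drawn for the same context, a standard variance-reduction device, not that paper's specific estimator; the function names and interfaces are hypothetical.

```python
def rollout_weights(reward_fn, sample_fn, context, n_rollouts=4):
    """Draw several rollouts for the same context and weight each one by its
    reward minus the mean reward of the *other* rollouts (leave-one-out
    baseline). Subtracting a baseline that does not depend on the rollout
    itself keeps the gradient estimate unbiased while reducing its variance."""
    rollouts = [sample_fn(context) for _ in range(n_rollouts)]
    rewards = [reward_fn(r) for r in rollouts]
    total = sum(rewards)
    weights = []
    for r in rewards:
        baseline = (total - r) / (n_rollouts - 1)  # mean reward of the others
        weights.append(r - baseline)               # centred advantage
    return rollouts, weights
```

In a policy-gradient update, each rollout's log-probability would then be scaled by its centred weight; rollouts better than their peers push probability up, worse ones push it down.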
arXiv Detail & Related papers (2019-12-31T03:01:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.