Improved Speech Representations with Multi-Target Autoregressive
Predictive Coding
- URL: http://arxiv.org/abs/2004.05274v1
- Date: Sat, 11 Apr 2020 01:09:36 GMT
- Title: Improved Speech Representations with Multi-Target Autoregressive
Predictive Coding
- Authors: Yu-An Chung, James Glass
- Abstract summary: We extend the hypothesis that hidden states that can accurately predict future frames are a useful representation for many downstream tasks.
We propose an auxiliary objective that serves as a regularization to improve generalization of the future frame prediction task.
- Score: 23.424410568555547
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training objectives based on predictive coding have recently been shown to be
very effective at learning meaningful representations from unlabeled speech.
One example is Autoregressive Predictive Coding (Chung et al., 2019), which
trains an autoregressive RNN to generate an unseen future frame given a context
such as recent past frames. The basic hypothesis of these approaches is that
hidden states that can accurately predict future frames are a useful
representation for many downstream tasks. In this paper we extend this
hypothesis and aim to enrich the information encoded in the hidden states by
training the model to make more accurate future predictions. We propose an
auxiliary objective that serves as a regularization to improve generalization
of the future frame prediction task. Experimental results on phonetic
classification, speech recognition, and speech translation not only support the
hypothesis, but also demonstrate the effectiveness of our approach in learning
representations that contain richer phonetic content.
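To make the setup concrete, the following is a minimal, hypothetical sketch (not the authors' code) of the basic APC recipe the abstract describes: an autoregressive RNN reads past frames and is trained to predict a frame n steps ahead, with an extra regularization term added to the loss. The GRU architecture, L1 loss, feature dimensions, and prediction offset are illustrative assumptions, and since the abstract does not spell out the form of the paper's auxiliary objective, the aux_loss term below is only a placeholder marking where such a term would enter.

```python
# Hypothetical sketch of future-frame prediction with an auxiliary regularizer.
# Architecture, loss, and hyperparameters are illustrative assumptions, not the
# authors' implementation.
import torch
import torch.nn as nn

class APC(nn.Module):
    def __init__(self, feat_dim=80, hidden_dim=512, num_layers=3):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden_dim, num_layers, batch_first=True)
        self.proj = nn.Linear(hidden_dim, feat_dim)  # hidden state -> predicted frame

    def forward(self, x):
        h, _ = self.rnn(x)    # h: (batch, time, hidden_dim), the learned representation
        pred = self.proj(h)   # one predicted future frame per time step
        return pred, h

def apc_loss(model, x, n=3, aux_weight=0.1):
    """L1 future-frame prediction loss plus a placeholder auxiliary term."""
    pred, h = model(x)
    # Predict frame x[t+n] from the hidden state at time t.
    main_loss = nn.functional.l1_loss(pred[:, :-n, :], x[:, n:, :])
    # Placeholder regularizer (assumption, not the paper's auxiliary objective):
    # a simple L2 penalty on the hidden states.
    aux_loss = h.pow(2).mean()
    return main_loss + aux_weight * aux_loss

# Usage: a batch of 4 utterances, 200 frames of 80-dim log Mel features each.
model = APC()
feats = torch.randn(4, 200, 80)
loss = apc_loss(model, feats)
loss.backward()
```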
Related papers
- The Power of Next-Frame Prediction for Learning Physical Laws [5.624870417352306]
Next-frame prediction is a useful and powerful method for modelling and understanding the dynamics of video data.
We introduce six diagnostic simulation video datasets derived from fundamental physical laws created by varying physical constants such as gravity and mass.
We find that the generative training phase alone induces a model state from which physical constants can be predicted significantly better than from a randomly initialized model.
arXiv Detail & Related papers (2024-05-21T17:55:54Z)
- Understanding Self-Predictive Learning for Reinforcement Learning [61.62067048348786]
We study the learning dynamics of self-predictive learning for reinforcement learning.
We propose a novel self-predictive algorithm that learns two representations simultaneously.
arXiv Detail & Related papers (2022-12-06T20:43:37Z)
- Localization vs. Semantics: Visual Representations in Unimodal and Multimodal Models [57.08925810659545]
We conduct a comparative analysis of the visual representations in existing vision-and-language models and vision-only models.
Our empirical observations suggest that vision-and-language models are better at label prediction tasks.
We hope our study sheds light on the role of language in visual learning, and serves as an empirical guide for various pretrained models.
arXiv Detail & Related papers (2022-12-01T05:00:18Z)
- Sentence Representation Learning with Generative Objective rather than Contrastive Objective [86.01683892956144]
We propose a novel generative self-supervised learning objective based on phrase reconstruction.
Our generative objective yields substantial performance improvements and outperforms current state-of-the-art contrastive methods.
arXiv Detail & Related papers (2022-10-16T07:47:46Z)
- Few-shot Subgoal Planning with Language Models [58.11102061150875]
We show that language priors encoded in pre-trained language models allow us to infer fine-grained subgoal sequences.
In contrast to recent methods which make strong assumptions about subgoal supervision, our experiments show that language models can infer detailed subgoal sequences without any fine-tuning.
arXiv Detail & Related papers (2022-05-28T01:03:30Z)
- Enhancing Speech Recognition Decoding via Layer Aggregation [7.056222499095849]
We show that logits predicted using the top layers may prevent beam search from achieving optimal results.
We propose a prediction method that aggregates the top M layers, potentially leveraging useful information encoded in intermediate layers and relaxing model confidence.
arXiv Detail & Related papers (2022-03-21T20:28:06Z)
- Probing as Quantifying the Inductive Bias of Pre-trained Representations [99.93552997506438]
We present a novel framework for probing where the goal is to evaluate the inductive bias of representations for a particular task.
We apply our framework to a series of token-, arc-, and sentence-level tasks.
arXiv Detail & Related papers (2021-10-15T22:01:16Z)
- Augmenting BERT-style Models with Predictive Coding to Improve Discourse-level Representations [20.855686009404703]
We propose to use ideas from predictive coding theory to augment BERT-style language models with a mechanism that allows them to learn discourse-level representations.
Our proposed approach is able to predict future sentences using explicit top-down connections that operate at the intermediate layers of the network.
arXiv Detail & Related papers (2021-09-10T00:45:28Z)
- Adaptive Future Frame Prediction with Ensemble Network [15.19884183320726]
We propose an adaptive update framework for the future frame prediction task.
The proposed framework consists of a pre-trained prediction network, a continuous-updating prediction network, and a weight estimation network.
Our approach outperforms existing methods especially for dynamically changing scenes.
arXiv Detail & Related papers (2020-11-13T07:08:06Z)
- Latent Representation Prediction Networks [0.0]
We find this principle of learning representations unsatisfying.
We propose a new way of jointly learning this representation along with the prediction function.
Our approach is shown to be more sample-efficient than standard reinforcement learning methods.
arXiv Detail & Related papers (2020-09-20T14:26:03Z)
- Video Prediction via Example Guidance [156.08546987158616]
In video prediction tasks, one major challenge is to capture the multi-modal nature of future contents and dynamics.
In this work, we propose a simple yet effective framework that can efficiently predict plausible future states.
arXiv Detail & Related papers (2020-07-03T14:57:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences arising from its use.