Embodied Self-supervised Learning by Coordinated Sampling and Training
- URL: http://arxiv.org/abs/2006.13350v2
- Date: Sun, 16 Jan 2022 09:33:33 GMT
- Title: Embodied Self-supervised Learning by Coordinated Sampling and Training
- Authors: Yifan Sun and Xihong Wu
- Abstract summary: We propose a novel self-supervised approach to solve inverse problems by employing the corresponding physical forward process.
The proposed approach works in an analysis-by-synthesis manner to learn an inference network by iteratively sampling and training.
We demonstrate the feasibility of the proposed method by tackling the acoustic-to-articulatory inversion problem to infer articulatory information from speech.
- Score: 14.107020105091662
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised learning can significantly improve the performance of
downstream tasks; however, the dimensions of the learned representations normally
lack explicit physical meanings. In this work, we propose a novel
self-supervised approach to solve inverse problems by employing the
corresponding physical forward process so that the learned representations can
have explicit physical meanings. The proposed approach works in an
analysis-by-synthesis manner to learn an inference network by iteratively
sampling and training. At the sampling step, given observed data, the inference
network is used to approximate the intractable posterior, from which we sample
input parameters and feed them to a physical process to generate data in the
observational space; At the training step, the same network is optimized with
the sampled paired data. We demonstrate the feasibility of the proposed method by
tackling the acoustic-to-articulatory inversion problem to infer articulatory
information from speech. Given an articulatory synthesizer, an inference model
can be trained completely from scratch with random initialization. Our
experiments demonstrate that the proposed method can converge steadily and the
network learns to control the articulatory synthesizer to speak like a human.
We also demonstrate that trained models can generalize well to unseen speakers
or even new languages, and performance can be further improved through
self-adaptation.
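The alternation described in the abstract, sample from the current approximate posterior, synthesize observations with the physical forward process, then retrain the same network on the synthesized pairs, can be sketched as follows. This is a minimal illustration, assuming a Gaussian inference network and a black-box `synthesize` function standing in for the articulatory synthesizer; names, shapes, and hyperparameters are illustrative, not the authors' implementation.

```python
# Hypothetical sketch of the coordinated sampling-and-training loop.
# `synthesize` stands in for the physical forward process (e.g. an
# articulatory synthesizer); it need not be differentiable.
import torch
import torch.nn as nn

class InferenceNet(nn.Module):
    """Maps an observation y to a Gaussian approximate posterior over x."""
    def __init__(self, obs_dim, param_dim, hidden=256):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                  nn.Linear(hidden, hidden), nn.ReLU())
        self.mean = nn.Linear(hidden, param_dim)
        self.log_std = nn.Linear(hidden, param_dim)

    def forward(self, y):
        h = self.body(y)
        return self.mean(h), self.log_std(h)

def coordinate(net, synthesize, observed_y, n_rounds=100, lr=1e-4):
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(n_rounds):
        # --- Sampling step: draw parameters from the approximate posterior
        # and push them through the physical forward process.
        with torch.no_grad():
            mean, log_std = net(observed_y)
            x_hat = mean + log_std.exp() * torch.randn_like(mean)
        y_hat = synthesize(x_hat)            # data in the observational space

        # --- Training step: fit the same network on the sampled pairs,
        # treating (y_hat, x_hat) as supervised input/target data.
        mean, log_std = net(y_hat)
        nll = (((x_hat - mean) / log_std.exp()) ** 2 / 2 + log_std).sum(-1)
        loss = nll.mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return net
```

Because the forward process is only ever evaluated, never differentiated through, it can be a non-differentiable physical simulator, which is the point of coordinating sampling with training.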
Related papers
- Transferable Post-training via Inverse Value Learning [83.75002867411263]
We propose modeling changes at the logits level during post-training using a separate neural network (i.e., the value network)
After training this network on a small base model using demonstrations, this network can be seamlessly integrated with other pre-trained models during inference.
We demonstrate that the resulting value network has broad transferability across pre-trained models of different parameter sizes.
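As a hedged illustration of the logits-level integration described above, the value network's output can simply be added to a frozen pre-trained model's logits at inference time; the function below is a sketch under the assumption that both models share a vocabulary, and is not the paper's exact interface.

```python
# Illustrative sketch: combine a frozen pre-trained model with a separately
# trained value network by summing their logits over a shared vocabulary.
import torch

@torch.no_grad()
def combined_logits(base_model, value_net, input_ids):
    base = base_model(input_ids)   # (batch, seq, vocab) logits of the base model
    delta = value_net(input_ids)   # same shape: post-training change in logits
    return base + delta            # decode (argmax/sample) from the summed logits
```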
arXiv Detail & Related papers (2024-10-28T13:48:43Z)
- A distributional simplicity bias in the learning dynamics of transformers [50.91742043564049]
We show that transformers, trained on natural language data, also display a simplicity bias.
Specifically, they sequentially learn many-body interactions among input tokens, reaching a saturation point in the prediction error for low-degree interactions.
This approach opens up the possibility of studying how interactions of different orders in the data affect learning, in natural language processing and beyond.
arXiv Detail & Related papers (2024-10-25T15:39:34Z)
- Demolition and Reinforcement of Memories in Spin-Glass-like Neural Networks [0.0]
The aim of this thesis is to understand the effectiveness of Unlearning in both associative memory models and generative models.
The selection of structured data enables an associative memory model to retrieve concepts as attractors of a neural dynamics with considerable basins of attraction.
A novel regularization technique for Boltzmann Machines is presented and shown to outperform previously developed methods at learning hidden probability distributions from data sets.
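The "Unlearning" studied here is, in its classical form, Hopfield-style unlearning: relax the network from random states and slightly weaken whichever attractor it falls into. Below is a textbook numpy sketch of that procedure, given only to fix ideas; the thesis's actual models and algorithms differ in their details.

```python
# Textbook sketch of Hopfield unlearning (illustrative, not the thesis's code).
import numpy as np

def hebbian_weights(patterns):                 # patterns: (P, N) array in {-1, +1}
    P, N = patterns.shape
    W = patterns.T @ patterns / N
    np.fill_diagonal(W, 0.0)
    return W

def relax(W, s, steps=50):
    for _ in range(steps):                     # simple synchronous sign dynamics
        s = np.sign(W @ s)
        s[s == 0] = 1.0
    return s

def unlearn(W, epsilon=0.01, n_dreams=100, seed=0):
    rng = np.random.default_rng(seed)
    N = W.shape[0]
    for _ in range(n_dreams):
        s0 = rng.choice([-1.0, 1.0], size=N)   # random initial state
        a = relax(W, s0)                       # attractor reached from noise
        W = W - epsilon * np.outer(a, a) / N   # weaken ("unlearn") that attractor
        np.fill_diagonal(W, 0.0)
    return W
```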
arXiv Detail & Related papers (2024-03-04T23:12:42Z)
- Inverse Dynamics Pretraining Learns Good Representations for Multitask Imitation [66.86987509942607]
We evaluate how such a pretraining paradigm should be carried out in imitation learning.
We consider a setting where the pretraining corpus consists of multitask demonstrations.
We argue that inverse dynamics modeling is well-suited to this setting.
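Inverse dynamics modeling here means pretraining an encoder to predict the action that connects two consecutive observations, then reusing that encoder for multitask imitation. A minimal sketch under assumed observation and action shapes (all names illustrative):

```python
# Illustrative inverse dynamics pretraining: predict a_t from (s_t, s_{t+1}).
import torch
import torch.nn as nn
import torch.nn.functional as F

class InverseDynamics(nn.Module):
    def __init__(self, obs_dim, act_dim, repr_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, repr_dim), nn.ReLU())
        self.head = nn.Linear(2 * repr_dim, act_dim)

    def forward(self, s_t, s_next):
        z = torch.cat([self.encoder(s_t), self.encoder(s_next)], dim=-1)
        return self.head(z)                    # predicted (continuous) action

def pretrain_step(model, opt, s_t, a_t, s_next):
    loss = F.mse_loss(model(s_t, s_next), a_t) # regression onto the taken action
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```

After pretraining, `model.encoder` is the representation reused by the downstream imitation policies.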
arXiv Detail & Related papers (2023-05-26T14:40:46Z)
- Phonetic and Prosody-aware Self-supervised Learning Approach for Non-native Fluency Scoring [13.817385516193445]
Speech fluency/disfluency can be evaluated by analyzing a range of phonetic and prosodic features.
Deep neural networks are commonly trained to map fluency-related features to human scores.
We introduce a self-supervised learning (SSL) approach that takes into account phonetic and prosody awareness for fluency scoring.
arXiv Detail & Related papers (2023-05-19T05:39:41Z)
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
- Self-Adaptive Training: Bridging the Supervised and Self-Supervised Learning [16.765461276790944]
Self-adaptive training is a unified training algorithm that dynamically calibrates and enhances the training process using model predictions, without incurring extra computational cost.
We analyze the training dynamics of deep networks on training data corrupted by, e.g., random noise and adversarial examples.
Our analysis shows that model predictions are able to magnify useful underlying information in data, and this phenomenon occurs broadly even in the absence of any label information.
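The summary does not spell out the calibration rule; one natural way to use model predictions to calibrate (possibly noisy) targets at no extra cost is to keep a running average of the model's own predictions as soft targets. The sketch below illustrates that idea and is not necessarily the paper's exact update.

```python
# Sketch: calibrate noisy labels with a running average of model predictions.
import torch
import torch.nn.functional as F

def self_adaptive_step(model, opt, x, soft_targets, idx, alpha=0.9):
    # soft_targets: (num_examples, num_classes) buffer initialized from the
    # (possibly noisy) one-hot labels; idx indexes the current minibatch.
    logits = model(x)
    with torch.no_grad():
        probs = F.softmax(logits, dim=-1)
        soft_targets[idx] = alpha * soft_targets[idx] + (1 - alpha) * probs
    loss = -(soft_targets[idx] * F.log_softmax(logits, dim=-1)).sum(-1).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```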
arXiv Detail & Related papers (2021-01-21T17:17:30Z)
- Local and non-local dependency learning and emergence of rule-like representations in speech data by Deep Convolutional Generative Adversarial Networks [0.0]
This paper argues that training GANs on local and non-local dependencies in speech data offers insights into how deep neural networks discretize continuous data.
arXiv Detail & Related papers (2020-09-27T00:02:34Z)
- Automatic Recall Machines: Internal Replay, Continual Learning and the Brain [104.38824285741248]
Replay in neural networks involves training on sequential data with memorized samples, which counteracts forgetting of previous behavior caused by non-stationarity.
We present a method where these auxiliary samples are generated on the fly, given only the model that is being trained for the assessed objective.
Instead the implicit memory of learned samples within the assessed model itself is exploited.
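A generic way to realize "replay samples generated on the fly from the model itself" is to optimize inputs against the frozen model until it responds confidently, then replay those inputs with the model's own predictions as targets. The sketch below illustrates only that general idea; the paper's actual objective for choosing maximally useful samples is different.

```python
# Generic sketch: recall pseudo-samples from the model's implicit memory by
# optimizing inputs, then replay them with the model's own predictions.
import torch

def recall_samples(model, n, input_shape, steps=50, lr=0.1):
    x = torch.randn(n, *input_shape, requires_grad=True)
    for _ in range(steps):
        confidence = model(x).max(dim=-1).values.mean()
        grad, = torch.autograd.grad(confidence, x)
        with torch.no_grad():
            x += lr * grad                      # gradient ascent on the inputs
    with torch.no_grad():
        targets = model(x).softmax(dim=-1)      # model's own soft predictions
    return x.detach(), targets                  # mix into training as replay data
```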
arXiv Detail & Related papers (2020-06-22T15:07:06Z)
- Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
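One way to turn counterfactual pairs into gradient supervision is to encourage the input gradient of the model's score to point from an example toward its minimally different twin. The sketch below assumes the model outputs a scalar score per example and is an illustration of the idea, not necessarily the paper's exact loss.

```python
# Sketch of a gradient-supervision auxiliary loss from counterfactual pairs.
import torch
import torch.nn.functional as F

def gradient_supervision_loss(model, x, x_cf):
    # x, x_cf: (batch, ...) paired inputs; model(x): (batch,) scalar scores.
    x = x.clone().requires_grad_(True)
    score = model(x).sum()
    g = torch.autograd.grad(score, x, create_graph=True)[0]
    direction = x_cf - x                         # toward the counterfactual twin
    cos = F.cosine_similarity(g.flatten(1), direction.flatten(1), dim=1)
    return (1.0 - cos).mean()                    # add to the main task loss
```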
arXiv Detail & Related papers (2020-04-20T02:47:49Z)