Guided Generative Adversarial Neural Network for Representation Learning
and High Fidelity Audio Generation using Fewer Labelled Audio Data
- URL: http://arxiv.org/abs/2003.02836v2
- Date: Mon, 1 Jun 2020 12:05:25 GMT
- Title: Guided Generative Adversarial Neural Network for Representation Learning
and High Fidelity Audio Generation using Fewer Labelled Audio Data
- Authors: Kazi Nazmul Haque, Rajib Rana, John H. L. Hansen, Björn Schuller
- Abstract summary: Recent improvements in Generative Adversarial Neural Networks (GANs) have shown their ability to generate higher quality samples.
Most of the representation learning methods based on GANs learn representations ignoring their post-use scenario.
We propose a novel GAN framework: Guided Generative Neural Network (GGAN), which guides a GAN to focus on learning desired representations.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent improvements in Generative Adversarial Neural Networks (GANs) have
shown their ability to generate higher quality samples as well as to learn good
representations for transfer learning. Most GAN-based representation learning
methods learn representations while ignoring their post-use scenario, which can
improve generalisation ability. However, such a model can become redundant if
it is intended for a specific task. For example, assume we
have a vast unlabelled audio dataset, and we want to learn a representation
from this dataset so that it can be used to improve the emotion recognition
performance of a small labelled audio dataset. During the representation
learning stage, if the model is unaware of the subsequent emotion recognition
task, it can completely ignore emotion-related characteristics in the learnt
representation. This is a fundamental challenge for any unsupervised
representation learning model. In this paper, we aim to address this challenge
by proposing a novel GAN framework: Guided Generative Neural Network (GGAN),
which guides a GAN to focus on learning desired representations and generating
superior quality samples for audio data leveraging fewer labelled samples.
Experimental results show that using a very small amount of labelled data as
guidance, a GGAN learns significantly better representations.
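The guidance idea described in the abstract can be sketched as a generator objective that adds a supervised classification term, computed only on the few labelled samples, to the usual adversarial term. The following is an illustrative toy, not the paper's exact formulation: the function name, the non-saturating adversarial form, and the `guidance_weight` parameter are all assumptions.

```python
import numpy as np

def guided_generator_loss(d_fake, class_logits, labels, guidance_weight=0.5):
    """Toy guided-GAN generator objective (illustrative, not from the paper).

    d_fake: discriminator probabilities for generated samples, shape (N,)
    class_logits: classifier logits for the labelled samples, shape (M, K)
    labels: integer class labels, shape (M,) -- M may be far smaller than N
    """
    # Non-saturating adversarial term: push D(G(z)) towards 1
    adv = -np.mean(np.log(d_fake + 1e-8))
    # Cross-entropy "guidance" term on the small labelled subset
    shifted = class_logits - class_logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    guide = -np.mean(log_probs[np.arange(len(labels)), labels])
    return adv + guidance_weight * guide
```

Because the guidance term is averaged over only the labelled subset, a handful of labelled samples is enough to bias the learnt representation towards the downstream task, which is the mechanism the abstract describes.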
Related papers
- Learning General Audio Representations with Large-Scale Training of Patchout Audio Transformers [6.002503434201551]
We study the use of audio transformers trained on large-scale datasets to learn general-purpose representations.
Our results show that representations extracted by audio transformers outperform CNN representations.
arXiv Detail & Related papers (2022-11-25T08:39:12Z)
- High Fidelity Visualization of What Your Self-Supervised Representation Knows About [22.982471878833362]
In this work, we showcase the use of a conditional diffusion based generative model (RCDM) to visualize representations learned with self-supervised models.
We demonstrate how this model's generation quality is on par with state-of-the-art generative models while being faithful to the representation used as conditioning.
arXiv Detail & Related papers (2021-12-16T19:23:33Z)
- Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation [56.264157127549446]
Speech emotion recognition (SER) is a challenging task that plays a crucial role in natural human-computer interaction.
One of the main challenges in SER is data scarcity.
We propose a transfer learning strategy combined with spectrogram augmentation.
arXiv Detail & Related papers (2021-08-05T10:39:39Z)
- Reasoning-Modulated Representations [85.08205744191078]
We study a common setting where our task is not purely opaque.
Our approach paves the way for a new class of data-efficient representation learning.
arXiv Detail & Related papers (2021-07-19T13:57:13Z)
- Curious Representation Learning for Embodied Intelligence [81.21764276106924]
Self-supervised representation learning has achieved remarkable success in recent years.
Yet to build truly intelligent agents, we must construct representation learning algorithms that can learn from environments.
We propose a framework, curious representation learning, which jointly learns a reinforcement learning policy and a visual representation model.
arXiv Detail & Related papers (2021-05-03T17:59:20Z)
- Relation-Guided Representation Learning [53.60351496449232]
We propose a new representation learning method that explicitly models and leverages sample relations.
Our framework well preserves the relations between samples.
By embedding samples into a subspace, we show that our method can address the large-scale and out-of-sample problems.
arXiv Detail & Related papers (2020-07-11T10:57:45Z)
- COALA: Co-Aligned Autoencoders for Learning Semantically Enriched Audio Representations [32.456824945999465]
We propose a method for learning audio representations, aligning the learned latent representations of audio and associated tags.
We evaluate the quality of our embedding model, measuring its performance as a feature extractor on three different tasks.
arXiv Detail & Related papers (2020-06-15T13:17:18Z)
- High-Fidelity Audio Generation and Representation Learning with Guided Adversarial Autoencoder [2.6770746621108654]
We propose a new autoencoder-based model named "Guided Adversarial Autoencoder" (GAAE).
Our proposed model can generate audio with superior quality, which is indistinguishable from the real audio samples.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
- Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally-different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-03-30T16:52:39Z)
- Laplacian Denoising Autoencoder [114.21219514831343]
We propose to learn data representations with a novel type of denoising autoencoder.
The noisy input data is generated by corrupting latent clean data in the gradient domain.
Experiments on several visual benchmarks demonstrate that better representations can be learned with the proposed approach.
arXiv Detail & Related papers (2020-03-30T16:52:39Z)
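For concreteness, the spectrogram augmentation mentioned in the speech emotion recognition entry above is commonly realised as SpecAugment-style masking of random time and frequency bands. The following is a minimal sketch under that assumption; the function name, mask widths, and exact scheme are illustrative and not taken from the cited paper.

```python
import numpy as np

def mask_spectrogram(spec, max_freq_width=8, max_time_width=16, rng=None):
    """SpecAugment-style masking (illustrative): zero out one random
    frequency band and one random time band of a (freq, time) spectrogram."""
    rng = np.random.default_rng(rng)
    out = spec.copy()
    n_freq, n_time = out.shape
    # Random frequency band of width 0..max_freq_width
    f = rng.integers(0, max_freq_width + 1)
    f0 = rng.integers(0, n_freq - f + 1)
    out[f0:f0 + f, :] = 0.0
    # Random time band of width 0..max_time_width
    t = rng.integers(0, max_time_width + 1)
    t0 = rng.integers(0, n_time - t + 1)
    out[:, t0:t0 + t] = 0.0
    return out
```

Masking forces a downstream classifier to rely on distributed cues rather than any single band, which is why it helps under the data scarcity that entry highlights.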
This list is automatically generated from the titles and abstracts of the papers on this site.