High-Fidelity Audio Generation and Representation Learning with Guided
Adversarial Autoencoder
- URL: http://arxiv.org/abs/2006.00877v2
- Date: Sat, 17 Oct 2020 12:53:36 GMT
- Title: High-Fidelity Audio Generation and Representation Learning with Guided
Adversarial Autoencoder
- Authors: Kazi Nazmul Haque, Rajib Rana, Björn W. Schuller
- Abstract summary: We propose a new autoencoder based model named "Guided Adversarial Autoencoder (GAAE)"
Our proposed model can generate audio with superior quality, which is indistinguishable from the real audio samples.
- Score: 2.6770746621108654
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unsupervised disentangled representation learning from unlabelled audio
data and high-fidelity audio generation have become two linchpins of machine
learning research. However, a representation learned in an unsupervised setting
does not guarantee usability for a given downstream task, which can waste
resources if the training was conducted with that particular task in mind.
Conversely, if the model is highly biased towards the downstream task during
representation learning, it loses its generalisation capability: the downstream
task benefits directly, but the ability to transfer to other related tasks is
lost. To fill this gap, we propose a new autoencoder-based model named "Guided
Adversarial Autoencoder (GAAE)", which can learn both task-specific
representations and a general representation capturing the factors of variation
in the training data, leveraging only a small percentage of labelled samples;
this makes it suitable for future related tasks. Furthermore, our proposed model
can generate audio of superior quality that is indistinguishable from real audio
samples. With extensive experimental results, we demonstrate that, by harnessing
the power of high-fidelity audio generation, the proposed GAAE model can learn
powerful representations from an unlabelled dataset using a small percentage of
labelled data as supervision/guidance.
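The abstract describes three training signals acting together: reconstruction of the input, an adversarial constraint on the general latent code, and classification guidance from the small labelled fraction. A minimal NumPy sketch of one loss computation is below; this is an illustrative approximation, not the authors' code, and the layer sizes, prior, and loss weights are all hypothetical.

```python
# Illustrative sketch of the three GAAE loss terms (assumed architecture:
# single linear encoder/decoder, latent split into general + task parts).
import numpy as np

rng = np.random.default_rng(0)

def linear(x, W, b):
    return x @ W + b

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical dimensions: 64-d "audio" frames, 8-d general latent,
# 4-d task-specific latent, 3 downstream classes.
D_IN, D_GEN, D_TASK, N_CLASS = 64, 8, 4, 3

# Randomly initialised parameters for the encoder, decoder, latent
# discriminator, and the guidance classifier head.
W_enc = rng.normal(0, 0.1, (D_IN, D_GEN + D_TASK)); b_enc = np.zeros(D_GEN + D_TASK)
W_dec = rng.normal(0, 0.1, (D_GEN + D_TASK, D_IN)); b_dec = np.zeros(D_IN)
W_dis = rng.normal(0, 0.1, (D_GEN, 1));             b_dis = np.zeros(1)
W_cls = rng.normal(0, 0.1, (D_TASK, N_CLASS));      b_cls = np.zeros(N_CLASS)

def gaae_losses(x, y_labelled, labelled_idx):
    """Return (reconstruction, adversarial, guidance) losses for one batch.

    labelled_idx marks the small fraction of samples whose labels guide
    the task-specific part of the latent code.
    """
    z = linear(x, W_enc, b_enc)
    z_gen, z_task = z[:, :D_GEN], z[:, D_GEN:]

    # 1) Reconstruction: the full latent code must reproduce the input.
    x_hat = linear(z, W_dec, b_dec)
    loss_rec = np.mean((x - x_hat) ** 2)

    # 2) Adversarial (generator side): the general code should fool a
    # discriminator trained to tell it apart from a prior sample.
    p_fake = sigmoid(linear(z_gen, W_dis, b_dis))
    loss_adv = -np.mean(np.log(p_fake + 1e-8))

    # 3) Guidance: cross-entropy on the labelled subset only.
    logits = linear(z_task[labelled_idx], W_cls, b_cls)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    loss_guide = -np.mean(logp[np.arange(len(y_labelled)), y_labelled])

    return loss_rec, loss_adv, loss_guide

x = rng.normal(size=(32, D_IN))      # a batch of 32 stand-in "audio" frames
labelled_idx = np.arange(4)          # only 4 of 32 samples carry labels
y = rng.integers(0, N_CLASS, size=4)
rec, adv, guide = gaae_losses(x, y, labelled_idx)
total = rec + 0.1 * adv + 1.0 * guide  # hypothetical loss weights
```

In an actual training loop the discriminator would be updated with its own opposing objective, and all parameters would be optimised by gradient descent; the sketch only shows how the three signals combine into one objective.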
Related papers
- Training-Free Deepfake Voice Recognition by Leveraging Large-Scale Pre-Trained Models [52.04189118767758]
Generalization is a main issue for current audio deepfake detectors.
In this paper we study the potential of large-scale pre-trained models for audio deepfake detection.
arXiv Detail & Related papers (2024-05-03T15:27:11Z)
- Improving a Named Entity Recognizer Trained on Noisy Data with a Few Clean Instances [55.37242480995541]
We propose to denoise noisy NER data with guidance from a small set of clean instances.
Along with the main NER model we train a discriminator model and use its outputs to recalibrate the sample weights.
Results on public crowdsourcing and distant supervision datasets show that the proposed method can consistently improve performance with a small guidance set.
arXiv Detail & Related papers (2023-10-25T17:23:37Z)
- Learning General Audio Representations with Large-Scale Training of Patchout Audio Transformers [6.002503434201551]
We study the use of audio transformers trained on large-scale datasets to learn general-purpose representations.
Our results show that representations extracted by audio transformers outperform CNN representations.
arXiv Detail & Related papers (2022-11-25T08:39:12Z)
- SLICER: Learning universal audio representations using low-resource self-supervised pre-training [53.06337011259031]
We present a new Self-Supervised Learning approach to pre-train encoders on unlabeled audio data.
Our primary aim is to learn audio representations that can generalize across a large variety of speech and non-speech tasks.
arXiv Detail & Related papers (2022-11-02T23:45:33Z)
- Representation Learning for the Automatic Indexing of Sound Effects Libraries [79.68916470119743]
We show that a task-specific but dataset-independent representation can successfully address data issues such as class imbalance, inconsistent class labels, and insufficient dataset size.
Detailed experimental results show the impact of metric learning approaches and different cross-dataset training methods on representational effectiveness.
arXiv Detail & Related papers (2022-08-18T23:46:13Z)
- BYOL-S: Learning Self-supervised Speech Representations by Bootstrapping [19.071463356974387]
This work extends existing methods based on self-supervised learning by bootstrapping, proposes various encoder architectures, and explores the effects of using different pre-training datasets.
We present a novel training framework to come up with a hybrid audio representation, which combines handcrafted and data-driven learned audio features.
All the proposed representations were evaluated within the HEAR NeurIPS 2021 challenge for auditory scene classification and timestamp detection tasks.
arXiv Detail & Related papers (2022-06-24T02:26:40Z)
- Representative Subset Selection for Efficient Fine-Tuning in Self-Supervised Speech Recognition [6.450618373898492]
We consider the task of identifying an optimal subset of data for efficient fine-tuning in self-supervised speech models for ASR.
We present the COWERAGE algorithm for representative subset selection in self-supervised ASR.
arXiv Detail & Related papers (2022-03-18T10:12:24Z)
- Self-supervised Graphs for Audio Representation Learning with Limited Labeled Data [24.608764078208953]
Subgraphs are constructed by sampling the entire pool of available training data to exploit the relationship between labelled and unlabeled audio samples.
We evaluate our model on three benchmark audio databases, and two tasks: acoustic event detection and speech emotion recognition.
Our model is compact (240k parameters), and can produce generalized audio representations that are robust to different types of signal noise.
arXiv Detail & Related papers (2022-01-31T21:32:22Z)
- Negative Data Augmentation [127.28042046152954]
We show that negative data augmentation samples provide information on the support of the data distribution.
We introduce a new GAN training objective where we use NDA as an additional source of synthetic data for the discriminator.
Empirically, models trained with our method achieve improved conditional/unconditional image generation along with improved anomaly detection capabilities.
arXiv Detail & Related papers (2021-02-09T20:28:35Z)
- Laplacian Denoising Autoencoder [114.21219514831343]
We propose to learn data representations with a novel type of denoising autoencoder.
The noisy input data is generated by corrupting latent clean data in the gradient domain.
Experiments on several visual benchmarks demonstrate that better representations can be learned with the proposed approach.
arXiv Detail & Related papers (2020-03-30T16:52:39Z)
- Guided Generative Adversarial Neural Network for Representation Learning and High Fidelity Audio Generation using Fewer Labelled Audio Data [31.00018800567942]
Recent improvements in Generative Adversarial Neural Networks (GANs) have shown their ability to generate higher quality samples.
Most of the representation learning methods based on GANs learn representations ignoring their post-use scenario.
We propose a novel GAN framework, the Guided Generative Adversarial Neural Network (GGAN), which guides a GAN to focus on learning desired representations.
arXiv Detail & Related papers (2020-03-05T11:01:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.