Blind Restoration of Real-World Audio by 1D Operational GANs
- URL: http://arxiv.org/abs/2212.14618v1
- Date: Fri, 30 Dec 2022 10:11:57 GMT
- Title: Blind Restoration of Real-World Audio by 1D Operational GANs
- Authors: Turker Ince, Serkan Kiranyaz, Ozer Can Devecioglu, Muhammad Salman
Khan, Muhammad Chowdhury, and Moncef Gabbouj
- Abstract summary: We propose a novel approach for blind restoration of real-world audio signals by Operational Generative Adversarial Networks (Op-GANs).
The proposed approach has been evaluated extensively over the benchmark TIMIT-RAR (speech) and GTZAN-RAR (non-speech) datasets.
Average SDR improvements of over 7.2 dB and 4.9 dB are achieved, respectively, which are substantial when compared with the baseline methods.
- Score: 18.462912387382346
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Objective: Despite numerous studies proposed for audio restoration in the
literature, most of them focus on an isolated restoration problem such as
denoising or dereverberation, ignoring other artifacts. Moreover, assuming a
noisy or reverberant environment with a limited number of fixed
signal-to-distortion ratio (SDR) levels is a common practice. However,
real-world audio is often corrupted by a blend of artifacts such as
reverberation, sensor noise, and background audio mixture with varying types,
severities, and durations. In this study, we propose a novel approach for blind
restoration of real-world audio signals by Operational Generative Adversarial
Networks (Op-GANs) with temporal and spectral objective metrics to enhance the
quality of restored audio signal regardless of the type and severity of each
artifact corrupting it. Methods: 1D Operational GANs are used with a generative
neuron model optimized for blind restoration of any corrupted audio signal.
Results: The proposed approach has been evaluated extensively over the
benchmark TIMIT-RAR (speech) and GTZAN-RAR (non-speech) datasets corrupted with
a random blend of artifacts each with a random severity to mimic real-world
audio signals. Average SDR improvements of over 7.2 dB and 4.9 dB are achieved,
respectively, which are substantial when compared with the baseline methods.
Significance: This is a pioneering study in blind audio restoration with the
unique capability of direct (time-domain) restoration of real-world audio
whilst achieving an unprecedented level of performance for a wide SDR range and
artifact types. Conclusion: 1D Op-GANs can achieve robust and computationally
efficient real-world audio restoration with significantly improved performance.
The source codes and the generated real-world audio datasets are shared
publicly with the research community in a dedicated GitHub repository.
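The generative neuron model behind the 1D Op-GAN is not spelled out in this summary. As a rough illustration, below is a minimal PyTorch sketch of a Self-ONN-style 1D operational convolution, assuming the Taylor-series (power-of-input) formulation used in the authors' earlier Self-ONN work; the class name, kernel size, and expansion order Q are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

class OperationalConv1d(nn.Module):
    """Sketch of a Self-ONN-style 1D operational convolution (assumed form).

    Each generative neuron replaces the fixed linear nodal operator of a
    CNN with a learnable Q-term Taylor expansion: instead of w * x it
    computes sum_{q=1..Q} w_q * x^q, with a separate kernel per power.
    """

    def __init__(self, in_ch: int, out_ch: int, kernel_size: int, Q: int = 3):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(in_ch, out_ch, kernel_size, padding=kernel_size // 2)
            for _ in range(Q)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Raise the input to successive powers and convolve each power
        # with its own kernel, then sum the contributions.
        return sum(conv(x.pow(q + 1)) for q, conv in enumerate(self.convs))


# Illustrative usage: four 1-second clips at 16 kHz, one channel each.
layer = OperationalConv1d(in_ch=1, out_ch=16, kernel_size=9, Q=3)
y = layer(torch.randn(4, 1, 16000))  # -> shape (4, 16, 16000)
```

With Q = 1 this reduces to an ordinary 1D convolution, which is why operational layers are often described as a superset of CNN layers.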
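The reported gains are in SDR. Assuming the standard definition (target signal power over distortion power, in dB), a small helper for computing SDR and the improvement over a corrupted input might look like the following sketch.

```python
import torch

def sdr_db(reference: torch.Tensor, estimate: torch.Tensor,
           eps: float = 1e-8) -> torch.Tensor:
    """SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2)."""
    num = reference.pow(2).sum()
    den = (reference - estimate).pow(2).sum().clamp_min(eps)
    return 10.0 * torch.log10(num / den)

# SDR improvement is measured against the clean reference:
# delta = sdr_db(clean, restored) - sdr_db(clean, corrupted)
```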
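The abstract also mentions temporal and spectral objective metrics guiding the restoration. The paper's exact loss terms are not given here; the sketch below is a hypothetical composite of a time-domain L1 term and an STFT-magnitude term, with `alpha` as an assumed weighting hyperparameter.

```python
import torch

def restoration_loss(clean: torch.Tensor, restored: torch.Tensor,
                     n_fft: int = 512, alpha: float = 0.5) -> torch.Tensor:
    """Hypothetical temporal + spectral objective (not the paper's exact loss).

    Combines a time-domain L1 term with an L1 distance between STFT
    magnitudes. Expects (batch, time) waveforms.
    """
    temporal = (clean - restored).abs().mean()
    window = torch.hann_window(n_fft, device=clean.device)
    mag_c = torch.stft(clean, n_fft, window=window, return_complex=True).abs()
    mag_r = torch.stft(restored, n_fft, window=window, return_complex=True).abs()
    spectral = (mag_c - mag_r).abs().mean()
    return alpha * temporal + (1.0 - alpha) * spectral
```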
Related papers
- BRSR-OpGAN: Blind Radar Signal Restoration using Operational Generative Adversarial Network [15.913517836391357]
Real-world radar signals are often corrupted by a blend of artifacts, including but not limited to unwanted echo, sensor noise, intentional jamming, and interference.
This study introduces Blind Radar Signal Restoration using an Operational Generative Adversarial Network (BRSR-OpGAN).
This approach is designed to improve the quality of radar signals, regardless of the diversity and intensity of the corruption.
arXiv Detail & Related papers (2024-07-18T23:55:48Z)
- Reverse the auditory processing pathway: Coarse-to-fine audio reconstruction from fMRI [20.432212333539628]
We introduce a novel coarse-to-fine audio reconstruction method based on functional Magnetic Resonance Imaging (fMRI) data.
We validate our method on three public fMRI datasets: Brain2Sound, Brain2Music, and Brain2Speech.
By employing semantic prompts during decoding, we enhance the quality of reconstructed audio when semantic features are suboptimal.
arXiv Detail & Related papers (2024-05-29T03:16:14Z)
- Audio-visual End-to-end Multi-channel Speech Separation, Dereverberation and Recognition [52.11964238935099]
An audio-visual multi-channel speech separation, dereverberation and recognition approach is proposed in this paper.
The benefit of the video input is consistently demonstrated in mask-based MVDR speech separation and in DNN-WPE or spectral mapping (SpecM) based speech dereverberation front-ends.
Experiments were conducted on the mixture overlapped and reverberant speech data constructed using simulation or replay of the Oxford LRS2 dataset.
arXiv Detail & Related papers (2023-07-06T10:50:46Z)
- Blind Audio Bandwidth Extension: A Diffusion-Based Zero-Shot Approach [4.030910640265943]
In cases where the lowpass degradation is unknown, such as in restoring historical audio recordings, this becomes a blind problem.
This paper introduces a novel method called BABE that addresses the blind problem in a zero-shot setting.
BABE exhibits robust generalization capabilities when enhancing real historical recordings.
arXiv Detail & Related papers (2023-06-02T10:47:15Z)
- The role of noise in denoising models for anomaly detection in medical images [62.0532151156057]
Pathological brain lesions exhibit diverse appearance in brain images.
Unsupervised anomaly detection approaches have been proposed using only normal data for training.
We show that optimization of the spatial resolution and magnitude of the noise improves the performance of different model training regimes.
arXiv Detail & Related papers (2023-01-19T21:39:38Z)
- End-to-End Binaural Speech Synthesis [71.1869877389535]
We present an end-to-end speech synthesis system that combines a low-bitrate audio system with a powerful decoder.
We demonstrate the capability of the adversarial loss in capturing environment effects needed to create an authentic auditory scene.
arXiv Detail & Related papers (2022-07-08T05:18:36Z)
- Active Restoration of Lost Audio Signals Using Machine Learning and Latent Information [0.7252027234425334]
This paper proposes the combination of steganography, halftoning (dithering), and state-of-the-art shallow and deep learning methods.
We show improvement in the inpainting performance in terms of signal-to-noise ratio (SNR), the objective difference grade (ODG) and Hansen's audio quality metric.
arXiv Detail & Related papers (2021-11-21T20:11:33Z)
- RefineGAN: Universally Generating Waveform Better than Ground Truth with Highly Accurate Pitch and Intensity Responses [15.599745604729842]
We propose RefineGAN, a high-fidelity neural vocoder with faster-than-real-time generation capability.
We employ a pitch-guided refine architecture with a multi-scale spectrogram-based loss function to help stabilize the training process.
We show that the fidelity is even improved during the waveform reconstruction by eliminating defects produced by the speaker.
arXiv Detail & Related papers (2021-11-01T14:12:54Z)
- Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction [109.44933866397123]
Noise robustness is essential for deploying automatic speech recognition systems in real-world environments.
We employ a noise-robust representation learned by a refined self-supervised framework for noisy speech recognition.
We achieve comparable performance to the best supervised approach reported with only 16% of labeled data.
arXiv Detail & Related papers (2021-10-28T20:39:02Z)
- Bridging the Gap Between Clean Data Training and Real-World Inference for Spoken Language Understanding [76.89426311082927]
Existing models are trained on clean data, which causes a gap between clean-data training and real-world inference.
We propose a method from the perspective of domain adaptation, by which both high- and low-quality samples are embedded into a similar vector space.
Experiments on the widely used Snips dataset and a large-scale in-house dataset (10 million training examples) demonstrate that this method not only outperforms the baseline models on a real-world (noisy) corpus but also enhances robustness, i.e., it produces high-quality results in noisy environments.
arXiv Detail & Related papers (2021-04-13T17:54:33Z)
- Hierarchical Timbre-Painting and Articulation Generation [92.59388372914265]
We present a fast and high-fidelity method for music generation, based on specified f0 and loudness.
The synthesized audio mimics the timbre and articulation of a target instrument.
arXiv Detail & Related papers (2020-08-30T05:27:39Z)