Active Restoration of Lost Audio Signals Using Machine Learning and
Latent Information
- URL: http://arxiv.org/abs/2111.10891v4
- Date: Thu, 18 Jan 2024 22:43:56 GMT
- Title: Active Restoration of Lost Audio Signals Using Machine Learning and
Latent Information
- Authors: Zohra Adila Cheddad, Abbas Cheddad
- Abstract summary: This paper proposes the combination of steganography, halftoning (dithering), and state-of-the-art shallow and deep learning methods.
We show improvement in the inpainting performance in terms of signal-to-noise ratio (SNR), the objective difference grade (ODG) and Hansen's audio quality metric.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Digital audio signal reconstruction of a lost or corrupt segment using deep
learning algorithms has been explored intensively in recent years.
Nevertheless, traditional methods based on linear interpolation, phase coding,
and tone insertion are still in vogue. However, we found no research
work on reconstructing audio signals with the fusion of dithering,
steganography, and machine learning regressors. Therefore, this paper proposes
the combination of steganography, halftoning (dithering), and state-of-the-art
shallow and deep learning methods. The results (including comparing the SPAIN,
Autoregressive, deep learning-based, graph-based, and other methods) are
evaluated with three different metrics. The results show that the proposed
solution is effective and that the side information (e.g., a latent
representation) carried by steganography enhances the reconstruction of audio
signals. Moreover, this paper proposes a novel framework for
reconstruction from heavily compressed embedded audio data using halftoning
(i.e., dithering) and machine learning, which we termed the HCR (halftone-based
compression and reconstruction). This work may trigger interest in optimising
this approach and/or transferring it to different domains (e.g., image
reconstruction). Compared to existing methods, we show improvement in the
inpainting performance in terms of signal-to-noise ratio (SNR), the objective
difference grade (ODG) and Hansen's audio quality metric. In particular, our
proposed framework outperformed the learning-based methods (D2WGAN and SG) and
the traditional statistical algorithms (e.g., SPAIN, TDC, WCP).
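To make the abstract's building blocks concrete, the sketch below illustrates two of them in isolation: a simple one-dimensional error-diffusion dither (one way to halftone an audio segment into a 1-bit stream) and the standard SNR metric used in the evaluation. This is not the authors' HCR pipeline; the quantizer, the smoothing "reconstruction", and the toy sine signal are all illustrative assumptions, and the steganographic embedding of side information is omitted entirely.

```python
import numpy as np

def error_diffusion_dither(x):
    """1-D error diffusion: quantize each sample to {-1, +1} and push
    the quantization error onto the next sample (a simple audio
    analogue of halftoning; illustrative only)."""
    x = x.astype(float).copy()
    out = np.empty_like(x)
    for i in range(len(x)):
        out[i] = 1.0 if x[i] >= 0.0 else -1.0
        err = x[i] - out[i]
        if i + 1 < len(x):
            x[i + 1] += err  # diffuse the error forward
    return out

def snr_db(clean, estimate):
    """Signal-to-noise ratio in dB of a reconstruction against a
    clean reference."""
    noise = clean - estimate
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2))

# Toy signal: a 440 Hz sine sampled at 8 kHz.
t = np.arange(800) / 8000.0
clean = np.sin(2 * np.pi * 440 * t)
halftone = error_diffusion_dither(clean)

# Crude stand-in for reconstruction: low-pass the 1-bit stream with a
# short moving average (error diffusion shapes noise to high frequencies).
kernel = np.ones(9) / 9.0
recon = np.convolve(halftone, kernel, mode="same")
print(f"SNR of smoothed halftone vs clean: {snr_db(clean, recon):.1f} dB")
```

Because error diffusion pushes quantization noise toward high frequencies, even this naive low-pass recovers a recognizable sine from the 1-bit stream; the paper replaces such hand-crafted smoothing with shallow and deep learning regressors.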
Related papers
- Model and Deep learning based Dynamic Range Compression Inversion [12.002024727237837]
Inverting DRC can help to restore the original dynamics to produce new mixes and/or to improve the overall quality of the audio signal.
We propose a model-based approach with neural networks for DRC inversion.
Our results show the effectiveness and robustness of the proposed method in comparison to several state-of-the-art methods.
arXiv Detail & Related papers (2024-11-07T00:33:07Z)
- Learning Phone Recognition from Unpaired Audio and Phone Sequences Based on Generative Adversarial Network [58.82343017711883]
This paper investigates how to learn directly from unpaired phone sequences and speech utterances.
GAN training is adopted in the first stage to find the mapping relationship between unpaired speech and phone sequence.
In the second stage, another HMM model is introduced to train from the generator's output, which boosts the performance.
arXiv Detail & Related papers (2022-07-29T09:29:28Z)
- WNet: A data-driven dual-domain denoising model for sparse-view computed tomography with a trainable reconstruction layer [3.832032989515628]
We propose WNet, a data-driven dual-domain denoising model which contains a trainable reconstruction layer for sparse-view artifact denoising.
We train and test our network on two clinically relevant datasets and we compare the obtained results with three different types of sparse-view CT denoising and reconstruction algorithms.
arXiv Detail & Related papers (2022-07-01T13:17:01Z)
- A Review of Sound Source Localization with Deep Learning Methods [71.18444724397486]
This article is a review on deep learning methods for single and multiple sound source localization.
We provide an exhaustive topography of the neural-based localization literature in this context.
Tables summarizing the literature review are provided at the end of the review for a quick search of methods with a given set of target characteristics.
arXiv Detail & Related papers (2021-09-08T07:25:39Z)
- A SAR speckle filter based on Residual Convolutional Neural Networks [68.8204255655161]
This work aims to present a novel method for filtering the speckle noise from Sentinel-1 data by applying Deep Learning (DL) algorithms, based on Convolutional Neural Networks (CNNs).
The obtained results, compared with the state of the art, show a clear improvement in terms of Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM).
arXiv Detail & Related papers (2021-04-19T14:43:07Z)
- TEACHTEXT: CrossModal Generalized Distillation for Text-Video Retrieval [103.85002875155551]
We propose a novel generalized distillation method, TeachText, for exploiting large-scale language pretraining.
We extend our method to video side modalities and show that we can effectively reduce the number of used modalities at test time.
Our approach advances the state of the art on several video retrieval benchmarks by a significant margin and adds no computational overhead at test time.
arXiv Detail & Related papers (2021-04-16T17:55:28Z)
- Orthogonal Features Based EEG Signals Denoising Using Fractional and Compressed One-Dimensional CNN AutoEncoder [3.8580784887142774]
This paper presents a fractional one-dimensional convolutional neural network (CNN) autoencoder for denoising the Electroencephalogram (EEG) signals.
EEG signals often get contaminated with noise during the recording process, mostly due to muscle artifacts (MA).
arXiv Detail & Related papers (2021-04-16T13:58:05Z)
- Time-domain Speech Enhancement with Generative Adversarial Learning [53.74228907273269]
This paper proposes a new framework called Time-domain Speech Enhancement Generative Adversarial Network (TSEGAN).
TSEGAN is an extension of the generative adversarial network (GAN) in time-domain with metric evaluation to mitigate the scaling problem.
In addition, we provide a new method based on objective function mapping for the theoretical analysis of the performance of Metric GAN.
arXiv Detail & Related papers (2021-03-30T08:09:49Z)
- SoundCLR: Contrastive Learning of Representations For Improved Environmental Sound Classification [0.6767885381740952]
SoundCLR is a supervised contrastive learning method for effective environmental sound classification with state-of-the-art performance.
Due to the comparatively small sizes of the available environmental sound datasets, we propose and exploit a transfer learning and strong data augmentation pipeline.
Our experiments show that our masking based augmentation technique on the log-mel spectrograms can significantly improve the recognition performance.
arXiv Detail & Related papers (2021-03-02T18:42:45Z)
- RAR-U-Net: a Residual Encoder to Attention Decoder by Residual Connections Framework for Spine Segmentation under Noisy Labels [9.81466618834274]
We propose a new and efficient method for medical image segmentation under noisy labels.
The method operates under a deep learning paradigm, incorporating four novel contributions.
Experimental results are illustrated on a publicly available benchmark database of spine CTs.
arXiv Detail & Related papers (2020-09-27T15:32:50Z)
- ADRN: Attention-based Deep Residual Network for Hyperspectral Image Denoising [52.01041506447195]
We propose an attention-based deep residual network to learn a mapping from noisy HSI to the clean one.
Experimental results demonstrate that our proposed ADRN scheme outperforms the state-of-the-art methods both in quantitative and visual evaluations.
arXiv Detail & Related papers (2020-03-04T08:36:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.