Audio Spectral Enhancement: Leveraging Autoencoders for Low Latency Reconstruction of Long, Lossy Audio Sequences
- URL: http://arxiv.org/abs/2108.03703v1
- Date: Sun, 8 Aug 2021 18:06:21 GMT
- Title: Audio Spectral Enhancement: Leveraging Autoencoders for Low Latency Reconstruction of Long, Lossy Audio Sequences
- Authors: Darshan Deshpande and Harshavardhan Abichandani
- Abstract summary: We propose a novel approach for reconstructing higher frequencies from considerably longer sequences of low-quality MP3 audio waves.
Our architecture introduces several bottlenecks while preserving the spectral structure of the audio wave via skip-connections.
We show how to leverage differential quantization techniques to reduce the initial model size by more than half while simultaneously reducing inference time.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With active research in audio compression techniques yielding substantial
breakthroughs, spectral reconstruction of low-quality audio waves remains a
less explored topic. In this paper, we propose a novel approach for
reconstructing higher frequencies from considerably longer sequences of
low-quality MP3 audio waves. Our technique involves inpainting audio
spectrograms with residually stacked autoencoder blocks by manipulating
individual amplitude and phase values in relation to perceptual differences.
Our architecture introduces several bottlenecks while preserving the spectral
structure of the audio wave via skip-connections. We also compare several task
metrics and present a visual guide to loss selection. Moreover, we show
how to leverage differential quantization techniques to reduce the initial
model size by more than half while simultaneously reducing inference time,
which is crucial in real-world applications.
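To make the architecture above concrete, here is a minimal sketch of a residually stacked autoencoder operating on amplitude/phase spectrograms. This is not the authors' implementation: the framework (PyTorch), the module names (AEBlock, SpectralEnhancer), and every hyperparameter below are illustrative assumptions.

```python
# A minimal sketch of residually stacked autoencoder blocks with
# skip-connections for spectrogram inpainting. All layer sizes are
# illustrative assumptions, not the paper's configuration.
import torch
import torch.nn as nn

class AEBlock(nn.Module):
    """One bottlenecked encoder-decoder block for 2-D spectrograms.

    Input/output shape: (batch, channels, freq_bins, time_frames).
    A skip-connection carries the input past the bottleneck, which is
    one way to preserve the spectral structure of the audio.
    """
    def __init__(self, channels: int = 32, bottleneck: int = 8):
        super().__init__()
        self.encode = nn.Sequential(
            nn.Conv2d(channels, bottleneck, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.decode = nn.Sequential(
            nn.Conv2d(bottleneck, channels, kernel_size=3, padding=1),
            nn.ReLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.decode(self.encode(x))  # residual skip-connection

class SpectralEnhancer(nn.Module):
    """Stack of residual autoencoder blocks for spectrogram inpainting."""
    def __init__(self, in_channels: int = 2, width: int = 32, depth: int = 4):
        # in_channels=2: one channel each for amplitude and phase.
        super().__init__()
        self.stem = nn.Conv2d(in_channels, width, kernel_size=3, padding=1)
        self.blocks = nn.Sequential(*[AEBlock(width) for _ in range(depth)])
        self.head = nn.Conv2d(width, in_channels, kernel_size=3, padding=1)

    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        return self.head(self.blocks(self.stem(spec)))

# Example: enhance a batch of amplitude/phase spectrograms.
model = SpectralEnhancer()
fake_spec = torch.randn(1, 2, 513, 128)  # (batch, amp/phase, freq, time)
enhanced = model(fake_spec)
print(enhanced.shape)                    # torch.Size([1, 2, 513, 128])
```

The abstract likewise does not spell out its "differential quantization" procedure, so the following shows only a generic post-training analogue: 8-bit dynamic quantization of linear layers in PyTorch, which typically shrinks the quantized weights to roughly a quarter of their float32 size (convolutional layers would need static quantization instead).

```python
# Generic post-training quantization analogue; NOT the paper's exact
# "differential quantization" scheme, which the abstract does not specify.
import os
import torch
import torch.nn as nn

toy = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 513))

quantized = torch.quantization.quantize_dynamic(
    toy, {nn.Linear}, dtype=torch.qint8
)

def size_mb(model: nn.Module) -> float:
    """Measure serialized model size by saving its state_dict to disk."""
    torch.save(model.state_dict(), "tmp.pt")
    mb = os.path.getsize("tmp.pt") / 1e6
    os.remove("tmp.pt")
    return mb

print(f"float32: {size_mb(toy):.2f} MB, int8: {size_mb(quantized):.2f} MB")
```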
Related papers
- Music2Latent: Consistency Autoencoders for Latent Audio Compression
We introduce Music2Latent, an audio autoencoder that overcomes the limitations of existing audio autoencoders by leveraging consistency models.
Music2Latent encodes samples into a compressed continuous latent space in a single end-to-end training process.
We demonstrate that Music2Latent outperforms existing continuous audio autoencoders in sound quality and reconstruction accuracy.
arXiv Detail & Related papers (2024-08-12T21:25:19Z) - RTFS-Net: Recurrent Time-Frequency Modelling for Efficient Audio-Visual Speech Separation [18.93255531121519]
We present a novel time-frequency domain audio-visual speech separation method.
RTFS-Net applies its algorithms to the complex time-frequency bins yielded by the Short-Time Fourier Transform (see the sketch after this list).
This is the first time-frequency domain audio-visual speech separation method to outperform all contemporary time-domain counterparts.
arXiv Detail & Related papers (2023-09-29T12:38:00Z) - MAST: Multiscale Audio Spectrogram Transformers [53.06337011259031]
We present Multiscale Audio Spectrogram Transformer (MAST) for audio classification, which brings the concept of multiscale feature hierarchies to the Audio Spectrogram Transformer (AST).
In practice, MAST significantly outperforms AST, improving average accuracy by 3.4% across 8 speech and non-speech tasks from the LAPE Benchmark.
arXiv Detail & Related papers (2022-11-02T23:34:12Z) - High Fidelity Neural Audio Compression [92.4812002532009]
We introduce a state-of-the-art real-time, high-fidelity audio codec leveraging neural networks.
It consists of a streaming encoder-decoder architecture with a quantized latent space trained in an end-to-end fashion.
We simplify and speed up the training by using a single multiscale spectrogram adversary.
arXiv Detail & Related papers (2022-10-24T17:52:02Z) - FastLTS: Non-Autoregressive End-to-End Unconstrained Lip-to-Speech
Synthesis [77.06890315052563]
We propose FastLTS, a non-autoregressive end-to-end model which can directly synthesize high-quality speech audio from unconstrained talking videos with low latency.
Experiments show that our model achieves a $19.76\times$ speedup for audio generation compared with the current autoregressive model on input sequences of 3 seconds.
arXiv Detail & Related papers (2022-07-08T10:10:39Z) - SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with
Adaptive Noise Spectral Shaping [51.698273019061645]
SpecGrad adapts the diffusion noise so that its time-varying spectral envelope becomes close to the conditioning log-mel spectrogram.
It is processed in the time-frequency domain to keep the computational cost almost the same as that of conventional DDPM-based neural vocoders.
arXiv Detail & Related papers (2022-03-31T02:08:27Z) - Deep Convolutional and Recurrent Networks for Polyphonic Instrument
Classification from Monophonic Raw Audio Waveforms [30.3491261167433]
Sound Event Detection and Audio Classification tasks are traditionally addressed through time-frequency representations of audio signals such as spectrograms.
The emergence of deep neural networks as efficient feature extractors has enabled the direct use of audio signals for classification.
We attempt to recognize musical instruments in polyphonic audio by only feeding their raw waveforms into deep learning models.
arXiv Detail & Related papers (2021-02-13T13:44:46Z) - Hierarchical Timbre-Painting and Articulation Generation [92.59388372914265]
We present a fast and high-fidelity method for music generation, based on specified f0 and loudness.
The synthesized audio mimics the timbre and articulation of a target instrument.
arXiv Detail & Related papers (2020-08-30T05:27:39Z) - Temporal-Spatial Neural Filter: Direction Informed End-to-End
Multi-channel Target Speech Separation [66.46123655365113]
Target speech separation refers to extracting the target speaker's speech from mixed signals.
Two main challenges are the complex acoustic environment and the real-time processing requirement.
We propose a temporal-spatial neural filter, which directly estimates the target speech waveform from a multi-speaker mixture.
arXiv Detail & Related papers (2020-01-02T11:12:50Z)