High Fidelity Neural Audio Compression
- URL: http://arxiv.org/abs/2210.13438v1
- Date: Mon, 24 Oct 2022 17:52:02 GMT
- Title: High Fidelity Neural Audio Compression
- Authors: Alexandre Défossez, Jade Copet, Gabriel Synnaeve, Yossi Adi
- Abstract summary: We introduce a state-of-the-art real-time, high-fidelity audio codec leveraging neural networks.
It consists of a streaming encoder-decoder architecture with quantized latent space trained in an end-to-end fashion.
We simplify and speed up the training by using a single multiscale spectrogram adversary.
- Score: 92.4812002532009
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a state-of-the-art real-time, high-fidelity audio codec
leveraging neural networks. It consists of a streaming encoder-decoder
architecture with quantized latent space trained in an end-to-end fashion. We
simplify and speed up the training by using a single multiscale spectrogram
adversary that efficiently reduces artifacts and produces high-quality samples.
We introduce a novel loss balancer mechanism to stabilize training: the weight
of a loss now defines the fraction of the overall gradient it should represent,
thus decoupling the choice of this hyper-parameter from the typical scale of
the loss. Finally, we study how lightweight Transformer models can be used to
further compress the obtained representation by up to 40%, while staying faster
than real time. We provide a detailed description of the key design choices of
the proposed model including: training objective, architectural changes and a
study of various perceptual loss functions. We present an extensive subjective
evaluation (MUSHRA tests) together with an ablation study for a range of
bandwidths and audio domains, including speech, noisy-reverberant speech, and
music. Our approach is superior to the baseline methods across all evaluated
settings, considering both 24 kHz monophonic and 48 kHz stereophonic audio.
Code and models are available at github.com/facebookresearch/encodec.
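The loss balancer idea stated in the abstract (a loss weight sets the fraction of the overall gradient that loss represents, independent of the loss's raw scale) can be illustrated with a minimal NumPy sketch. Beyond what the abstract states, it assumes that each loss's gradient is taken with respect to the model output, that each gradient norm is smoothed with an exponential moving average (`beta`), and that a reference norm (`ref_norm`) fixes the total scale. The class `LossBalancer` and its parameters are hypothetical and are not the API of the released code.

```python
import numpy as np


class LossBalancer:
    """Hypothetical sketch: rescale per-loss gradients so that loss i
    contributes roughly weights[i] / sum(weights) of the combined gradient norm.

    Assumptions (not from the abstract): gradients are w.r.t. the model output,
    per-loss gradient norms are smoothed with an EMA of decay `beta`, and
    `ref_norm` sets the overall gradient scale.
    """

    def __init__(self, weights, beta=0.999, ref_norm=1.0, eps=1e-12):
        self.weights = dict(weights)
        self.beta = beta
        self.ref_norm = ref_norm
        self.eps = eps
        self.avg_norms = {}  # EMA of each loss's gradient norm

    def combine(self, grads):
        """grads: dict mapping loss name -> gradient array (all the same shape).
        Returns the balanced gradient to backpropagate through the model."""
        total_weight = sum(self.weights[name] for name in grads)
        combined = None
        for name, g in grads.items():
            norm = float(np.linalg.norm(g))
            prev = self.avg_norms.get(name)
            # Initialise the EMA with the first observed norm, then smooth it.
            avg = norm if prev is None else self.beta * prev + (1 - self.beta) * norm
            self.avg_norms[name] = avg
            # Normalise away the raw gradient scale, then apply the weight fraction.
            scale = self.ref_norm * (self.weights[name] / total_weight) / (avg + self.eps)
            term = scale * g
            combined = term if combined is None else combined + term
        return combined
```

With weights 3 and 1, two gradients whose raw norms differ by a factor of 10^6 end up contributing about 0.75 and 0.25 of the update: `LossBalancer({"l1": 3.0, "adv": 1.0}).combine({"l1": np.array([1000.0, 0.0]), "adv": np.array([0.0, 0.001])})` returns approximately `[0.75, 0.25]`. Only the relative weights matter, which is what decouples the hyper-parameter from the typical scale of each loss.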
Related papers
- WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling [65.30937248905958]
A crucial component of language models is the tokenizer, which compresses high-dimensional natural signals into lower-dimensional discrete tokens.
We introduce WavTokenizer, which offers several advantages over previous SOTA acoustic models in the audio domain.
WavTokenizer achieves state-of-the-art reconstruction quality with outstanding UTMOS scores and inherently contains richer semantic information.
(2024-08-29T13:43:36Z)
- From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion [84.138804145918]
Deep generative models can generate high-fidelity audio conditioned on various types of representations.
These models are prone to generating audible artifacts when the conditioning is flawed or imperfect.
We propose a high-fidelity multi-band diffusion-based framework that generates any type of audio modality from low-bitrate discrete representations.
(2023-08-02T22:14:29Z)
- RAVE: A variational autoencoder for fast and high-quality neural audio synthesis [2.28438857884398]
We introduce a Realtime Audio Variational autoEncoder (RAVE) allowing both fast and high-quality audio waveform synthesis.
We show that our model is the first able to generate 48 kHz audio signals while simultaneously running 20 times faster than real time on a standard laptop CPU.
(2021-11-09T09:07:30Z)
- Audio Spectral Enhancement: Leveraging Autoencoders for Low Latency Reconstruction of Long, Lossy Audio Sequences [0.0]
We propose a novel approach for reconstructing higher frequencies from considerably longer sequences of low-quality MP3 audio waves.
Our architecture presents several bottlenecks while preserving the spectral structure of the audio wave via skip-connections.
We show how to leverage differential quantization techniques to reduce the initial model size by more than half while simultaneously reducing inference time.
(2021-08-08T18:06:21Z)
- SoundStream: An End-to-End Neural Audio Codec [78.94923131038682]
We present SoundStream, a novel neural audio system that can efficiently compress speech, music and general audio.
SoundStream relies on a fully convolutional encoder/decoder network and a residual vector quantizer, which are trained jointly end-to-end.
We are able to perform joint compression and enhancement either at the encoder or at the decoder side with no additional latency.
(2021-07-07T15:45:42Z)
- Psychoacoustic Calibration of Loss Functions for Efficient End-to-End Neural Audio Coding [30.307627653506756]
We present a psychoacoustic calibration scheme to re-define the loss functions of neural audio coding systems.
With the proposed method, a lightweight neural codec with only 0.9 million parameters performs near-transparent audio coding comparable to the commercial MPEG-1 Audio Layer III at 112 kbps.
(2020-12-31T19:46:46Z)
- Ultra-low bitrate video conferencing using deep image animation [7.263312285502382]
We propose a novel deep learning approach for ultra-low bitrate video compression for video conferencing applications.
We employ deep neural networks to encode motion information as keypoint displacement and reconstruct the video signal at the decoder side.
(2020-12-01T09:06:34Z)
- Hierarchical Timbre-Painting and Articulation Generation [92.59388372914265]
We present a fast and high-fidelity method for music generation, based on specified f0 and loudness.
The synthesized audio mimics the timbre and articulation of a target instrument.
(2020-08-30T05:27:39Z)
- Content Adaptive and Error Propagation Aware Deep Video Compression [110.31693187153084]
We propose a content adaptive and error propagation aware video compression system.
Our method employs a joint training strategy by considering the compression performance of multiple consecutive frames instead of a single frame.
Instead of using the hand-crafted coding modes in the traditional compression systems, we design an online encoder updating scheme in our system.
(2020-03-25T09:04:24Z)
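The residual vector quantizer named in the SoundStream entry can be illustrated with a minimal NumPy sketch: each stage picks the nearest codeword to whatever the previous stages failed to represent, and decoding sums the chosen codewords. The function names `rvq_encode` and `rvq_decode` and the tiny fixed codebooks below are hypothetical; in SoundStream the codebooks are learned jointly with the encoder and decoder. With K codewords per stage, each stage costs log2(K) bits per latent vector.

```python
import numpy as np


def rvq_encode(x, codebooks):
    """Residual vector quantization (hypothetical sketch).
    x: (dim,) latent vector; codebooks: list of (K, dim) arrays.
    Returns one codeword index per stage."""
    residual = np.asarray(x, dtype=float).copy()
    indices = []
    for cb in codebooks:
        # Squared Euclidean distance from the current residual to every codeword.
        dists = np.sum((cb - residual) ** 2, axis=1)
        k = int(np.argmin(dists))
        indices.append(k)
        # The next stage only has to quantize what this stage missed.
        residual = residual - cb[k]
    return indices


def rvq_decode(indices, codebooks):
    """Sum the selected codewords; fewer stages give a coarser reconstruction."""
    out = np.zeros(codebooks[0].shape[1])
    for k, cb in zip(indices, codebooks):
        out += cb[k]
    return out
```

For example, with `codebooks = [np.array([[0.0, 0.0], [1.0, 1.0]]), np.array([[0.0, 0.0], [0.25, -0.25]])]`, encoding `[1.2, 0.8]` yields indices `[1, 1]`. Decoding both stages gives `[1.25, 0.75]`, while decoding only the first stage gives the coarser `[1.0, 1.0]`, so dropping later stages trades quality for bitrate.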