VRVQ: Variable Bitrate Residual Vector Quantization for Audio Compression
- URL: http://arxiv.org/abs/2410.06016v2
- Date: Sat, 12 Oct 2024 16:17:28 GMT
- Title: VRVQ: Variable Bitrate Residual Vector Quantization for Audio Compression
- Authors: Yunkee Chae, Woosung Choi, Yuhta Takida, Junghyun Koo, Yukara Ikemiya, Zhi Zhong, Kin Wai Cheuk, Marco A. Martínez-Ramírez, Kyogu Lee, Wei-Hsiang Liao, Yuki Mitsufuji
- Abstract summary: Recent neural audio compression models have progressively adopted residual vector quantization (RVQ).
These models employ a fixed number of codebooks per frame, which can be suboptimal in terms of the rate-distortion tradeoff.
We propose variable bitrate RVQ (VRVQ) for audio codecs, which allows for more efficient coding by adapting the number of codebooks used per frame.
- Score: 29.368893236587343
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent state-of-the-art neural audio compression models have progressively adopted residual vector quantization (RVQ). Despite this success, these models employ a fixed number of codebooks per frame, which can be suboptimal in terms of rate-distortion tradeoff, particularly in scenarios with simple input audio, such as silence. To address this limitation, we propose variable bitrate RVQ (VRVQ) for audio codecs, which allows for more efficient coding by adapting the number of codebooks used per frame. Furthermore, we propose a gradient estimation method for the non-differentiable masking operation that transforms from the importance map to the binary importance mask, improving model training via a straight-through estimator. We demonstrate that the proposed training framework achieves superior results compared to the baseline method and shows further improvement when applied to the current state-of-the-art codec.
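The abstract's key training trick is converting a continuous importance map into a binary codebook mask, with a straight-through estimator (STE) supplying gradients through the non-differentiable threshold. The toy below is an illustrative sketch of that idea only, not the paper's implementation; the function names and the 0.5 threshold are assumptions.

```python
def binary_mask(importance, threshold=0.5):
    """Hard-threshold a per-frame importance map into a binary mask
    over RVQ codebooks (1.0 = keep this codebook, 0.0 = drop it).

    The thresholding below is non-differentiable; during training, a
    straight-through estimator would copy the gradient of the mask back
    to `importance`, treating the threshold as the identity function.
    In an autodiff framework this is typically written as
        hard + (soft - soft.detach())
    so the forward value stays hard while gradients flow to `soft`.
    """
    return [1.0 if v > threshold else 0.0 for v in importance]


def codebooks_used(mask):
    """Number of RVQ codebooks kept for a frame under a binary mask."""
    return int(sum(mask))
```

A silent frame would yield a near-zero importance map and thus very few active codebooks, which is exactly the rate saving the abstract describes.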
Related papers
- $ε$-VAE: Denoising as Visual Decoding [61.29255979767292]
In generative modeling, tokenization simplifies complex data into compact, structured representations, creating a more efficient, learnable space.
Current visual tokenization methods rely on a traditional autoencoder framework, where the encoder compresses data into latent representations, and the decoder reconstructs the original input.
We propose denoising as decoding, shifting from single-step reconstruction to iterative refinement. Specifically, we replace the decoder with a diffusion process that iteratively refines noise to recover the original image, guided by the latents provided by the encoder.
We evaluate our approach by assessing both reconstruction (rFID) and generation quality.
arXiv Detail & Related papers (2024-10-05T08:27:53Z) - Differentiable bit-rate estimation for neural-based video codec enhancement [2.592974861902384]
Neural networks (NN) can improve standard video compression by pre- and post-processing the encoded video.
For optimal NN training, the standard codec must be replaced with a differentiable proxy that provides derivatives of the estimated bit-rate and distortion.
This paper presents a new approach for bit-rate estimation that is similar to the type employed in training end-to-end neural codecs.
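End-to-end neural codecs commonly make the rate term differentiable by modeling each quantized symbol's probability and charging -log2 p(symbol) bits. The sketch below illustrates that generic proxy under an assumed factorized zero-mean Gaussian with unit-width quantization bins; the scale parameter `sigma` is an assumption, not a value from this paper.

```python
from math import erf, sqrt, log2

def gaussian_cdf(x, sigma):
    """CDF of a zero-mean Gaussian with scale sigma."""
    return 0.5 * (1.0 + erf(x / (sigma * sqrt(2.0))))

def estimated_bits(symbols, sigma=1.0):
    """Differentiable bit-rate proxy: the probability mass of each
    integer symbol is the Gaussian integrated over its unit-width bin,
    and -log2 of that mass is the estimated coding cost in bits."""
    bits = 0.0
    for s in symbols:
        p = gaussian_cdf(s + 0.5, sigma) - gaussian_cdf(s - 0.5, sigma)
        bits += -log2(max(p, 1e-12))  # guard against zero mass
    return bits
```

Because the estimate is a smooth function of the model's probability parameters, it can sit directly in a rate-distortion training loss.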
arXiv Detail & Related papers (2023-01-24T01:36:07Z) - High Fidelity Neural Audio Compression [92.4812002532009]
We introduce a state-of-the-art real-time, high-fidelity audio codec leveraging neural networks.
It consists of a streaming encoder-decoder architecture with a quantized latent space, trained in an end-to-end fashion.
We simplify and speed up the training by using a single multiscale spectrogram adversary.
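Codecs of this family quantize the latent with residual vector quantization: each stage picks the nearest codebook entry for the residual left by the previous stage, and the decoder sums the chosen entries. The following is a minimal illustrative sketch of RVQ itself, with hypothetical 2-D codebooks, not any specific codec's implementation.

```python
def rvq_encode(vector, codebooks):
    """Residual vector quantization: stage k quantizes the residual
    left by stages 1..k-1 against its own codebook (nearest entry by
    squared Euclidean distance). Returns the chosen index per stage."""
    residual = list(vector)
    indices = []
    for cb in codebooks:
        best = min(
            range(len(cb)),
            key=lambda i: sum((r - c) ** 2 for r, c in zip(residual, cb[i])),
        )
        indices.append(best)
        residual = [r - c for r, c in zip(residual, cb[best])]
    return indices

def rvq_decode(indices, codebooks):
    """Sum the selected entry from each stage's codebook."""
    out = [0.0] * len(codebooks[0][0])
    for idx, cb in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, cb[idx])]
    return out
```

Each added stage refines the approximation, which is why dropping trailing codebooks (as VRVQ does per frame) trades fidelity for rate gracefully.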
arXiv Detail & Related papers (2022-10-24T17:52:02Z) - Flexible Neural Image Compression via Code Editing [8.499248314440557]
Neural image compression (NIC) has outperformed traditional image codecs in rate-distortion (R-D) performance.
It usually requires a dedicated encoder-decoder pair for each point on the R-D curve, which greatly hinders its practical deployment.
We propose Code Editing, a highly flexible coding method for NIC based on semi-amortized inference and adaptive quantization.
arXiv Detail & Related papers (2022-09-19T09:41:43Z) - Learning Representations for CSI Adaptive Quantization and Feedback [51.14360605938647]
We propose an efficient method for adaptive quantization and feedback in frequency division duplexing systems.
Existing works mainly focus on the implementation of autoencoder (AE) neural networks for CSI compression.
We recommend two different methods: one based on post-training quantization, and a second in which the codebook is learned during the training of the AE.
arXiv Detail & Related papers (2022-07-13T08:52:13Z) - Cross-Scale Vector Quantization for Scalable Neural Speech Coding [22.65761249591267]
Bitrate scalability is a desirable feature for audio coding in real-time communications.
In this paper, we introduce a cross-scale scalable vector quantization scheme (CSVQ).
In this way, a coarse-level signal is reconstructed if only a portion of the bitstream is received, and quality progressively improves as more bits become available.
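The scalability property described here can be illustrated in a few lines: with a layered code, decoding any prefix of the stage contributions yields a usable coarse signal, and each additional stage reduces the error. This toy, with assumed 1-D stage outputs, sketches only that property, not CSVQ's actual architecture.

```python
def progressive_decode(stage_contributions, num_received):
    """Sum only the stage outputs whose bits have arrived: a prefix of
    the bitstream gives a coarse reconstruction that sharpens as more
    stages (i.e. more bits) are received."""
    out = [0.0] * len(stage_contributions[0])
    for stage in stage_contributions[:num_received]:
        out = [o + s for o, s in zip(out, stage)]
    return out
```

This is the core contract of any bitrate-scalable codec: truncating the stream degrades quality smoothly instead of breaking the decode.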
arXiv Detail & Related papers (2022-07-07T03:23:25Z) - Neural JPEG: End-to-End Image Compression Leveraging a Standard JPEG Encoder-Decoder [73.48927855855219]
We propose a system that learns to improve the encoding performance by enhancing its internal neural representations on both the encoder and decoder ends.
Experiments demonstrate that our approach successfully improves the rate-distortion performance over JPEG across various quality metrics.
arXiv Detail & Related papers (2022-01-27T20:20:03Z) - Implicit Neural Representations for Image Compression [103.78615661013623]
Implicit Neural Representations (INRs) have gained attention as a novel and effective representation for various data types.
We propose the first comprehensive compression pipeline based on INRs including quantization, quantization-aware retraining and entropy coding.
We find that our approach to source compression with INRs vastly outperforms similar prior work.
arXiv Detail & Related papers (2021-12-08T13:02:53Z) - Progressive Neural Image Compression with Nested Quantization and Latent Ordering [16.871212593949487]
We present PLONQ, a progressive neural image compression scheme which pushes the boundary of variable compression by allowing scalable coding with a single bitstream.
To the best of our knowledge, PLONQ is the first learning-based progressive image coding scheme, and it outperforms SPIHT, a well-known wavelet-based progressive image codec.
arXiv Detail & Related papers (2021-02-04T22:06:13Z) - Learned Multi-Resolution Variable-Rate Image Compression with Octave-based Residual Blocks [15.308823742699039]
We propose a new variable-rate image compression framework, which employs generalized octave convolutions (GoConv) and generalized octave transposed-convolutions (GoTConv).
To enable a single model to operate with different bit rates and to learn multi-rate image features, a new objective function is introduced.
Experimental results show that the proposed framework, trained with the variable-rate objective function, outperforms standard codecs such as H.265/HEVC-based BPG as well as state-of-the-art learning-based variable-rate methods.
arXiv Detail & Related papers (2020-12-31T06:26:56Z) - Simple and Effective VAE Training with Calibrated Decoders [123.08908889310258]
Variational autoencoders (VAEs) provide an effective and simple method for modeling complex distributions.
We study the impact of calibrated decoders, which learn the uncertainty of the decoding distribution.
We propose a simple but novel modification to the commonly used Gaussian decoder, which computes the prediction variance analytically.
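Computing the Gaussian decoder's variance analytically works because, for a fixed mean, the maximum-likelihood variance has a closed form: it is simply the mean squared error. The sketch below illustrates that calibration idea in scalar form; the function names are assumptions and this is not the paper's implementation.

```python
from math import log, pi

def gaussian_nll(x, mean, var):
    """Negative log-likelihood of x under N(mean, var), summed over dims."""
    return sum(
        0.5 * (log(2 * pi * var) + (xi - mi) ** 2 / var)
        for xi, mi in zip(x, mean)
    )

def calibrated_nll(x, mean):
    """Plug in the analytic (maximum-likelihood) variance: for a fixed
    mean, the optimal sigma^2 is exactly the mean squared error, so no
    hand-tuned variance or separate prediction head is needed."""
    mse = sum((xi - mi) ** 2 for xi, mi in zip(x, mean)) / len(x)
    return gaussian_nll(x, mean, max(mse, 1e-9))  # guard against mse == 0
```

Because the MLE variance minimizes the NLL, the calibrated loss is never worse than any fixed-variance choice, which removes a brittle hyperparameter from VAE training.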
arXiv Detail & Related papers (2020-06-23T17:57:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.