The Equalizer: Introducing Shape-Gain Decomposition in Neural Audio Codecs
- URL: http://arxiv.org/abs/2602.15491v1
- Date: Tue, 17 Feb 2026 10:59:33 GMT
- Title: The Equalizer: Introducing Shape-Gain Decomposition in Neural Audio Codecs
- Authors: Samir Sadok, Laurent Girin, Xavier Alameda-Pineda
- Abstract summary: We propose to introduce shape-gain decomposition, widely used in classical speech/audio coding, into the NAC framework. The proposed Equalizer methodology decomposes the input signal -- before the NAC encoder -- into a gain and a normalized shape vector on a short-term basis. Our experiments conducted on speech signals show that this general methodology, easily applicable to any NAC, enables a substantial gain in bitrate-distortion performance.
- Score: 20.468614667204093
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Neural audio codecs (NACs) typically encode the short-term energy (gain) and normalized structure (shape) of speech/audio signals jointly within the same latent space. As a result, they are not robust to global variations of the input signal level: such variations strongly influence the embedding vectors at the output of the encoder and their quantization. This methodology is inherently inefficient, leading to codebook redundancy and suboptimal bitrate-distortion performance. To address these limitations, we propose to introduce shape-gain decomposition, widely used in classical speech/audio coding, into the NAC framework. The principle of the proposed Equalizer methodology is to decompose the input signal -- before the NAC encoder -- into a gain and a normalized shape vector on a short-term basis. The shape vector is processed by the NAC, while the gain is quantized with scalar quantization and transmitted separately. The output (decoded) signal is reconstructed from the normalized output of the NAC and the quantized gain. Our experiments conducted on speech signals show that this general methodology, easily applicable to any NAC, enables a substantial gain in bitrate-distortion performance, as well as a massive reduction in complexity.
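For concreteness, below is a minimal numpy sketch of the shape-gain pipeline the abstract describes. The frame-wise RMS gain, the 5-bit log-domain scalar quantizer, and the `nac_encode`/`nac_decode` placeholders are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def shape_gain_split(frames, eps=1e-8):
    """Split each short-term frame into a scalar gain (frame RMS) and a
    level-normalized shape vector, so the NAC only sees the shape."""
    gains = np.sqrt(np.mean(frames ** 2, axis=1)) + eps  # (n_frames,)
    shapes = frames / gains[:, None]                     # unit-RMS shapes
    return gains, shapes

def quantize_gain_db(gains, n_bits=5, lo_db=-60.0, hi_db=0.0):
    """Uniform scalar quantization of the gain in the log (dB) domain;
    the bit budget and dB range are illustrative guesses."""
    g_db = 20.0 * np.log10(np.maximum(gains, 1e-8))
    levels = 2 ** n_bits - 1
    idx = np.round((np.clip(g_db, lo_db, hi_db) - lo_db)
                   / (hi_db - lo_db) * levels)
    return 10.0 ** ((lo_db + idx / levels * (hi_db - lo_db)) / 20.0)

def equalizer_codec(frames, nac_encode, nac_decode):
    """End-to-end sketch: nac_encode/nac_decode stand in for any NAC."""
    gains, shapes = shape_gain_split(frames)
    g_hat = quantize_gain_db(gains)       # transmitted as cheap side info
    shapes_hat = nac_decode(nac_encode(shapes))
    return g_hat[:, None] * shapes_hat    # re-apply the quantized gain
```

Because the NAC only ever sees unit-level shape vectors, a global change of input level moves only the scalar gain stream, which is exactly the robustness property the abstract claims.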
Related papers
- Generalization Bounds for Transformer Channel Decoders [61.55280736553095]
This paper studies the generalization performance of ECCT from a learning-theoretic perspective. To the best of our knowledge, this work provides the first theoretical generalization guarantees for this class of decoders.
arXiv Detail & Related papers (2026-01-11T15:56:37Z)
- Adapting Neural Audio Codecs to EEG [27.20793132729464]
We show that pretrained neural audio codecs can serve as effective starting points for EEG compression. We propose DAC-MC, a multi-channel extension with attention-based cross-channel aggregation and channel-specific decoding. Evaluations on the TUH Abnormal and Epilepsy datasets show that the adapted codecs preserve clinically relevant information.
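The summary does not specify the aggregation layer, so the following PyTorch sketch is only one plausible reading of "attention-based cross-channel aggregation": per-channel latents attended into a single stream by a learned query. All layer choices here are assumptions.

```python
import torch
import torch.nn as nn

class CrossChannelAggregator(nn.Module):
    """Hypothetical attention pooling over EEG channels: per-channel
    latents are merged into one latent stream via multi-head attention
    against a single learned query token."""
    def __init__(self, dim, n_heads=4):
        super().__init__()
        self.query = nn.Parameter(torch.randn(1, 1, dim))
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, z):                  # z: (batch, channels, time, dim)
        b, c, t, d = z.shape
        z = z.permute(0, 2, 1, 3).reshape(b * t, c, d)  # channels as tokens
        q = self.query.expand(b * t, 1, d)
        pooled, _ = self.attn(q, z, z)     # (b*t, 1, d)
        return pooled.reshape(b, t, d)
```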
arXiv Detail & Related papers (2025-11-28T12:47:05Z)
- Neural Spectral Band Generation for Audio Coding [14.466825532313795]
We propose a deep neural network (DNN)-based generative approach for coding the high-frequency bands. We show that the proposed method achieves a better perceptual quality than HE-AAC-v1 with much less side information.
arXiv Detail & Related papers (2025-06-07T09:35:08Z)
- Fast correlated decoding of transversal logical algorithms [67.01652927671279]
Quantum error correction (QEC) is required for large-scale computation, but incurs a significant resource overhead. Recent advances have shown that by jointly decoding logical qubits in algorithms composed of logical gates, the number of syndrome extraction rounds can be reduced. Here, we reformulate the problem of decoding circuits by directly decoding relevant logical operator products as they propagate through the circuit.
arXiv Detail & Related papers (2025-05-19T18:00:00Z)
- A Theoretical Perspective for Speculative Decoding Algorithm [60.79447486066416]
One effective way to accelerate inference is Speculative Decoding, which employs a small model to sample a sequence of draft tokens and a large model to validate them.
This paper tackles this gap by conceptualizing the decoding problem via a Markov chain abstraction and studying its key properties, output quality and inference acceleration, from a theoretical perspective.
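Since the summary only names the draft-then-verify loop, here is a minimal sketch of the greedy-verification variant. `draft_next` and `target_next` are hypothetical argmax next-token predictors; the full algorithm instead accepts or rejects drafts probabilistically, which preserves the target model's output distribution exactly.

```python
def speculative_decode(draft_next, target_next, prefix, k=4, n_new=32):
    """Greedy speculative decoding: the small model drafts k tokens,
    the large model keeps the longest prefix it agrees with."""
    out = list(prefix)
    while len(out) - len(prefix) < n_new:
        draft = []
        for _ in range(k):                     # cheap autoregressive draft
            draft.append(draft_next(out + draft))
        # Verification. In practice all k positions are checked in one
        # parallel forward pass; sequential calls here are for clarity.
        i = 0
        while i < k and target_next(out + draft[:i]) == draft[i]:
            i += 1
        out.extend(draft[:i])                  # accepted prefix
        if i < k:                              # first disagreement:
            out.append(target_next(out))       # take the target's token
    return out
```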
arXiv Detail & Related papers (2024-10-30T01:53:04Z)
- Variable Bitrate Residual Vector Quantization for Audio Coding [29.368893236587343]
Recent neural audio compression models have progressively adopted residual vector quantization (RVQ). These models employ a fixed number of codebooks per frame, which can be suboptimal in terms of rate-distortion tradeoffs. We propose variable bitrate RVQ (VRVQ) for audio codecs, which allows for more efficient coding by adapting the number of codebooks used per frame.
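As a reminder of the mechanism being made variable, a minimal numpy sketch of residual VQ follows. The energy-threshold early exit is only a stand-in for however VRVQ actually selects the number of codebooks per frame, which the summary does not detail.

```python
import numpy as np

def rvq_encode(x, codebooks, stop_energy=1e-4):
    """Residual VQ: quantize x with a cascade of codebooks, each coding
    the residual left by the previous stage. Stopping early when the
    residual is small spends fewer codebooks (bits) on easy frames."""
    residual = x.copy()
    indices = []
    for cb in codebooks:                          # cb: (n_codes, dim)
        i = int(np.argmin(np.sum((cb - residual) ** 2, axis=1)))
        indices.append(i)
        residual -= cb[i]
        if np.sum(residual ** 2) < stop_energy:   # illustrative exit rule
            break
    return indices

def rvq_decode(indices, codebooks):
    """Reconstruction is the sum of the selected code vectors."""
    return sum(cb[i] for cb, i in zip(codebooks, indices))
```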
arXiv Detail & Related papers (2024-10-08T13:18:24Z)
- Locality-Aware Generalizable Implicit Neural Representation [54.93702310461174]
Generalizable implicit neural representation (INR) enables a single continuous function to represent multiple data instances.
We propose a novel framework for generalizable INR that combines a transformer encoder with a locality-aware INR decoder.
Our framework significantly outperforms previous generalizable INRs and validates the usefulness of the locality-aware latents for downstream tasks.
arXiv Detail & Related papers (2023-10-09T11:26:58Z)
- Error Correction Code Transformer [92.10654749898927]
We propose to extend for the first time the Transformer architecture to the soft decoding of linear codes at arbitrary block lengths.
We embed each channel output in a high-dimensional space for a better representation of the bit information to be processed separately.
The proposed approach demonstrates the extreme power and flexibility of Transformers and outperforms existing state-of-the-art neural decoders by large margins at a fraction of their time complexity.
arXiv Detail & Related papers (2022-03-27T15:25:58Z)
- A DNN Based Post-Filter to Enhance the Quality of Coded Speech in MDCT Domain [16.70806998451696]
We propose a mask-based post-filter operating directly in the MDCT domain, inducing no extra delay.
The real-valued mask is applied to the quantized MDCT coefficients and is estimated from a relatively lightweight convolutional encoder-decoder network.
Our solution is tested on the recently standardized low-delay, low-complexity codec (LC3) at the lowest possible bitrate of 16 kbps.
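To illustrate the mask-based formulation, here is a small PyTorch sketch; the layer sizes, bin count, and mask range are our guesses, not the paper's configuration.

```python
import torch
import torch.nn as nn

class MDCTMaskPostFilter(nn.Module):
    """Sketch of a mask-based MDCT-domain post-filter: a lightweight conv
    encoder-decoder predicts a real-valued mask that rescales the decoded
    (quantized) MDCT coefficients. Operating on the codec's own MDCT
    frames is what avoids any extra algorithmic delay."""
    def __init__(self, n_bins=160, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_bins, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv1d(hidden, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv1d(hidden, n_bins, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, mdct):           # mdct: (batch, n_bins, frames)
        mask = 2.0 * self.net(mdct)    # mask in [0, 2]; range is a guess
        return mask * mdct
```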
arXiv Detail & Related papers (2022-01-28T11:08:02Z)
- On Sparsifying Encoder Outputs in Sequence-to-Sequence Models [90.58793284654692]
We take Transformer as the testbed and introduce a layer of gates in-between the encoder and the decoder.
The gates are regularized using the expected value of the sparsity-inducing L0 penalty.
We investigate the effects of this sparsification on two machine translation and two summarization tasks.
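The expected-L0 objective is typically made differentiable with the Hard-Concrete relaxation of Louizos et al. (2018); the sketch below assumes that is the relaxation in use here, which the summary does not confirm.

```python
import torch

def hard_concrete_gate(log_alpha, beta=2/3, gamma=-0.1, zeta=1.1):
    """Sample a Hard-Concrete gate in [0, 1]: a stretched, clipped
    sigmoid of reparameterized uniform noise, so gates can reach
    exactly 0 (pruned) or 1 (kept) while staying differentiable."""
    u = torch.rand_like(log_alpha)
    s = torch.sigmoid((torch.log(u) - torch.log(1 - u) + log_alpha) / beta)
    return torch.clamp(s * (zeta - gamma) + gamma, 0.0, 1.0)

def expected_l0(log_alpha, beta=2/3, gamma=-0.1, zeta=1.1):
    """Expected number of non-zero gates: the differentiable quantity
    used as the sparsity regularizer."""
    return torch.sigmoid(
        log_alpha - beta * torch.log(torch.tensor(-gamma / zeta))).sum()
```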
arXiv Detail & Related papers (2020-04-24T16:57:52Z)