Related papers: Psychoacoustic Calibration of Loss Functions for Efficient End-to-End Neural Audio Coding

URL: http://arxiv.org/abs/2101.00054v1
Date: Thu, 31 Dec 2020 19:46:46 GMT
Title: Psychoacoustic Calibration of Loss Functions for Efficient End-to-End Neural Audio Coding
Authors: Kai Zhen, Mi Suk Lee, Jongmo Sung, Seungkwon Beack, Minje Kim
Abstract summary: We present a psychoacoustic calibration scheme to re-define the loss functions of neural audio coding systems. With the proposed method, a lightweight neural codec, with only 0.9 million parameters, performs near-transparent audio coding comparable with the commercial MPEG-1 Audio Layer III codec at 112 kbps.
Score: 30.307627653506756
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Conventional audio coding technologies commonly leverage human perception of sound, or psychoacoustics, to reduce the bitrate while preserving the perceptual quality of the decoded audio signals. For neural audio codecs, however, the objective nature of the loss function usually leads to suboptimal sound quality as well as high run-time complexity due to the large model size. In this work, we present a psychoacoustic calibration scheme to re-define the loss functions of neural audio coding systems so that they can decode signals more perceptually similar to the reference, yet with a much lower model complexity. The proposed loss function incorporates the global masking threshold, allowing the reconstruction error that corresponds to inaudible artifacts. Experimental results show that the proposed model outperforms the baseline neural codec twice as large and consuming 23.4% more bits per second. With the proposed method, a lightweight neural codec, with only 0.9 million parameters, performs near-transparent audio coding comparable with the commercial MPEG-1 Audio Layer III codec at 112 kbps.

Related papers

Reducing Storage of Pretrained Neural Networks by Rate-Constrained Quantization and Entropy Coding [56.066799081747845]
The ever-growing size of neural networks poses serious challenges on resource-constrained devices. We propose a novel post-training compression framework that combines rate-aware quantization with entropy coding. Our method allows for very fast decoding and is compatible with arbitrary quantization grids.
arXiv Detail & Related papers (2025-05-24T15:52:49Z)
High-Fidelity Music Vocoder using Neural Audio Codecs [18.95453617434051]
DisCoder is a neural vocoder that reconstructs high-fidelity 44.1 kHz audio from mel spectrograms. DisCoder achieves state-of-the-art performance in music synthesis on several objective metrics and in a MUSHRA listening study. Our approach also shows competitive performance in speech synthesis, highlighting its potential as a universal vocoder.
arXiv Detail & Related papers (2025-02-18T11:25:46Z)
Efficient Evaluation of Quantization-Effects in Neural Codecs [4.897318643396687]
Training neural codecs requires techniques to allow a non-zero gradient across the quantizer. This paper proposes an efficient evaluation framework for neural codecs using simulated data. We validate our findings against an internal neural audio codec and against the state-of-the-art descript-audio-codec.
arXiv Detail & Related papers (2025-02-07T09:11:19Z)
Learning Source Disentanglement in Neural Audio Codec [20.335701584949526]
We introduce the Source-Disentangled Neural Audio Codec (SD-Codec), a novel approach that combines audio coding and source separation. By jointly learning audio resynthesis and separation, SD-Codec explicitly assigns audio signals from different domains to distinct codebooks, sets of discrete representations. Experimental results indicate that SD-Codec not only maintains competitive resynthesis quality but also, supported by the separation results, demonstrates successful disentanglement of different sources in the latent space.
arXiv Detail & Related papers (2024-09-17T14:21:02Z)
A Cryogenic Memristive Neural Decoder for Fault-tolerant Quantum Error Correction [0.0]
We design and analyze a neural decoder based on an in-memory crossbar (IMC) architecture. We develop hardware-aware re-training methods to mitigate the fidelity loss. This work provides a pathway to scalable, fast, and low-power cryogenic IMC hardware for integrated fault-tolerant QEC.
arXiv Detail & Related papers (2023-07-18T17:46:33Z)
The END: An Equivariant Neural Decoder for Quantum Error Correction [73.4384623973809]
We introduce a data-efficient neural decoder that exploits the symmetries of the problem. We propose a novel equivariant architecture that achieves state-of-the-art accuracy compared to previous neural decoders.
arXiv Detail & Related papers (2023-04-14T19:46:39Z)
High Fidelity Neural Audio Compression [92.4812002532009]
We introduce a state-of-the-art real-time, high-fidelity audio codec leveraging neural networks. It consists of a streaming encoder-decoder architecture with a quantized latent space trained in an end-to-end fashion. We simplify and speed up the training by using a single multiscale spectrogram adversary.
arXiv Detail & Related papers (2022-10-24T17:52:02Z)
Latent-Domain Predictive Neural Speech Coding [22.65761249591267]
This paper introduces latent-domain predictive coding into the VQ-VAE framework. We propose the TF-Codec for low-latency neural speech coding in an end-to-end manner. Subjective results on multilingual speech datasets show that, with low latency, the proposed TF-Codec at 1 kbps achieves significantly better quality than Opus at 9 kbps.
arXiv Detail & Related papers (2022-07-18T03:18:08Z)
FastLTS: Non-Autoregressive End-to-End Unconstrained Lip-to-Speech Synthesis [77.06890315052563]
We propose FastLTS, a non-autoregressive end-to-end model which can directly synthesize high-quality speech audio from unconstrained talking videos with low latency. Experiments show that our model achieves $19.76\times$ speedup for audio generation compared with the current autoregressive model on input sequences of 3 seconds.
arXiv Detail & Related papers (2022-07-08T10:10:39Z)
End-to-End Binaural Speech Synthesis [71.1869877389535]
We present an end-to-end speech synthesis system that combines a low-bitrate audio system with a powerful decoder. We demonstrate the capability of the adversarial loss in capturing environment effects needed to create an authentic auditory scene.
arXiv Detail & Related papers (2022-07-08T05:18:36Z)
Improved decoding of circuit noise and fragile boundaries of tailored surface codes [61.411482146110984]
We introduce decoders that are both fast and accurate, and can be used with a wide class of quantum error correction codes. Our decoders, named belief-matching and belief-find, exploit all noise information and thereby unlock higher accuracy demonstrations of QEC. We find that the decoders led to a much higher threshold and lower qubit overhead in the tailored surface code with respect to the standard, square surface code.
arXiv Detail & Related papers (2022-03-09T18:48:54Z)
SoundStream: An End-to-End Neural Audio Codec [78.94923131038682]
We present SoundStream, a novel neural audio system that can efficiently compress speech, music and general audio. SoundStream relies on a fully convolutional encoder/decoder network and a residual vector quantizer, which are trained jointly end-to-end. We are able to perform joint compression and enhancement either at the encoder or at the decoder side with no additional latency.
arXiv Detail & Related papers (2021-07-07T15:45:42Z)
