Psychoacoustic Calibration of Loss Functions for Efficient End-to-End
Neural Audio Coding
- URL: http://arxiv.org/abs/2101.00054v1
- Date: Thu, 31 Dec 2020 19:46:46 GMT
- Title: Psychoacoustic Calibration of Loss Functions for Efficient End-to-End
Neural Audio Coding
- Authors: Kai Zhen, Mi Suk Lee, Jongmo Sung, Seungkwon Beack, Minje Kim
- Abstract summary: We present a psychoacoustic calibration scheme to re-define the loss functions of neural audio coding systems.
With the proposed method, a lightweight neural codec, with only 0.9 million parameters, performs near-transparent audio coding comparable with the commercial MPEG-1 Audio Layer III codec at 112 kbps.
- Score: 30.307627653506756
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conventional audio coding technologies commonly leverage human perception of
sound, or psychoacoustics, to reduce the bitrate while preserving the
perceptual quality of the decoded audio signals. For neural audio codecs,
however, the objective nature of the loss function usually leads to suboptimal
sound quality as well as high run-time complexity due to the large model size.
In this work, we present a psychoacoustic calibration scheme to re-define the
loss functions of neural audio coding systems so that they can decode signals
more perceptually similar to the reference, yet with a much lower model
complexity. The proposed loss function incorporates the global masking
threshold, allowing the reconstruction error that corresponds to inaudible
artifacts. Experimental results show that the proposed model outperforms the
baseline neural codec twice as large and consuming 23.4% more bits per second.
With the proposed method, a lightweight neural codec, with only 0.9 million
parameters, performs near-transparent audio coding comparable with the
commercial MPEG-1 Audio Layer III codec at 112 kbps.
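The idea of a loss that tolerates inaudible error can be illustrated with a small sketch. This is one possible weighting, not necessarily the paper's exact formulation: each frequency bin's squared error is scaled by a weight that grows when the signal power exceeds the masking threshold and shrinks toward zero when the signal is masked. The function names `masking_weights` and `weighted_spectral_loss` are hypothetical, and the global masking threshold is assumed to be supplied in dB by an external psychoacoustic model rather than computed here.

```python
import numpy as np


def masking_weights(signal_db, mask_db):
    """Per-bin weights from signal power and masking threshold (both in dB).

    Assumed form: log10(signal_power / mask_power + 1), so a bin far below
    the threshold gets a weight near 0 and a bin far above it gets a
    weight that grows with the signal-to-mask ratio.
    """
    ratio = 10.0 ** (0.1 * (np.asarray(signal_db) - np.asarray(mask_db)))
    return np.log10(ratio + 1.0)


def weighted_spectral_loss(ref_spec, dec_spec, signal_db, mask_db):
    """Sum of squared spectral errors, scaled per bin by masking_weights."""
    w = masking_weights(signal_db, mask_db)
    err = np.abs(np.asarray(ref_spec) - np.asarray(dec_spec)) ** 2
    return float(np.sum(w * err))


# Two bins: the first is 20 dB above its masking threshold (audible),
# the second is 40 dB below it (masked).
signal_db = np.array([60.0, 20.0])
mask_db = np.array([40.0, 60.0])
ref = np.array([1.0, 1.0])

# The same error magnitude placed in the audible bin versus the masked bin.
loss_audible = weighted_spectral_loss(ref, np.array([0.9, 1.0]), signal_db, mask_db)
loss_masked = weighted_spectral_loss(ref, np.array([1.0, 0.9]), signal_db, mask_db)
print(loss_audible > loss_masked)
```

Under this weighting, a model is penalized far less for error in the masked bin, which is the sense in which such a loss lets a smaller network spend its capacity on audible components.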
Related papers
- Learning Source Disentanglement in Neural Audio Codec [20.335701584949526]
We introduce the Source-Disentangled Neural Audio Codec (SD-Codec), a novel approach that combines audio coding and source separation.
By jointly learning audio resynthesis and separation, SD-Codec explicitly assigns audio signals from different domains to distinct codebooks, sets of discrete representations.
Experimental results indicate that SD-Codec not only maintains competitive resynthesis quality but also, supported by the separation results, demonstrates successful disentanglement of different sources in the latent space.
arXiv Detail & Related papers (2024-09-17T14:21:02Z)
- A Cryogenic Memristive Neural Decoder for Fault-tolerant Quantum Error Correction [0.0]
We design and analyze a neural decoder based on an in-memory crossbar (IMC) architecture.
We develop hardware-aware re-training methods to mitigate the fidelity loss.
This work provides a pathway to scalable, fast, and low-power cryogenic IMC hardware for integrated fault-tolerant QEC.
arXiv Detail & Related papers (2023-07-18T17:46:33Z)
- The END: An Equivariant Neural Decoder for Quantum Error Correction [73.4384623973809]
We introduce a data efficient neural decoder that exploits the symmetries of the problem.
We propose a novel equivariant architecture that achieves state of the art accuracy compared to previous neural decoders.
arXiv Detail & Related papers (2023-04-14T19:46:39Z)
- High Fidelity Neural Audio Compression [92.4812002532009]
We introduce a state-of-the-art real-time, high-fidelity audio codec leveraging neural networks.
It consists of a streaming encoder-decoder architecture with a quantized latent space, trained in an end-to-end fashion.
We simplify and speed up the training by using a single multiscale spectrogram adversary.
arXiv Detail & Related papers (2022-10-24T17:52:02Z)
- Latent-Domain Predictive Neural Speech Coding [22.65761249591267]
This paper introduces latent-domain predictive coding into the VQ-VAE framework.
We propose the TF-Codec for low-latency neural speech coding in an end-to-end manner.
Subjective results on multilingual speech datasets show that, with low latency, the proposed TF-Codec at 1 kbps achieves significantly better quality than Opus at 9 kbps.
arXiv Detail & Related papers (2022-07-18T03:18:08Z)
- FastLTS: Non-Autoregressive End-to-End Unconstrained Lip-to-Speech Synthesis [77.06890315052563]
We propose FastLTS, a non-autoregressive end-to-end model which can directly synthesize high-quality speech audios from unconstrained talking videos with low latency.
Experiments show that our model achieves $19.76\times$ speedup for audio generation compared with the current autoregressive model on input sequences of 3 seconds.
arXiv Detail & Related papers (2022-07-08T10:10:39Z)
- End-to-End Binaural Speech Synthesis [71.1869877389535]
We present an end-to-end speech synthesis system that combines a low-bitrate audio system with a powerful decoder.
We demonstrate the capability of the adversarial loss in capturing environment effects needed to create an authentic auditory scene.
arXiv Detail & Related papers (2022-07-08T05:18:36Z)
- Improved decoding of circuit noise and fragile boundaries of tailored surface codes [61.411482146110984]
We introduce decoders that are both fast and accurate, and can be used with a wide class of quantum error correction codes.
Our decoders, named belief-matching and belief-find, exploit all noise information and thereby unlock higher accuracy demonstrations of QEC.
We find that the decoders led to a much higher threshold and lower qubit overhead in the tailored surface code with respect to the standard, square surface code.
arXiv Detail & Related papers (2022-03-09T18:48:54Z)
- SoundStream: An End-to-End Neural Audio Codec [78.94923131038682]
We present SoundStream, a novel neural audio system that can efficiently compress speech, music and general audio.
SoundStream relies on a fully convolutional encoder/decoder network and a residual vector quantizer, which are trained jointly end-to-end.
We are able to perform joint compression and enhancement either at the encoder or at the decoder side with no additional latency.
arXiv Detail & Related papers (2021-07-07T15:45:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.