Scalable and Efficient Neural Speech Coding
- URL: http://arxiv.org/abs/2103.14776v1
- Date: Sat, 27 Mar 2021 00:10:16 GMT
- Title: Scalable and Efficient Neural Speech Coding
- Authors: Kai Zhen, Jongmo Sung, Mi Suk Lee, Seungkwon Beack, Minje Kim
- Abstract summary: This work presents a scalable and efficient neural waveform codec (NWC) for speech compression.
The proposed CNN autoencoder also defines quantization and entropy coding as trainable modules.
Compared to other autoregressive decoder-based neural speech coders, our decoder has a significantly smaller architecture.
- Score: 24.959825692325445
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work presents a scalable and efficient neural waveform codec (NWC) for
speech compression. We formulate the speech coding problem as an autoencoding
task, where a convolutional neural network (CNN) performs encoding and decoding
as its feedforward routine. The proposed CNN autoencoder also defines
quantization and entropy coding as a trainable module, so the coding artifacts
and bitrate control are handled during the optimization process. We achieve
efficiency by introducing compact model architectures to our fully
convolutional network model, such as gated residual networks and depthwise
separable convolution. Furthermore, the proposed models adopt a scalable
architecture, cross-module residual learning (CMRL), to cover a wide range of
bitrates. To this end, we employ the residual coding concept to concatenate
multiple NWC autoencoding modules, where an NWC module performs residual coding
to restore any reconstruction loss that its preceding modules have created.
CMRL can scale down to cover lower bitrates as well, for which it employs a
linear predictive coding (LPC) module as its first autoencoder. We redefine
LPC's quantization as a trainable module to enhance the bit allocation tradeoff
between LPC and its following NWC modules. Compared to the other autoregressive
decoder-based neural speech coders, our decoder has significantly smaller
architecture, e.g., with only 0.12 million parameters, more than 100 times
smaller than a WaveNet decoder. Compared to the LPCNet-based speech codec,
which leverages the speech production model to reduce the network complexity in
low bitrates, ours can scale up to higher bitrates to achieve transparent
performance. Our lightweight neural speech coding model achieves comparable
subjective scores against AMR-WB at the low bitrate range and provides
transparent coding quality at 32 kbps.
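The CMRL cascade described above can be illustrated with a toy sketch: each module codes the residual left over by all preceding modules, and the reconstruction is the running sum of module outputs. Here, uniform quantizers stand in for the paper's trained NWC autoencoders; the function names and step sizes are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def make_quantizer(step):
    """A toy 'codec module': uniform quantization stands in for an NWC autoencoder."""
    def codec(x):
        return np.round(x / step) * step
    return codec

def cmrl_encode_decode(signal, modules):
    """Cascade codec modules CMRL-style: each module codes the residual
    left by all preceding modules; the final reconstruction is the sum
    of every module's output."""
    residual = signal.copy()
    reconstruction = np.zeros_like(signal)
    for codec in modules:
        decoded = codec(residual)           # module codes the current residual
        reconstruction += decoded           # accumulate partial reconstructions
        residual = signal - reconstruction  # what is left for the next module
    return reconstruction

rng = np.random.default_rng(0)
x = rng.standard_normal(16)
modules = [make_quantizer(0.5), make_quantizer(0.1)]  # coarse-to-fine cascade
one = cmrl_encode_decode(x, modules[:1])
two = cmrl_encode_decode(x, modules)
print("1 module max error:", np.abs(x - one).max())
print("2 modules max error:", np.abs(x - two).max())
```

Appending a module can only shrink (never grow) the per-sample error here, which mirrors how CMRL scales up to higher bitrates by concatenating more NWC modules.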
Related papers
- Standard compliant video coding using low complexity, switchable neural wrappers [8.149130379436759]
We propose a new framework featuring standard compatibility, high performance, and low decoding complexity.
We employ a set of jointly optimized neural pre- and post-processors, wrapping a standard video, to encode videos at different resolutions.
We design a low complexity neural post-processor architecture that can handle different upsampling ratios.
arXiv Detail & Related papers (2024-07-10T06:36:45Z) - Real-Time Target Sound Extraction [13.526450617545537]
We present the first neural network model to achieve real-time and streaming target sound extraction.
We propose Waveformer, an encoder-decoder architecture with a stack of dilated causal convolution layers as the encoder, and a transformer decoder layer as the decoder.
arXiv Detail & Related papers (2022-11-04T03:51:23Z) - Graph Neural Networks for Channel Decoding [71.15576353630667]
We showcase competitive decoding performance for various coding schemes, such as low-density parity-check (LDPC) and BCH codes.
The idea is to let a neural network (NN) learn a generalized message passing algorithm over a given graph.
We benchmark our proposed decoder against state-of-the-art in conventional channel decoding as well as against recent deep learning-based results.
arXiv Detail & Related papers (2022-07-29T15:29:18Z) - Cross-Scale Vector Quantization for Scalable Neural Speech Coding [22.65761249591267]
Bitrate scalability is a desirable feature for audio coding in real-time communications.
In this paper, we introduce a cross-scale scalable vector quantization scheme (CSVQ).
In this way, a coarse-level signal is reconstructed if only a portion of the bitstream is received, and the quality progressively improves as more bits become available.
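The progressive-decoding idea can be sketched with plain multi-stage (residual) vector quantization: each stage appends one index to the stream, and decoding any prefix of the stream yields a coarser reconstruction. The random codebooks and scales below are toy assumptions, not the paper's learned CSVQ.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 8
# Each codebook keeps a zero codeword so an extra stage never hurts.
codebooks = [np.vstack([np.zeros((1, D)), rng.standard_normal((31, D)) * s])
             for s in (1.0, 0.3, 0.1)]

def encode(x, codebooks):
    """Emit one index per stage; each stage quantizes the residual so far."""
    indices, residual = [], x.copy()
    for cb in codebooks:
        i = int(np.argmin(((cb - residual) ** 2).sum(axis=1)))
        indices.append(i)
        residual = residual - cb[i]
    return indices

def decode(indices, codebooks):
    """Decode any prefix of the index stream: fewer stages -> coarser signal."""
    x_hat = np.zeros(codebooks[0].shape[1])
    for i, cb in zip(indices, codebooks):
        x_hat += cb[i]
    return x_hat

x = rng.standard_normal(D)
idx = encode(x, codebooks)
errs = [np.linalg.norm(x - decode(idx[:k], codebooks)) for k in range(1, 4)]
print(errs)  # reconstruction error as 1, 2, then 3 stages are decoded
```

Truncating the stream after any stage still decodes, which is the bitrate-scalability property the CSVQ paper targets with jointly learned cross-scale codebooks.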
arXiv Detail & Related papers (2022-07-07T03:23:25Z) - An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale computation restrict execution efficiency, especially on Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor-Critic for discrete variables (SAC-d), which generates the exit point and compressing bits by soft policy iterations.
Based on a latency- and accuracy-aware reward design, such a computation offloading scheme adapts well to complex environments like dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URLLC.
arXiv Detail & Related papers (2022-01-09T09:31:50Z) - Adversarial Neural Networks for Error Correcting Codes [76.70040964453638]
We introduce a general framework to boost the performance and applicability of machine learning (ML) models.
We propose to combine ML decoders with a competing discriminator network that tries to distinguish between codewords and noisy words.
Our framework is game-theoretic, motivated by generative adversarial networks (GANs).
arXiv Detail & Related papers (2021-12-21T19:14:44Z) - Non-autoregressive End-to-end Speech Translation with Parallel Autoregressive Rescoring [83.32560748324667]
This article describes an efficient end-to-end speech translation (E2E-ST) framework based on non-autoregressive (NAR) models.
We propose a unified NAR E2E-ST framework called Orthros, which has an NAR decoder and an auxiliary shallow AR decoder on top of the shared encoder.
arXiv Detail & Related papers (2021-09-09T16:50:16Z) - A Streamwise GAN Vocoder for Wideband Speech Coding at Very Low Bit Rate [8.312162364318235]
We present a GAN vocoder which is able to generate wideband speech waveforms from parameters coded at 1.6 kbit/s.
The proposed model is a modified version of the StyleMelGAN vocoder that can run in a frame-by-frame manner.
arXiv Detail & Related papers (2021-08-09T14:03:07Z) - Dynamic Neural Representational Decoders for High-Resolution Semantic Segmentation [98.05643473345474]
We propose a novel decoder, termed dynamic neural representational decoder (NRD).
As each location on the encoder's output corresponds to a local patch of the semantic labels, in this work, we represent these local patches of labels with compact neural networks.
This neural representation enables our decoder to leverage the smoothness prior in the semantic label space, and thus makes our decoder more efficient.
arXiv Detail & Related papers (2021-07-30T04:50:56Z) - Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.