Related papers: FlowMAC: Conditional Flow Matching for Audio Coding at Low Bit Rates

Related papers

Consistency Flow Model Achieves One-step Denoising Error Correction Codes [28.89866643527586]
We introduce the Error Correction Consistency Flow Model (ECCFM) for high-fidelity one-step decoding.<n>ECCFM attains lower bit-error rates (BER) than autoregressive and diffusion-based baselines.<n>It delivers inference speeds up from 30x to 100x faster than denoising diffusion decoders.
arXiv Detail & Related papers (2025-12-01T08:07:51Z)
FocalCodec-Stream: Streaming Low-Bitrate Speech Coding via Causal Distillation [27.32235541083431]
FocalCodec-Stream is a hybrid that compresses speech into a single binary codebook at 0.55 - 0.80 kbps with a theoretical latency of 80 ms.<n> Experiments show that FocalCodec-Stream outperforms existing streamable codecs at comparables.
arXiv Detail & Related papers (2025-09-19T17:57:13Z)
Finite Scalar Quantization Enables Redundant and Transmission-Robust Neural Audio Compression at Low Bit-rates [1.445167946386569]
We show that Finite Scalar Quantization (FSQ) encodes baked-in redundancy which produces an encoding which is robust when transmitted through noisy channels.<n>We demonstrate that FSQ has vastly superior bit-level perturbation by comparing the performance of RVQ and FSQ codecs when simulating the transmission of code sequences through a noisy channel.
arXiv Detail & Related papers (2025-09-11T15:39:59Z)
HH-Codec: High Compression High-fidelity Discrete Neural Codec for Spoken Language Modeling [6.313337261965531]
We introduce HH-Codec, a neural codecs that achieves extreme compression at 24 tokens per second for 24 kHz audio.<n>Our approach involves a carefully designed Vector Quantization space for Spoken Language Modeling, optimizing compression efficiency while minimizing information loss.<n> HH-Codec achieves state-of-the-art performance in speech reconstruction with an ultra-low bandwidth of 0.3 kbps.
arXiv Detail & Related papers (2025-07-25T02:44:30Z)
FlowDec: A flow-based full-band general audio codec with high perceptual quality [90.05968801459524]
FlowDec is a neural full-band audio codecs for general audio sampled at 48 kHz. We generalize from speech to general audio and move from 24 kbit/s to as low as 4 kbit/s.
arXiv Detail & Related papers (2025-03-03T12:49:09Z)
A Quantum Approximate Optimization Algorithm-based Decoder Architecture for NextG Wireless Channel Codes [6.52154420965995]
Forward Error Correction (FEC) provides reliable data flow in wireless networks despite the presence of noise and interference. FEC processing demands significant fraction of a wireless network's resources, due to its computationally-expensive decoding process. We present FDeQ, a QAOA-based FEC Decoder design targeting the popular NextG wireless Low Density Parity Check (LDPC) and Polar codes. FDeQ achieves successful decoding with error performance at par with state-of-the-art classical decoders at low FEC code block lengths.
arXiv Detail & Related papers (2024-08-21T15:53:09Z)
Streaming Audio-Visual Speech Recognition with Alignment Regularization [69.30185151873707]
We propose a streaming AV-ASR system based on a hybrid connectionist temporal classification ( CTC)/attention neural network architecture. The proposed AV-ASR model achieves WERs of 2.0% and 2.6% on the Lip Reading Sentences 3 dataset in an offline and online setup.
arXiv Detail & Related papers (2022-11-03T20:20:47Z)
Denoising Diffusion Error Correction Codes [92.10654749898927]
Recently, neural decoders have demonstrated their advantage over classical decoding techniques. Recent state-of-the-art neural decoders suffer from high complexity and lack the important iterative scheme characteristic of many legacy decoders. We propose to employ denoising diffusion models for the soft decoding of linear codes at arbitrary block lengths.
arXiv Detail & Related papers (2022-09-16T11:00:50Z)
Masked Autoencoders that Listen [79.99280830830854]
This paper studies a simple extension of image-based Masked Autoencoders (MAE) to self-supervised representation learning from audio spectrograms. Following the Transformer encoder-decoder design in MAE, our Audio-MAE first encodes audio spectrogram patches with a high masking ratio, feeding only the non-masked tokens through encoder layers. The decoder then re-orders and decodes the encoded context padded with mask tokens, in order to reconstruct the input spectrogram.
arXiv Detail & Related papers (2022-07-13T17:59:55Z)
Cross-Scale Vector Quantization for Scalable Neural Speech Coding [22.65761249591267]
Bitrate scalability is a desirable feature for audio coding in real-time communications. In this paper, we introduce a cross-scale scalable vector quantization scheme (CSVQ) In this way, a coarse-level signal is reconstructed if only a portion of the bitstream is received, and progressively improves quality as more bits are available.
arXiv Detail & Related papers (2022-07-07T03:23:25Z)
Improved decoding of circuit noise and fragile boundaries of tailored surface codes [61.411482146110984]
We introduce decoders that are both fast and accurate, and can be used with a wide class of quantum error correction codes. Our decoders, named belief-matching and belief-find, exploit all noise information and thereby unlock higher accuracy demonstrations of QEC. We find that the decoders led to a much higher threshold and lower qubit overhead in the tailored surface code with respect to the standard, square surface code.
arXiv Detail & Related papers (2022-03-09T18:48:54Z)
A DNN Based Post-Filter to Enhance the Quality of Coded Speech in MDCT Domain [16.70806998451696]
We propose a mask-based post-filter operating directly in MDCT domain, inducing no extra delay. The real-valued mask is applied to the quantized MDCT coefficients and is estimated from a relatively lightweight convolutional encoder-decoder network. Our solution is tested on the recently standardized low-delay, low-complexity (LC3) at lowest possible coefficients of 16 kbps.
arXiv Detail & Related papers (2022-01-28T11:08:02Z)
Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates [59.678108707409606]
We propose Fast-MD, a fast MD model that generates HI by non-autoregressive decoding based on connectionist temporal classification (CTC) outputs followed by an ASR decoder. Fast-MD achieved about 2x and 4x faster decoding speed than that of the na"ive MD model on GPU and CPU with comparable translation quality.
arXiv Detail & Related papers (2021-09-27T05:21:30Z)
A Streamwise GAN Vocoder for Wideband Speech Coding at Very Low Bit Rate [8.312162364318235]
We present a GAN vocoder which is able to generate wideband speech waveforms from parameters coded at 1.6 kbit/s. The proposed model is a modified version of the StyleMelGAN vocoder that can run in frame-by-frame manner.
arXiv Detail & Related papers (2021-08-09T14:03:07Z)
SoundStream: An End-to-End Neural Audio Codec [78.94923131038682]
We present SoundStream, a novel neural audio system that can efficiently compress speech, music and general audio. SoundStream relies on a fully convolutional encoder/decoder network and a residual vector quantizer, which are trained jointly end-to-end. We are able to perform joint compression and enhancement either at the encoder or at the decoder side with no additional latency.
arXiv Detail & Related papers (2021-07-07T15:45:42Z)
Enhancement Of Coded Speech Using a Mask-Based Post-Filter [9.324642081509754]
A data-driven post-filter relying on masking in the time-frequency domain is proposed. A fully connected neural network (FCNN), a convolutional encoder-decoder (CED) network and a long short-term memory (LSTM) network are implemeted to estimate a real-valued mask per time-frequency bin.
arXiv Detail & Related papers (2020-10-12T09:48:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.