Neural Spectral Band Generation for Audio Coding
- URL: http://arxiv.org/abs/2506.06732v2
- Date: Mon, 28 Jul 2025 04:36:04 GMT
- Title: Neural Spectral Band Generation for Audio Coding
- Authors: Woongjib Choi, Byeong Hyeon Kim, Hyungseob Lim, Inseon Jang, Hong-Goo Kang
- Abstract summary: We propose a deep neural network (DNN)-based generative approach for coding the high-frequency bands. We show that the proposed method achieves a better perceptual quality than HE-AAC-v1 with much less side information.
- Score: 14.466825532313795
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Spectral band replication (SBR) enables bit-efficient coding by generating high-frequency bands from the low-frequency ones. However, it only utilizes coarse spectral features upon a subband-wise signal replication, limiting adaptability to diverse acoustic signals. In this paper, we explore the efficacy of a deep neural network (DNN)-based generative approach for coding the high-frequency bands, which we call neural spectral band generation (n-SBG). Specifically, we propose a DNN-based encoder-decoder structure to extract and quantize the side information related to the high-frequency components and generate the components given both the side information and the decoded core-band signals. The whole coding pipeline is optimized with generative adversarial criteria to enable the generation of perceptually plausible sound. From experiments using AAC as the core codec, we show that the proposed method achieves a better perceptual quality than HE-AAC-v1 with much less side information.
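To make the "coarse spectral features upon a subband-wise signal replication" idea concrete, here is a minimal sketch of an SBR-like scheme on a single spectral frame. It is a simplified illustration under stated assumptions, not the SBR tool standardized in HE-AAC (which works on a QMF filterbank with additional noise and tonality controls) and not the n-SBG model from the paper. The function names `band_energies` and `sbr_like_decode`, the equal-width subbands, and the random test frame are all hypothetical.

```python
import numpy as np

def band_energies(spectrum, n_bands):
    """Mean energy of each of n_bands roughly equal-width subbands."""
    bands = np.array_split(np.abs(spectrum) ** 2, n_bands)
    return np.array([b.mean() for b in bands])

def sbr_like_decode(low_band, target_energies, eps=1e-12):
    """Copy the low band upward, then rescale each subband so its
    mean energy matches the transmitted (coarse) envelope."""
    patches = np.array_split(low_band.astype(complex), len(target_energies))
    out = []
    for patch, target in zip(patches, target_energies):
        current = np.mean(np.abs(patch) ** 2)
        gain = np.sqrt(target / (current + eps))
        out.append(patch * gain)
    return np.concatenate(out)

# Hypothetical frame: 64 complex spectral bins, split into low and high halves.
rng = np.random.default_rng(0)
spectrum = rng.standard_normal(64) + 1j * rng.standard_normal(64)
low, high = spectrum[:32], spectrum[32:]

side_info = band_energies(high, n_bands=4)   # encoder: 4 numbers of side information
high_hat = sbr_like_decode(low, side_info)   # decoder: replicated, envelope-matched band
```

In this sketch the decoder recovers only the per-subband energy of the original high band; the fine structure is borrowed from the low band, which is the kind of limitation the abstract attributes to replication-based coding.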
Related papers
- FANeRV: Frequency Separation and Augmentation based Neural Representation for Video [32.35716293561769]
We present a Frequency Separation and Augmentation based Neural Representation for video (FANeRV). FANeRV explicitly separates input frames into high and low-frequency components using discrete wavelet transform. A specially designed gated network effectively fuses these frequency components for optimal reconstruction.
arXiv Detail & Related papers (2025-04-09T10:19:35Z) - Unsupervised CP-UNet Framework for Denoising DAS Data with Decay Noise [13.466125373185399]
Distributed acoustic sensor (DAS) technology leverages optical fiber cables to detect acoustic signals. DAS exhibits a lower signal-to-noise ratio (S/N) compared to geophones. This reduced S/N can negatively impact data analyses, including inversion and interpretation.
arXiv Detail & Related papers (2025-02-19T03:09:49Z) - VNet: A GAN-based Multi-Tier Discriminator Network for Speech Synthesis Vocoders [14.222389985736422]
VNet is a GAN-based neural vocoder network that incorporates full-band spectral information.
We demonstrate that the VNet model is capable of generating high-fidelity speech.
arXiv Detail & Related papers (2024-08-13T14:00:02Z) - Tuning the Frequencies: Robust Training for Sinusoidal Neural Networks [1.5124439914522694]
We introduce a theoretical framework that explains the capacity property of sinusoidal networks. We show how its layer compositions produce a large number of new frequencies expressed as integer combinations of the input frequencies. Our method, referred to as TUNER, greatly improves the stability and convergence of sinusoidal INR training, leading to detailed reconstructions.
arXiv Detail & Related papers (2024-07-30T18:24:46Z) - Rethinking Transformer-Based Blind-Spot Network for Self-Supervised Image Denoising [94.09442506816724]
Blind-spot networks (BSN) have been prevalent neural architectures in self-supervised image denoising (SSID). We build a Transformer-based Blind-Spot Network (TBSN) which shows strong local fitting and global perspective abilities.
arXiv Detail & Related papers (2024-04-11T15:39:10Z) - Low-power SNN-based audio source localisation using a Hilbert Transform spike encoding scheme [4.49657690895714]
Sound source localisation is used in many consumer devices to isolate audio from individual speakers and reject noise. Dense band-pass filters are often needed to obtain narrowband signal components from wideband audio. We demonstrate a novel method for sound source localisation on arbitrary microphone arrays, designed for efficient implementation in ultra-low-power spiking neural networks (SNNs). Our approach achieves state-of-the-art accuracy for SNN methods, comparable with traditional non-SNN super-resolution beamforming.
arXiv Detail & Related papers (2024-02-19T00:21:13Z) - Deep OFDM Channel Estimation: Capturing Frequency Recurrence [10.76835122839777]
We propose a deep-learning-based channel estimation scheme in an OFDM system.
We employ recurrent neural network techniques within a single OFDM slot, thus overcoming the latency and memory constraints.
The proposed SisRafNet delivers superior estimation performance compared to existing deep-learning-based channel estimation techniques.
arXiv Detail & Related papers (2024-01-07T14:13:08Z) - Locality-Aware Generalizable Implicit Neural Representation [54.93702310461174]
Generalizable implicit neural representation (INR) enables a single continuous function to represent multiple data instances.
We propose a novel framework for generalizable INR that combines a transformer encoder with a locality-aware INR decoder.
Our framework significantly outperforms previous generalizable INRs and validates the usefulness of the locality-aware latents for downstream tasks.
arXiv Detail & Related papers (2023-10-09T11:26:58Z) - Distributed Deep Joint Source-Channel Coding with Decoder-Only Side Information [6.411633100057159]
We consider low-latency image transmission over a noisy wireless channel when correlated side information is present only at the receiver side.
We propose a novel neural network architecture that incorporates the decoder-only side information at multiple stages at the receiver side.
arXiv Detail & Related papers (2023-10-06T15:17:45Z) - Joint Channel Estimation and Feedback with Masked Token Transformers in Massive MIMO Systems [74.52117784544758]
This paper proposes an encoder-decoder based network that unveils the intrinsic frequency-domain correlation within the CSI matrix.
The entire encoder-decoder network is utilized for channel compression.
Our method outperforms state-of-the-art channel estimation and feedback techniques in joint tasks.
arXiv Detail & Related papers (2023-06-08T06:15:17Z) - NAF: Neural Attenuation Fields for Sparse-View CBCT Reconstruction [79.13750275141139]
This paper proposes a novel and fast self-supervised solution for sparse-view CBCT reconstruction.
The desired attenuation coefficients are represented as a continuous function of 3D spatial coordinates, parameterized by a fully-connected deep neural network.
A learning-based encoder entailing hash coding is adopted to help the network capture high-frequency details.
arXiv Detail & Related papers (2022-09-29T04:06:00Z) - Denoising Diffusion Error Correction Codes [92.10654749898927]
Recently, neural decoders have demonstrated their advantage over classical decoding techniques.
Recent state-of-the-art neural decoders suffer from high complexity and lack the important iterative scheme characteristic of many legacy decoders.
We propose to employ denoising diffusion models for the soft decoding of linear codes at arbitrary block lengths.
arXiv Detail & Related papers (2022-09-16T11:00:50Z) - Disentangled Representation Learning for RF Fingerprint Extraction under Unknown Channel Statistics [77.13542705329328]
We propose a framework of disentangled representation learning (DRL) that first learns to factor the input signals into a device-relevant component and a device-irrelevant component via adversarial learning.
The implicit data augmentation in the proposed framework imposes a regularization on the RFF extractor to avoid the possible overfitting of device-irrelevant channel statistics.
Experiments validate that the proposed approach, referred to as DR-RFF, outperforms conventional methods in terms of generalizability to unknown complicated propagation environments.
arXiv Detail & Related papers (2022-08-04T15:46:48Z) - Deep Learning-Based Synchronization for Uplink NB-IoT [72.86843435313048]
We propose a neural network (NN)-based algorithm for device detection and time of arrival (ToA) estimation for the narrowband physical random-access channel (NPRACH) of narrowband internet of things (NB-IoT).
The introduced NN architecture leverages residual convolutional networks as well as knowledge of the preamble structure of the 5G New Radio (5G NR) specifications.
arXiv Detail & Related papers (2022-05-22T12:16:43Z) - SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive Noise Spectral Shaping [51.698273019061645]
SpecGrad adapts the diffusion noise so that its time-varying spectral envelope becomes close to the conditioning log-mel spectrogram.
It is processed in the time-frequency domain to keep the computational cost almost the same as the conventional DDPM-based neural vocoders.
arXiv Detail & Related papers (2022-03-31T02:08:27Z) - NeuralDPS: Neural Deterministic Plus Stochastic Model with Multiband Excitation for Noise-Controllable Waveform Generation [67.96138567288197]
We propose a novel neural vocoder named NeuralDPS which can retain high speech quality and acquire high synthesis efficiency and noise controllability.
It generates waveforms at least 280 times faster than the WaveNet vocoder.
Its synthesis is also 28% faster than WaveGAN's on a single core.
arXiv Detail & Related papers (2022-03-05T08:15:29Z) - Parallel frequency function-deep neural network for efficient complex broadband signal approximation [1.536989504296526]
A neural network is essentially a high-dimensional complex mapping model that fits features by adjusting its network weights.
The spectral bias in network training means that fitting the high-frequency components of broadband signals requires an impractically large number of training epochs.
A parallel frequency function-deep neural network (PFF-DNN) is proposed to suppress computational overhead while ensuring fitting accuracy.
arXiv Detail & Related papers (2021-06-19T01:39:13Z) - Two-step Machine Learning Approach for Channel Estimation with Mixed Resolution RF Chains [19.0581196881206]
We propose an efficient uplink channel estimator by applying machine learning (ML) algorithms.
In a first step, a conditional generative adversarial network (cGAN) predicts the radio channels from a limited set of full resolution RF chains to the rest of the low resolution RF chain antenna elements.
A long short-term memory (LSTM) neural network extracts further phase information from the low resolution RF chain antenna elements.
arXiv Detail & Related papers (2021-01-24T12:33:54Z) - Deep Networks for Direction-of-Arrival Estimation in Low SNR [89.45026632977456]
We introduce a Convolutional Neural Network (CNN) that is trained from multi-channel data of the true array manifold matrix.
We train a CNN in the low-SNR regime to predict DoAs across all SNRs.
Our robust solution can be applied in several fields, ranging from wireless array sensors to acoustic microphones or sonars.
arXiv Detail & Related papers (2020-11-17T12:52:18Z)