Neural-Enhanced Dynamic Range Compression Inversion: A Hybrid Approach for Restoring Audio Dynamics
- URL: http://arxiv.org/abs/2411.04337v2
- Date: Tue, 09 Sep 2025 19:29:08 GMT
- Title: Neural-Enhanced Dynamic Range Compression Inversion: A Hybrid Approach for Restoring Audio Dynamics
- Authors: Haoran Sun, Dominique Fourer, Hichem Maaref
- Abstract summary: Dynamic Range Compression (DRC) is a widely used audio effect that adjusts signal dynamics for applications in music production, broadcasting, and speech processing. Existing DRC inversion methods either overlook key parameters or rely on precise parameter values, which can be challenging to estimate accurately. We introduce a hybrid approach that combines model-based DRC inversion with neural networks to achieve robust DRC parameter estimation and audio restoration simultaneously.
- Score: 18.219015975713003
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Dynamic Range Compression (DRC) is a widely used audio effect that adjusts signal dynamics for applications in music production, broadcasting, and speech processing. Inverting DRC is of broad importance for restoring the original dynamics, enabling remixing, and enhancing the overall audio quality. Existing DRC inversion methods either overlook key parameters or rely on precise parameter values, which can be challenging to estimate accurately. To address this limitation, we introduce a hybrid approach that combines model-based DRC inversion with neural networks to achieve robust DRC parameter estimation and audio restoration simultaneously. Our method uses tailored neural network architectures (classification and regression), which are then integrated into a model-based inversion framework to reconstruct the original signal. Experimental evaluations on various music and speech datasets confirm the effectiveness and robustness of our approach, outperforming several state-of-the-art techniques.
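To make the dependence on parameter values concrete, here is a minimal sketch of a static, hard-knee compression curve and its analytic inverse. It is an illustration under simplifying assumptions, not the method from the paper: it ignores attack and release smoothing, knee width, and make-up gain, and the names `compress_db`, `expand_db`, and `apply_curve` are hypothetical.

```python
import math


def compress_db(level_db, threshold_db, ratio):
    """Static hard-knee compressor curve in the dB domain (assumed model)."""
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio


def expand_db(level_db, threshold_db, ratio):
    """Analytic inverse of compress_db for the same threshold and ratio."""
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) * ratio


def apply_curve(samples, curve, threshold_db, ratio, eps=1e-12):
    """Apply a dB-domain level curve sample by sample, preserving sign."""
    out = []
    for x in samples:
        mag = abs(x)
        if mag < eps:
            # Silence has no defined level in dB; pass it through as zero.
            out.append(0.0)
            continue
        in_db = 20.0 * math.log10(mag)
        out_db = curve(in_db, threshold_db, ratio)
        gain = 10.0 ** ((out_db - in_db) / 20.0)
        out.append(x * gain)
    return out


original = [0.0, 0.05, -0.5, 0.9, -1.0]
compressed = apply_curve(original, compress_db, threshold_db=-20.0, ratio=4.0)

# Inversion with the true parameters recovers the input up to rounding.
restored = apply_curve(compressed, expand_db, threshold_db=-20.0, ratio=4.0)

# Inversion with a wrong ratio leaves the loud samples under-expanded.
mismatched = apply_curve(compressed, expand_db, threshold_db=-20.0, ratio=2.0)
```

In this toy setting the inverse is exact only when the threshold and ratio match those used for compression, which is why inversion schemes that assume known parameters are sensitive to estimation errors.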
Related papers
- Interlaced dynamic XCT reconstruction with spatio-temporal implicit neural representations [0.0]
We investigate the use of Implicit Neural Representations for dynamic X-ray computed tomography (XCT) reconstruction under interlaced acquisition schemes. The proposed approach combines ADMM-based optimization with INCODE, a conditioning framework incorporating prior knowledge, to enable efficient convergence. Across all settings, our model achieves strong performance robustness and outperforms Time-Interlaced Model-Based Iterative Reconstruction (TIMBIR), a state-of-the-art model-based iterative method.
arXiv Detail & Related papers (2025-10-09T01:33:58Z) - Denoising and Reconstruction of Nonlinear Dynamics using Truncated Reservoir Computing [0.0]
This paper presents a novel Reservoir Computing (RC) method for noise filtering and reconstructing nonlinear dynamics.
The performance of the RC in terms of noise intensity, noise frequency content, and drastic shifts in dynamical parameters is studied.
It is shown that the denoising performance improves via truncating redundant nodes and edges of the computing reservoir.
arXiv Detail & Related papers (2025-04-17T21:47:13Z) - ReCoM: Realistic Co-Speech Motion Generation with Recurrent Embedded Transformer [58.49950218437718]
We present ReCoM, an efficient framework for generating high-fidelity and generalizable human body motions synchronized with speech. The core innovation lies in the Recurrent Embedded Transformer (RET), which integrates Dynamic Embedding Regularization (DER) into a Vision Transformer (ViT) core architecture. To enhance model robustness, we incorporate the proposed DER strategy, which equips the model with dual capabilities of noise resistance and cross-domain generalization.
arXiv Detail & Related papers (2025-03-27T16:39:40Z) - Dynamic-Aware Spatio-temporal Representation Learning for Dynamic MRI Reconstruction [7.704793488616996]
We propose Dynamic-Aware INR (DA-INR), an INR-based model for dynamic MRI reconstruction.
It captures the spatial and temporal continuity of dynamic MRI data in the image domain and explicitly incorporates the temporal redundancy of the data into the model structure.
As a result, DA-INR outperforms other models in reconstruction quality even at extreme undersampling ratios.
arXiv Detail & Related papers (2025-01-15T12:11:33Z) - Releasing the Parameter Latency of Neural Representation for High-Efficiency Video Compression [18.769136361963472]
The implicit neural representation (INR) technique models entire videos as basic units, automatically capturing intra-frame and inter-frame correlations.
In this paper, we show that our method significantly enhances the rate-distortion performance of INR video compression.
arXiv Detail & Related papers (2024-10-02T15:19:31Z) - Modeling Time-Variant Responses of Optical Compressors with Selective State Space Models [0.0]
This paper presents a method for modeling optical dynamic range compressors using deep neural networks with Selective State Space models.
It features a refined technique integrating Feature-wise Linear Modulation and Gated Linear Units to adjust the network dynamically.
The proposed architecture is well-suited for low-latency and real-time applications, crucial in live audio processing.
arXiv Detail & Related papers (2024-08-22T17:03:08Z) - On Neural Architectures for Deep Learning-based Source Separation of Co-Channel OFDM Signals [104.11663769306566]
We study the single-channel source separation problem involving orthogonal frequency-division multiplexing (OFDM) signals.
We propose critical domain-informed modifications to the network parameterization, based on insights from OFDM structures.
arXiv Detail & Related papers (2023-03-11T16:29:13Z) - Modality-Agnostic Variational Compression of Implicit Neural Representations [96.35492043867104]
We introduce a modality-agnostic neural compression algorithm based on a functional view of data and parameterised as an Implicit Neural Representation (INR).
Bridging the gap between latent coding and sparsity, we obtain compact latent representations non-linearly mapped to a soft gating mechanism.
After obtaining a dataset of such latent representations, we directly optimise the rate/distortion trade-off in a modality-agnostic space using neural compression.
arXiv Detail & Related papers (2023-01-23T15:22:42Z) - Synthetic Wave-Geometric Impulse Responses for Improved Speech Dereverberation [69.1351513309953]
We show that accurately simulating the low-frequency components of Room Impulse Responses (RIRs) is important to achieving good dereverberation.
We demonstrate that speech dereverberation models trained on hybrid synthetic RIRs outperform models trained on RIRs generated by prior geometric ray tracing methods.
arXiv Detail & Related papers (2022-12-10T20:15:23Z) - Conditional variational autoencoder to improve neural audio synthesis for polyphonic music sound [4.002298833349517]
The realtime audio variational autoencoder (RAVE) method was developed for high-quality audio waveform synthesis.
We propose an enhanced RAVE model with a conditional variational autoencoder structure and an additional fully-connected layer.
The proposed model exhibits a more significant performance and stability improvement than the conventional RAVE model.
arXiv Detail & Related papers (2022-11-16T07:11:56Z) - Nonparallel High-Quality Audio Super Resolution with Domain Adaptation and Resampling CycleGANs [9.593925140084846]
We propose a high-quality audio super-resolution method that can utilize unpaired data based on two connected cycle consistent generative adversarial networks (CycleGAN).
Our method decomposes the super-resolution method into domain adaptation and resampling processes to handle acoustic mismatch in the unpaired low- and high-resolution signals.
Experimental results verify that the proposed method significantly outperforms conventional methods when paired data are not available.
arXiv Detail & Related papers (2022-10-28T04:32:59Z) - Multi-stage image denoising with the wavelet transform [125.2251438120701]
Deep convolutional neural networks (CNNs) are used for image denoising via automatically mining accurate structure information.
We propose a multi-stage image denoising CNN with the wavelet transform (MWDCNN) via three stages, i.e., a dynamic convolutional block (DCB), two cascaded wavelet transform and enhancement blocks (WEBs), and a residual block (RB).
arXiv Detail & Related papers (2022-09-26T03:28:23Z) - ReconFormer: Accelerated MRI Reconstruction Using Recurrent Transformer [60.27951773998535]
We propose a recurrent transformer model, namely ReconFormer, for MRI reconstruction.
It can iteratively reconstruct high-fidelity magnetic resonance images from highly under-sampled k-space data.
We show that it achieves significant improvements over the state-of-the-art methods with better parameter efficiency.
arXiv Detail & Related papers (2022-01-23T21:58:19Z) - Active Restoration of Lost Audio Signals Using Machine Learning and Latent Information [0.7252027234425334]
This paper proposes the combination of steganography, halftoning (dithering), and state-of-the-art shallow and deep learning methods.
We show improvement in the inpainting performance in terms of signal-to-noise ratio (SNR), the objective difference grade (ODG) and Hansen's audio quality metric.
arXiv Detail & Related papers (2021-11-21T20:11:33Z) - Robust lEarned Shrinkage-Thresholding (REST): Robust unrolling for sparse recovery [87.28082715343896]
We consider deep neural networks for solving inverse problems that are robust to forward model mis-specifications.
We design a new robust deep neural network architecture by applying algorithm unfolding techniques to a robust version of the underlying recovery problem.
The proposed REST network is shown to outperform state-of-the-art model-based and data-driven algorithms in both compressive sensing and radar imaging problems.
arXiv Detail & Related papers (2021-10-20T06:15:45Z) - Neural Model Reprogramming with Similarity Based Mapping for Low-Resource Spoken Command Recognition [71.96870151495536]
We propose a novel adversarial reprogramming (AR) approach for low-resource spoken command recognition (SCR).
The AR procedure aims to modify the acoustic signals (from the target domain) to repurpose a pretrained SCR model.
We evaluate the proposed AR-SCR system on three low-resource SCR datasets, including Arabic, Lithuanian, and dysarthric Mandarin speech.
arXiv Detail & Related papers (2021-10-08T05:07:35Z) - A Study on Speech Enhancement Based on Diffusion Probabilistic Model [63.38586161802788]
We propose a diffusion probabilistic model-based speech enhancement model (DiffuSE) that aims to recover clean speech signals from noisy signals.
The experimental results show that DiffuSE yields performance that is comparable to related audio generative models on the standardized Voice Bank corpus task.
arXiv Detail & Related papers (2021-07-25T19:23:18Z) - Compute and memory efficient universal sound source separation [23.152611264259225]
We provide a family of efficient neural network architectures for general purpose audio source separation.
The backbone structure of this convolutional network is the SUccessive DOwnsampling and Resampling of Multi-Resolution Features (SuDoRM-RF).
Our experiments show that SuDoRM-RF models perform comparably to, and even surpass, several state-of-the-art benchmarks.
arXiv Detail & Related papers (2021-03-03T19:16:53Z) - Deep Networks for Direction-of-Arrival Estimation in Low SNR [89.45026632977456]
We introduce a Convolutional Neural Network (CNN) that is trained from multi-channel data of the true array manifold matrix.
We train a CNN in the low-SNR regime to predict DoAs across all SNRs.
Our robust solution can be applied in several fields, ranging from wireless array sensors to acoustic microphones or sonars.
arXiv Detail & Related papers (2020-11-17T12:52:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.