Related papers: Nonlinear Framework for Speech Bandwidth Extension

URL: http://arxiv.org/abs/2507.15970v1
Date: Mon, 21 Jul 2025 18:06:29 GMT
Title: Nonlinear Framework for Speech Bandwidth Extension
Authors: Tarikul Islam Tamiti, Nursad Mamun, Anomadarshi Barua
Abstract summary: NDSI-BWE is a new adversarial Bandwidth Extension (BWE) framework that leverages four new discriminators inspired by nonlinear dynamical systems. By using depth-wise convolution at the core of the convolutional block within each discriminator, NDSI-BWE attains an eight-times parameter reduction.
Score: 2.8811725782388686
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recovering high-frequency components lost to bandwidth constraints is crucial for applications ranging from telecommunications to high-fidelity audio on limited resources. We introduce NDSI-BWE, a new adversarial Bandwidth Extension (BWE) framework that leverages four new discriminators inspired by nonlinear dynamical systems to capture diverse temporal behaviors: a Multi-Resolution Lyapunov Discriminator (MRLD) for determining sensitivity to initial conditions by capturing deterministic chaos, a Multi-Scale Recurrence Discriminator (MS-RD) for self-similar recurrence dynamics, a Multi-Scale Detrended Fractal Analysis Discriminator (MSDFA) for long-range, slowly varying, scale-invariant relationships, and a Multi-Resolution Poincaré Plot Discriminator (MR-PPD) for capturing hidden latent-space relationships. It also uses a Multi-Period Discriminator (MPD) for cyclical patterns, and a Multi-Resolution Amplitude Discriminator (MRAD) and a Multi-Resolution Phase Discriminator (MRPD) for capturing intricate amplitude-phase transition statistics. By using depth-wise convolution at the core of the convolutional block within each discriminator, NDSI-BWE attains an eight-times parameter reduction. These seven discriminators guide a complex-valued ConformerNeXt-based generator with a dual-stream Lattice-Net-based architecture for simultaneous refinement of magnitude and phase. The generator leverages the transformer-based Conformer's global dependency modeling and the ConvNeXt block's local temporal modeling capability. Across six objective evaluation metrics and subjective tests involving five human judges, NDSI-BWE establishes a new SoTA in BWE.
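The abstract credits depth-wise convolution for the parameter reduction but gives no layer shapes. As a rough illustration only, the sketch below counts the weights of a standard 1-D convolution and of a depth-wise plus point-wise replacement. The channel counts and kernel size are hypothetical and are not taken from the paper, and the point-wise stage is an assumption about how such a block is typically built, so the printed factor is not the paper's figure.

```python
def conv1d_params(c_in, c_out, kernel, groups=1, bias=False):
    """Number of learnable parameters in a 1-D convolution layer."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    # Each output channel sees only c_in / groups input channels.
    weights = (c_in // groups) * c_out * kernel
    return weights + (c_out if bias else 0)


# Hypothetical layer shape (assumption, not from the paper).
C_IN, C_OUT, KERNEL = 64, 64, 9

# Standard convolution: every output channel mixes every input channel.
standard = conv1d_params(C_IN, C_OUT, KERNEL)

# Depth-wise convolution: one filter per input channel (groups == c_in).
depthwise = conv1d_params(C_IN, C_IN, KERNEL, groups=C_IN)

# Point-wise (kernel size 1) convolution to mix channels afterwards.
pointwise = conv1d_params(C_IN, C_OUT, 1)

separable = depthwise + pointwise
reduction = standard / separable

print(f"standard:  {standard} weights")
print(f"separable: {separable} weights")
print(f"reduction: {reduction:.2f}x")
```

For these made-up shapes the reduction factor works out to C_OUT * KERNEL / (KERNEL + C_OUT), so it grows with both kernel size and channel width.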

Related papers

PRISM: Lightweight Multivariate Time-Series Classification through Symmetric Multi-Resolution Convolutional Layers [0.0]
PRISM (Per-channel Resolution-Informed Symmetric Module) is a convolutional-based feature extractor that applies symmetric finite-impulse-response filters at multiple temporal scales. Across human-activity, sleep-stage and biomedical benchmarks, PRISM matches or outperforms CNN and Transformer baselines.
arXiv Detail & Related papers (2025-08-06T14:50:25Z)
OptiCorNet: Optimizing Sequence-Based Context Correlation for Visual Place Recognition [2.3093110834423616]
This paper presents OptiCorNet, a novel sequence modeling framework. It unifies spatial feature extraction and temporal differencing into a differentiable, end-to-end trainable module. Our approach outperforms state-of-the-art baselines under challenging seasonal and viewpoint variations.
arXiv Detail & Related papers (2025-07-19T04:29:43Z)
Backscatter Device-aided Integrated Sensing and Communication: A Pareto Optimization Framework [59.30060797118097]
Integrated sensing and communication (ISAC) systems potentially encounter significant performance degradation in densely obstructed urban non-line-of-sight scenarios. This paper proposes a backscatter device (BD)-assisted ISAC system, which leverages passive BDs naturally distributed in environments for enhancement.
arXiv Detail & Related papers (2025-07-12T17:11:06Z)
FindRec: Stein-Guided Entropic Flow for Multi-Modal Sequential Recommendation [50.438552588818]
We propose FindRec (Flexible unified information disentanglement for multi-modal sequential Recommendation). A Stein kernel-based Integrated Information Coordination Module (IICM) theoretically guarantees distribution consistency between multimodal features and ID streams. A cross-modal expert routing mechanism adaptively filters and combines multimodal features based on their contextual relevance.
arXiv Detail & Related papers (2025-07-07T04:09:45Z)
Coarse-to-Fine Proposal Refinement Framework for Audio Temporal Forgery Detection and Localization [60.899082019130766]
We introduce a frame-level detection network (FDN) and a proposal refinement network (PRN) for audio temporal forgery detection and localization. FDN aims to mine informative inconsistency cues between real and fake frames to obtain discriminative features that are beneficial for roughly indicating forgery regions. PRN is responsible for predicting confidence scores and regression offsets to refine the coarse-grained proposals derived from the FDN.
arXiv Detail & Related papers (2024-07-23T15:07:52Z)
Multi-Source and Test-Time Domain Adaptation on Multivariate Signals using Spatio-Temporal Monge Alignment [59.75420353684495]
Machine learning applications on signals such as computer vision or biomedical data often face challenges due to the variability that exists across hardware devices or session recordings. In this work, we propose Spatio-Temporal Monge Alignment (STMA) to mitigate these variabilities. We show that STMA leads to significant and consistent performance gains between datasets acquired with very different settings.
arXiv Detail & Related papers (2024-07-19T13:33:38Z)
ASMR: Activation-sharing Multi-resolution Coordinate Networks For Efficient Inference [6.005712471509875]
Coordinate network or implicit neural representation (INR) is a fast-emerging method for encoding natural signals. We propose the Activation-Sharing Multi-Resolution (ASMR) coordinate network that combines multi-resolution coordinate decomposition with hierarchical modulations. We show that ASMR can reduce the MAC of a vanilla SIREN model by up to 500x while achieving an even higher reconstruction quality than its SIREN baseline.
arXiv Detail & Related papers (2024-05-20T22:35:34Z)
Multi-Dimensional Refinement Graph Convolutional Network with Robust Decouple Loss for Fine-Grained Skeleton-Based Action Recognition [19.031036881780107]
We propose a flexible attention block called Channel-Variable Spatial-Temporal Attention (CVSTA) to enhance the discriminative power of spatial-temporal joints. Based on CVSTA, we construct a Multi-Dimensional Refinement Graph Convolutional Network (MDR-GCN), which can improve the discrimination among channel-, joint- and frame-level features. Furthermore, we propose a Robust Decouple Loss (RDL), which significantly boosts the effect of the CVSTA and reduces the impact of noise.
arXiv Detail & Related papers (2023-06-27T09:23:36Z)
Robust Domain Adaptive Object Detection with Unified Multi-Granularity Alignment [59.831917206058435]
Domain adaptive detection aims to improve the generalization of detectors on the target domain. Recent approaches achieve domain adaptation through feature alignment in different granularities via adversarial learning. We introduce a unified multi-granularity alignment (MGA)-based detection framework for domain-invariant feature learning.
arXiv Detail & Related papers (2023-01-01T08:38:07Z)
MCL-GAN: Generative Adversarial Networks with Multiple Specialized Discriminators [47.19216713803009]
We propose a framework of generative adversarial networks with multiple discriminators. We guide each discriminator to have expertise in a subset of the entire data. Despite the use of multiple discriminators, the backbone networks are shared across the discriminators.
arXiv Detail & Related papers (2021-07-15T11:35:08Z)
Compute and memory efficient universal sound source separation [23.152611264259225]
We provide a family of efficient neural network architectures for general purpose audio source separation. The backbone structure of this convolutional network is the SUccessive DOwnsampling and Resampling of Multi-Resolution Features (SuDoRM-RF). Our experiments show that SuDoRM-RF models perform comparably to, and even surpass, several state-of-the-art benchmarks.
arXiv Detail & Related papers (2021-03-03T19:16:53Z)
Joint Deep Reinforcement Learning and Unfolding: Beam Selection and Precoding for mmWave Multiuser MIMO with Lens Arrays [54.43962058166702]
Millimeter wave (mmWave) multiuser multiple-input multiple-output (MU-MIMO) systems with discrete lens arrays (DLA) have received great attention. In this work, we investigate the joint design of a beam precoding matrix for mmWave MU-MIMO systems with DLA.
arXiv Detail & Related papers (2021-01-05T03:55:04Z)
Sparse Multi-Family Deep Scattering Network [14.932318540666543]
We propose a novel architecture exploiting the interpretability of the Deep Scattering Network (DSN). The SMF-DSN enhances the DSN by (i) increasing the diversity of the scattering coefficients and (ii) improving its robustness with respect to non-stationary noise.
arXiv Detail & Related papers (2020-12-14T16:06:14Z)

This list is automatically generated from the titles and abstracts of the papers on this site.