SubSpectral Normalization for Neural Audio Data Processing
- URL: http://arxiv.org/abs/2103.13620v1
- Date: Thu, 25 Mar 2021 05:55:48 GMT
- Title: SubSpectral Normalization for Neural Audio Data Processing
- Authors: Simyung Chang, Hyoungwoo Park, Janghoon Cho, Hyunsin Park, Sungrack
Yun, Kyuwoong Hwang
- Abstract summary: We introduce SubSpectral Normalization (SSN) which splits the input frequency dimension into several groups (sub-bands) and performs a different normalization for each group.
Our method removes the inter-frequency deflection while the network learns a frequency-aware characteristic.
In the experiments with audio data, we observed that SSN can efficiently improve the network's performance.
- Score: 11.97844299450951
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Convolutional Neural Networks are widely used in various machine learning
domains. In image processing, the features can be obtained by applying 2D
convolution to all spatial dimensions of the input. However, in the audio case,
frequency domain input like Mel-Spectrogram has different and unique
characteristics in the frequency dimension. Thus, there is a need for a method
that allows the 2D convolution layer to handle the frequency dimension
differently. In this work, we introduce SubSpectral Normalization (SSN), which
splits the input frequency dimension into several groups (sub-bands) and
performs a different normalization for each group. SSN also includes an affine
transformation that can be applied to each group. Our method removes the
inter-frequency deflection while the network learns a frequency-aware
characteristic. In the experiments with audio data, we observed that SSN can
efficiently improve the network's performance.
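As a concrete illustration of the mechanism described in the abstract, here is a minimal PyTorch sketch (not the authors' reference implementation): it assumes the input is a (batch, channel, frequency, time) Mel-spectrogram tensor, splits the frequency axis into equal-width sub-bands, and applies an independent BatchNorm2d, each with its own affine parameters, to every sub-band.

```python
import torch
import torch.nn as nn


class SubSpectralNorm(nn.Module):
    """Minimal sketch of SubSpectral Normalization (SSN).

    The frequency axis of a (batch, channel, freq, time) input is split into
    equal-width sub-bands, and each sub-band is normalized by its own
    BatchNorm2d (each carrying its own affine parameters). This is one
    reading of the abstract, not the authors' reference implementation.
    """

    def __init__(self, channels: int, sub_bands: int):
        super().__init__()
        self.sub_bands = sub_bands
        self.norms = nn.ModuleList(
            [nn.BatchNorm2d(channels) for _ in range(sub_bands)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, freq, time); freq should be divisible by sub_bands.
        chunks = torch.chunk(x, self.sub_bands, dim=2)
        return torch.cat(
            [norm(chunk) for norm, chunk in zip(self.norms, chunks)], dim=2
        )


if __name__ == "__main__":
    ssn = SubSpectralNorm(channels=16, sub_bands=4)
    mel = torch.randn(8, 16, 40, 101)  # e.g. 40 Mel bins, 101 frames
    print(ssn(mel).shape)              # torch.Size([8, 16, 40, 101])
```

The same computation can equivalently be folded into a single BatchNorm2d over channels × sub_bands by reshaping the sub-band index into the channel axis, which avoids the Python-level loop; how the per-group affine transformation is applied is detailed in the paper.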
Related papers
- FINER: Flexible spectral-bias tuning in Implicit NEural Representation
by Variable-periodic Activation Functions [40.80112550091512]
Implicit Neural Representation is causing a revolution in the field of signal processing.
Current INR techniques suffer from a restricted capability to tune their supported frequency set.
We propose variable-periodic activation functions and, building on them, FINER.
We demonstrate the capabilities of FINER in the contexts of 2D image fitting, 3D signed distance field representation, and 5D neural radiance field optimization.
arXiv Detail & Related papers (2023-12-05T02:23:41Z)
- Locality-Aware Generalizable Implicit Neural Representation [54.93702310461174]
Generalizable implicit neural representation (INR) enables a single continuous function to represent multiple data instances.
We propose a novel framework for generalizable INR that combines a transformer encoder with a locality-aware INR decoder.
Our framework significantly outperforms previous generalizable INRs and validates the usefulness of the locality-aware latents for downstream tasks.
arXiv Detail & Related papers (2023-10-09T11:26:58Z)
- Transform Once: Efficient Operator Learning in Frequency Domain [69.74509540521397]
We study deep neural networks designed to harness the structure in frequency domain for efficient learning of long-range correlations in space or time.
This work introduces a blueprint for frequency domain learning through a single transform: transform once (T1).
arXiv Detail & Related papers (2022-11-26T01:56:05Z)
- Neural Implicit Dictionary via Mixture-of-Expert Training [111.08941206369508]
We present a generic INR framework that achieves both data and training efficiency by learning a Neural Implicit Dictionary (NID).
Our NID assembles a group of coordinate-based sub-networks which are tuned to span the desired function space.
Our experiments show that NID can speed up the reconstruction of 2D images or 3D scenes by two orders of magnitude while using up to 98% less input data.
arXiv Detail & Related papers (2022-07-08T05:07:19Z)
- Domain Generalization with Relaxed Instance Frequency-wise Normalization for Multi-device Acoustic Scene Classification [18.186932959605247]
Domain-relevant information in audio features is carried mainly by frequency statistics rather than channel statistics.
We introduce Relaxed Instance Frequency-wise Normalization (RFN): a plug-and-play, explicit normalization module along the frequency axis.
RFN can eliminate instance-specific domain discrepancy in an audio feature while mitigating the undesirable loss of useful discriminative information (a sketch of this style of frequency-wise normalization is given after this list).
arXiv Detail & Related papers (2022-06-24T23:45:50Z)
- Adaptive Frequency Learning in Two-branch Face Forgery Detection [66.91715092251258]
We propose to adaptively learn frequency information in a two-branch detection framework, dubbed AFD.
We liberate our network from the fixed frequency transforms, and achieve better performance with our data- and task-dependent transform layers.
arXiv Detail & Related papers (2022-03-27T14:25:52Z)
- Deep Frequency Filtering for Domain Generalization [55.66498461438285]
Deep Neural Networks (DNNs) have preferences for some frequency components in the learning process.
We propose Deep Frequency Filtering (DFF) for learning domain-generalizable features.
We show that applying our proposed DFF to a plain baseline outperforms state-of-the-art methods on different domain generalization tasks.
arXiv Detail & Related papers (2022-03-23T05:19:06Z)
- Dense Pruning of Pointwise Convolutions in the Frequency Domain [10.58456555092086]
We propose a technique which wraps each pointwise layer in a discrete cosine transform (DCT) which is truncated to selectively prune coefficients above a given threshold.
Unlike weight pruning techniques which rely on sparse operators, our contiguous frequency band pruning results in fully dense computation.
We apply our technique to MobileNetV2 and in the process reduce computation time by 22% and incur 1% accuracy degradation.
arXiv Detail & Related papers (2021-09-16T04:02:45Z)
- Multi-stream Convolutional Neural Network with Frequency Selection for Robust Speaker Verification [2.3437178262034095]
We propose a novel framework of multi-stream Convolutional Neural Network (CNN) for speaker verification tasks.
The proposed framework accommodates diverse temporal embeddings generated from multiple streams to enhance the robustness of acoustic modeling.
We conduct extensive experiments on the VoxCeleb dataset, and the results demonstrate that the multi-stream CNN significantly outperforms the single-stream baseline.
arXiv Detail & Related papers (2020-12-21T07:23:40Z)
- Volumetric Transformer Networks [88.85542905676712]
We introduce a learnable module, the volumetric transformer network (VTN).
VTN predicts channel-wise warping fields to reconfigure intermediate CNN features both spatially and channel-wise.
Our experiments show that VTN consistently boosts the features' representation power and consequently the networks' accuracy on fine-grained image recognition and instance-level image retrieval.
arXiv Detail & Related papers (2020-07-18T14:00:12Z)
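The Relaxed Instance Frequency-wise Normalization entry above operates along the same frequency axis that SSN splits into sub-bands. As referenced in that entry, here is a minimal PyTorch sketch of per-instance, frequency-wise normalization together with a hypothetical "relaxed" blend with batch normalization; the blend weight and the exact statistics axes are assumptions made for illustration, not the formulation from the RFN paper.

```python
import torch
import torch.nn as nn


class InstanceFrequencyNorm(nn.Module):
    """Per-instance, per-frequency-bin normalization (no affine).

    For a (batch, channel, freq, time) input, mean and variance are computed
    over the channel and time axes for every (instance, frequency) pair, so
    each frequency bin of each example is standardized independently.
    """

    def __init__(self, eps: float = 1e-5):
        super().__init__()
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        mean = x.mean(dim=(1, 3), keepdim=True)                # (B, 1, F, 1)
        var = x.var(dim=(1, 3), keepdim=True, unbiased=False)  # (B, 1, F, 1)
        return (x - mean) / torch.sqrt(var + self.eps)


class RelaxedFrequencyNorm(nn.Module):
    """Hypothetical 'relaxed' variant: a fixed lam-weighted blend of batch
    normalization and frequency-wise instance normalization. The blend and
    the weight are illustrative assumptions, not the RFN paper's definition."""

    def __init__(self, channels: int, lam: float = 0.5):
        super().__init__()
        self.lam = lam
        self.bn = nn.BatchNorm2d(channels)
        self.ifn = InstanceFrequencyNorm()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.lam * self.bn(x) + (1.0 - self.lam) * self.ifn(x)
```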
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.