SubSpectral Normalization for Neural Audio Data Processing
- URL: http://arxiv.org/abs/2103.13620v1
- Date: Thu, 25 Mar 2021 05:55:48 GMT
- Title: SubSpectral Normalization for Neural Audio Data Processing
- Authors: Simyung Chang, Hyoungwoo Park, Janghoon Cho, Hyunsin Park, Sungrack
Yun, Kyuwoong Hwang
- Abstract summary: We introduce SubSpectral Normalization (SSN) which splits the input frequency dimension into several groups (sub-bands) and performs a different normalization for each group.
Our method removes the inter-frequency deflection while the network learns a frequency-aware characteristic.
In the experiments with audio data, we observed that SSN can efficiently improve the network's performance.
- Score: 11.97844299450951
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Convolutional Neural Networks are widely used in various machine learning
domains. In image processing, the features can be obtained by applying 2D
convolution to all spatial dimensions of the input. However, in the audio case,
frequency domain input like Mel-Spectrogram has different and unique
characteristics in the frequency dimension. Thus, there is a need for a method
that allows the 2D convolution layer to handle the frequency dimension
differently. In this work, we introduce SubSpectral Normalization (SSN), which
splits the input frequency dimension into several groups (sub-bands) and
performs a different normalization for each group. SSN also includes an affine
transformation that can be applied to each group. Our method removes the
inter-frequency deflection while the network learns a frequency-aware
characteristic. In the experiments with audio data, we observed that SSN can
efficiently improve the network's performance.
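As a concrete illustration of the mechanism described in the abstract, here is a minimal PyTorch sketch (not the authors' reference implementation): it assumes the input is a (batch, channel, frequency, time) Mel-spectrogram tensor, splits the frequency axis into equal-width sub-bands, and applies an independent BatchNorm2d, each with its own affine parameters, to every sub-band.

```python
import torch
import torch.nn as nn


class SubSpectralNorm(nn.Module):
    """Minimal sketch of SubSpectral Normalization (SSN).

    The frequency axis of a (batch, channel, freq, time) input is split into
    equal-width sub-bands, and each sub-band is normalized by its own
    BatchNorm2d (each carrying its own affine parameters). This is one
    reading of the abstract, not the authors' reference implementation.
    """

    def __init__(self, channels: int, sub_bands: int):
        super().__init__()
        self.sub_bands = sub_bands
        self.norms = nn.ModuleList(
            [nn.BatchNorm2d(channels) for _ in range(sub_bands)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, freq, time); freq should be divisible by sub_bands.
        chunks = torch.chunk(x, self.sub_bands, dim=2)
        return torch.cat(
            [norm(chunk) for norm, chunk in zip(self.norms, chunks)], dim=2
        )


if __name__ == "__main__":
    ssn = SubSpectralNorm(channels=16, sub_bands=4)
    mel = torch.randn(8, 16, 40, 101)  # e.g. 40 Mel bins, 101 frames
    print(ssn(mel).shape)              # torch.Size([8, 16, 40, 101])
```

The same computation can equivalently be folded into a single BatchNorm2d over channels × sub_bands by reshaping the sub-band index into the channel axis, which avoids the Python-level loop; how the per-group affine transformation is applied is detailed in the paper.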
Related papers
- FINER: Flexible spectral-bias tuning in Implicit NEural Representation
by Variable-periodic Activation Functions [40.80112550091512]
Implicit Neural Representation is causing a revolution in the field of signal processing.
Current INR techniques suffer from a restricted capability to tune their supported frequency set.
We propose variable-periodic activation functions and, building on them, FINER.
We demonstrate the capabilities of FINER in the contexts of 2D image fitting, 3D signed distance field representation, and 5D neural radiance field optimization.
arXiv Detail & Related papers (2023-12-05T02:23:41Z)
- Locality-Aware Generalizable Implicit Neural Representation [54.93702310461174]
Generalizable implicit neural representation (INR) enables a single continuous function to represent multiple data instances.
We propose a novel framework for generalizable INR that combines a transformer encoder with a locality-aware INR decoder.
Our framework significantly outperforms previous generalizable INRs and validates the usefulness of the locality-aware latents for downstream tasks.
arXiv Detail & Related papers (2023-10-09T11:26:58Z)
- Transform Once: Efficient Operator Learning in Frequency Domain [69.74509540521397]
We study deep neural networks designed to harness the structure in frequency domain for efficient learning of long-range correlations in space or time.
This work introduces a blueprint for frequency domain learning through a single transform: transform once (T1).
arXiv Detail & Related papers (2022-11-26T01:56:05Z)
- Neural Implicit Dictionary via Mixture-of-Expert Training [111.08941206369508]
We present a generic INR framework that achieves both data and training efficiency by learning a Neural Implicit Dictionary (NID).
Our NID assembles a group of coordinate-based sub-networks which are tuned to span the desired function space.
Our experiments show that NID can speed up the reconstruction of 2D images or 3D scenes by two orders of magnitude while using up to 98% less input data.
arXiv Detail & Related papers (2022-07-08T05:07:19Z)
- Domain Generalization with Relaxed Instance Frequency-wise Normalization for Multi-device Acoustic Scene Classification [18.186932959605247]
Domain-relevant information in audio features is carried mainly by frequency statistics rather than channel statistics.
We introduce Relaxed Instance Frequency-wise Normalization (RFN): a plug-and-play, explicit normalization module along the frequency axis.
RFN can eliminate instance-specific domain discrepancy in an audio feature while mitigating the undesirable loss of useful discriminative information (a sketch of this style of frequency-wise normalization is given after this list).
arXiv Detail & Related papers (2022-06-24T23:45:50Z)
- Adaptive Frequency Learning in Two-branch Face Forgery Detection [66.91715092251258]
We propose to adaptively learn frequency information in a two-branch detection framework, dubbed AFD.
We liberate our network from the fixed frequency transforms, and achieve better performance with our data- and task-dependent transform layers.
arXiv Detail & Related papers (2022-03-27T14:25:52Z)
- Deep Frequency Filtering for Domain Generalization [55.66498461438285]
Deep Neural Networks (DNNs) have preferences for some frequency components in the learning process.
We propose Deep Frequency Filtering (DFF) for learning domain-generalizable features.
We show that applying our proposed DFF to a plain baseline outperforms state-of-the-art methods on different domain generalization tasks.
arXiv Detail & Related papers (2022-03-23T05:19:06Z)
- Dense Pruning of Pointwise Convolutions in the Frequency Domain [10.58456555092086]
We propose a technique which wraps each pointwise layer in a discrete cosine transform (DCT) which is truncated to selectively prune coefficients above a given threshold.
Unlike weight pruning techniques which rely on sparse operators, our contiguous frequency band pruning results in fully dense computation.
We apply our technique to MobileNetV2 and in the process reduce computation time by 22% and incur 1% accuracy degradation.
arXiv Detail & Related papers (2021-09-16T04:02:45Z)
- Multi-stream Convolutional Neural Network with Frequency Selection for Robust Speaker Verification [2.3437178262034095]
We propose a novel framework of multi-stream Convolutional Neural Network (CNN) for speaker verification tasks.
The proposed framework accommodates diverse temporal embeddings generated from multiple streams to enhance the robustness of acoustic modeling.
We conduct extensive experiments on the VoxCeleb dataset, and the results demonstrate that the multi-stream CNN significantly outperforms the single-stream baseline.
arXiv Detail & Related papers (2020-12-21T07:23:40Z)
- Volumetric Transformer Networks [88.85542905676712]
We introduce a learnable module, the volumetric transformer network (VTN).
VTN predicts channel-wise warping fields to reconfigure intermediate CNN features both spatially and channel-wise.
Our experiments show that VTN consistently boosts the features' representation power and consequently the networks' accuracy on fine-grained image recognition and instance-level image retrieval.
arXiv Detail & Related papers (2020-07-18T14:00:12Z)
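The Relaxed Instance Frequency-wise Normalization entry above operates along the same frequency axis that SSN splits into sub-bands. As referenced in that entry, here is a minimal PyTorch sketch of per-instance, frequency-wise normalization together with a hypothetical "relaxed" blend with batch normalization; the blend weight and the exact statistics axes are assumptions made for illustration, not the formulation from the RFN paper.

```python
import torch
import torch.nn as nn


class InstanceFrequencyNorm(nn.Module):
    """Per-instance, per-frequency-bin normalization (no affine).

    For a (batch, channel, freq, time) input, mean and variance are computed
    over the channel and time axes for every (instance, frequency) pair, so
    each frequency bin of each example is standardized independently.
    """

    def __init__(self, eps: float = 1e-5):
        super().__init__()
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        mean = x.mean(dim=(1, 3), keepdim=True)                # (B, 1, F, 1)
        var = x.var(dim=(1, 3), keepdim=True, unbiased=False)  # (B, 1, F, 1)
        return (x - mean) / torch.sqrt(var + self.eps)


class RelaxedFrequencyNorm(nn.Module):
    """Hypothetical 'relaxed' variant: a fixed lam-weighted blend of batch
    normalization and frequency-wise instance normalization. The blend and
    the weight are illustrative assumptions, not the RFN paper's definition."""

    def __init__(self, channels: int, lam: float = 0.5):
        super().__init__()
        self.lam = lam
        self.bn = nn.BatchNorm2d(channels)
        self.ifn = InstanceFrequencyNorm()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.lam * self.bn(x) + (1.0 - self.lam) * self.ifn(x)
```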
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.