Combolutional Neural Networks
- URL: http://arxiv.org/abs/2507.21202v1
- Date: Mon, 28 Jul 2025 13:30:51 GMT
- Title: Combolutional Neural Networks
- Authors: Cameron Churchwell, Minje Kim, Paris Smaragdis
- Abstract summary: We propose a combolutional layer: a learned-delay IIR comb filter and fused envelope detector, which extracts harmonic features in the time domain. We find that the combolutional layer is an effective replacement for convolutional layers in audio tasks where precise harmonic analysis is important.
- Score: 21.93943668751019
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Selecting appropriate inductive biases is an essential step in the design of machine learning models, especially when working with audio, where even short clips may contain millions of samples. To this end, we propose the combolutional layer: a learned-delay IIR comb filter and fused envelope detector, which extracts harmonic features in the time domain. We demonstrate the efficacy of the combolutional layer on three information retrieval tasks, evaluate its computational cost relative to other audio frontends, and provide efficient implementations for training. We find that the combolutional layer is an effective replacement for convolutional layers in audio tasks where precise harmonic analysis is important, e.g., piano transcription, speaker classification, and key detection. Additionally, the combolutional layer has several other key benefits over existing frontends, namely: low parameter count, efficient CPU inference, strictly real-valued computations, and improved interpretability.
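The abstract describes the layer as an IIR comb filter followed by an envelope detector. The sketch below illustrates that general idea only and is not the authors' implementation: it assumes a fixed integer delay (the paper's delay is learned), a feedback comb of the form y[n] = x[n] + gain * y[n - delay], and a rectify-and-average envelope. The function name `comb_envelope` and the parameters `gain` and `hop` are hypothetical.

```python
import numpy as np


def comb_envelope(x, delay, gain=0.9, hop=256):
    """Feedback (IIR) comb filter followed by a simple envelope detector.

    Assumptions (not taken from the paper): integer `delay` in samples,
    a fixed feedback `gain` below 1, and an envelope computed as the mean
    absolute value over non-overlapping frames of `hop` samples.
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.zeros_like(x)
    for n in range(len(x)):
        # Feedback term: the filter output from `delay` samples ago.
        feedback = y[n - delay] if n >= delay else 0.0
        y[n] = x[n] + gain * feedback
    # Envelope: rectify, then average within each frame.
    n_frames = len(y) // hop
    frames = np.abs(y[: n_frames * hop]).reshape(n_frames, hop)
    return frames.mean(axis=1)


# Usage: a 100-sample delay at 16 kHz resonates at multiples of 160 Hz.
sr = 16000
t = np.arange(sr) / sr
matched = comb_envelope(np.sin(2 * np.pi * 160 * t), delay=100)
mismatched = comb_envelope(np.sin(2 * np.pi * 240 * t), delay=100)
print(matched.mean() > mismatched.mean())
```

A tone whose period equals the delay is reinforced by the feedback path, while a tone halfway between two resonances is attenuated, so the envelope acts as a detector for harmonic content at the chosen delay. Everything here uses real-valued arithmetic, in line with the abstract's point about strictly real-valued computations.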
Related papers
- Automatic Input Feature Relevance via Spectral Neural Networks [0.9236074230806581]
In machine learning practice it is often useful to identify relevant input features, so as to obtain a compact dataset for more efficient numerical handling.
We propose a novel method to estimate the relative importance of the input components for a Deep Neural Network.
arXiv Detail & Related papers (2024-06-03T10:39:12Z) - Phase Synchrony Component Self-Organization in Brain Computer Interface [3.2116198597240846]
Phase synchrony information plays a crucial role in analyzing functional brain connectivity and identifying brain activities.
We propose the concept of phase synchrony component self-organization, which enables the adaptive learning of data-dependent spatial filters.
Based on this concept, the first deep learning end-to-end network is developed, which directly extracts phase synchrony-based features from raw EEG signals.
arXiv Detail & Related papers (2023-09-21T09:42:16Z) - UNETR++: Delving into Efficient and Accurate 3D Medical Image Segmentation [93.88170217725805]
We propose a 3D medical image segmentation approach, named UNETR++, that offers both high-quality segmentation masks as well as efficiency in terms of parameters, compute cost, and inference speed.
The core of our design is the introduction of a novel efficient paired attention (EPA) block that efficiently learns spatial and channel-wise discriminative features.
Our evaluations on five benchmarks, Synapse, BTCV, ACDC, BRaTs, and Decathlon-Lung, reveal the effectiveness of our contributions in terms of both efficiency and accuracy.
arXiv Detail & Related papers (2022-12-08T18:59:57Z) - Simple Pooling Front-ends For Efficient Audio Classification [56.59107110017436]
We show that eliminating the temporal redundancy in the input audio features could be an effective approach for efficient audio classification.
We propose a family of simple pooling front-ends (SimPFs) which use simple non-parametric pooling operations to reduce the redundant information.
SimPFs can reduce the number of floating point operations by more than half for off-the-shelf audio neural networks.
arXiv Detail & Related papers (2022-10-03T14:00:41Z) - SCAI: A Spectral data Classification framework with Adaptive Inference for the IoT platform [0.0]
We propose a Spectral data Classification framework with Adaptive Inference.
Specifically, the framework aims to allocate different computations for different samples while better exploiting the collaboration among different devices.
To the best of our knowledge, this paper is the first attempt to conduct optimization by adaptive inference for spectral detection under the IoT platform.
arXiv Detail & Related papers (2022-06-24T09:22:52Z) - Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation [56.264157127549446]
Speech emotion recognition (SER) is a challenging task that plays a crucial role in natural human-computer interaction.
One of the main challenges in SER is data scarcity.
We propose a transfer learning strategy combined with spectrogram augmentation.
arXiv Detail & Related papers (2021-08-05T10:39:39Z) - Preliminary study on using vector quantization latent spaces for TTS/VC systems with consistent performance [55.10864476206503]
We investigate the use of quantized vectors to model the latent linguistic embedding.
By enforcing different policies over the latent spaces in the training, we are able to obtain a latent linguistic embedding.
Our experiments show that the voice cloning system built with vector quantization has only a small degradation in terms of perceptive evaluations.
arXiv Detail & Related papers (2021-06-25T07:51:35Z) - Deep Convolutional and Recurrent Networks for Polyphonic Instrument Classification from Monophonic Raw Audio Waveforms [30.3491261167433]
Sound Event Detection and Audio Classification tasks are traditionally addressed through time-frequency representations of audio signals such as spectrograms.
The use of deep neural networks as efficient feature extractors has enabled the direct use of audio signals for classification purposes.
We attempt to recognize musical instruments in polyphonic audio by only feeding their raw waveforms into deep learning models.
arXiv Detail & Related papers (2021-02-13T13:44:46Z) - Fast accuracy estimation of deep learning based multi-class musical source separation [79.10962538141445]
We propose a method to evaluate the separability of instruments in any dataset without training and tuning a neural network.
Based on the oracle principle with an ideal ratio mask, our approach is an excellent proxy to estimate the separation performances of state-of-the-art deep learning approaches.
arXiv Detail & Related papers (2020-10-19T13:05:08Z) - Ensemble Wrapper Subsampling for Deep Modulation Classification [70.91089216571035]
Subsampling of received wireless signals is important for relaxing hardware requirements as well as the computational cost of signal processing algorithms.
We propose a subsampling technique to facilitate the use of deep learning for automatic modulation classification in wireless communication systems.
arXiv Detail & Related papers (2020-05-10T06:11:13Z) - Using deep learning to understand and mitigate the qubit noise environment [0.0]
We propose to address the challenge of extracting accurate noise spectra from time-dynamics measurements on qubits.
We demonstrate a neural network based methodology that allows for extraction of the noise spectrum associated with any qubit surrounded by an arbitrary bath.
Our results can be applied to a wide range of qubit platforms and provide a framework for improving qubit performance.
arXiv Detail & Related papers (2020-05-03T17:13:14Z)