ATGNN: Audio Tagging Graph Neural Network
- URL: http://arxiv.org/abs/2311.01526v1
- Date: Thu, 2 Nov 2023 18:19:26 GMT
- Title: ATGNN: Audio Tagging Graph Neural Network
- Authors: Shubhr Singh, Christian J. Steinmetz, Emmanouil Benetos, Huy Phan, Dan Stowell
- Abstract summary: ATGNN is a graph neural architecture that maps semantic relationships between learnable class embeddings and spectrogram regions.
We evaluate ATGNN on two audio tagging tasks, where it achieves 0.585 mAP on the FSD50K dataset and 0.335 mAP on the AudioSet-balanced dataset.
- Score: 25.78859233831268
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning models such as CNNs and Transformers have achieved impressive performance for end-to-end audio tagging. Recent works have shown that, despite stacking multiple layers, the effective receptive field of CNNs remains severely limited. Transformers, on the other hand, can map global context through self-attention, but treat the spectrogram as a sequence of patches, which is not flexible enough to capture irregular audio objects. In this work, we treat the spectrogram more flexibly by considering it as a graph structure and process it with a novel graph neural architecture called ATGNN. ATGNN not only combines the local feature-extraction capability of CNNs with the global information-sharing ability of Graph Neural Networks, but also maps semantic relationships between learnable class embeddings and corresponding spectrogram regions. We evaluate ATGNN on two audio tagging tasks, where it achieves 0.585 mAP on the FSD50K dataset and 0.335 mAP on the AudioSet-balanced dataset, comparable to Transformer-based models with a significantly lower number of learnable parameters.
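As a rough, self-contained illustration of the core idea (not the authors' exact architecture; every layer choice and size below is an assumption): CNN feature-map cells become graph nodes, learnable class embeddings are appended as extra nodes, message passing shares information globally, and tag logits are read off the class nodes.

```python
import torch
import torch.nn as nn

class ToyATGNN(nn.Module):
    """Illustrative sketch only: spectrogram regions and learnable class
    embeddings live in one graph; dense attention stands in for message
    passing on a learned graph. Sizes and wiring are assumptions, not the
    paper's exact architecture."""
    def __init__(self, n_classes=200, dim=64):
        super().__init__()
        # Local feature extraction: a tiny CNN over the log-mel spectrogram.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        # One learnable embedding per tag; these become extra graph nodes.
        self.class_emb = nn.Parameter(torch.randn(n_classes, dim))
        self.mp = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.out = nn.Linear(dim, 1)

    def forward(self, spec):                       # spec: (B, 1, mel, time)
        feat = self.cnn(spec)                      # (B, dim, F', T')
        patches = feat.flatten(2).transpose(1, 2)  # (B, F'*T', dim) region nodes
        cls = self.class_emb.expand(spec.shape[0], -1, -1)  # class nodes
        nodes = torch.cat([cls, patches], dim=1)   # one graph, two node types
        nodes, _ = self.mp(nodes, nodes, nodes)    # global information sharing
        cls_out = nodes[:, : cls.shape[1]]         # read back the class nodes
        return self.out(cls_out).squeeze(-1)       # (B, n_classes) tag logits

logits = ToyATGNN()(torch.randn(2, 1, 64, 64))     # one logit per tag per clip
```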
Related papers
- Automatic speech recognition for the Nepali language using CNN, bidirectional LSTM and ResNet [0.0]
This paper presents an end-to-end deep learning model for Automatic Speech Recognition (ASR) that transcribes Nepali speech to text.
The model was trained and tested on the OpenSLR (audio, text) dataset.
A character error rate (CER) of 17.06 percent was achieved.
arXiv Detail & Related papers (2024-06-25T12:14:01Z)
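For reference, the character error rate reported above is the character-level edit distance between hypothesis and reference transcripts, normalized by reference length; a minimal sketch:

```python
def cer(ref: str, hyp: str) -> float:
    """Character error rate: Levenshtein distance between the reference and
    hypothesis strings, divided by the reference length."""
    # Classic dynamic-programming edit distance over characters.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1] / max(len(ref), 1)

assert abs(cer("nepali", "nepall") - 1 / 6) < 1e-9    # one substitution
```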
- CNN2GNN: How to Bridge CNN with GNN [59.42117676779735]
We propose a novel CNN2GNN framework to unify CNN and GNN together via distillation.
The performance of the distilled "boosted" two-layer GNN on Mini-ImageNet is much higher than that of CNNs with dozens of layers, such as ResNet152.
arXiv Detail & Related papers (2024-04-23T08:19:08Z)
- BLIS-Net: Classifying and Analyzing Signals on Graphs [20.345611294709244]
Graph neural networks (GNNs) have emerged as a powerful tool for tasks such as node classification and graph classification.
We introduce the BLIS-Net (Bi-Lipschitz Scattering Net), a novel GNN that builds on the previously introduced geometric scattering transform.
We show that BLIS-Net achieves superior performance on both synthetic and real-world data sets based on traffic flow and fMRI data.
arXiv Detail & Related papers (2023-10-26T17:03:14Z)
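For background, the geometric scattering transform that BLIS-Net builds on cascades graph wavelets with a modulus nonlinearity; a minimal first-order sketch using the common lazy random-walk wavelet construction (assumed here as background, not necessarily BLIS-Net's exact operators):

```python
import numpy as np

def scattering_coeffs(A, x, J=2):
    """First-order geometric scattering coefficients of a graph signal x.
    Wavelets: Psi_0 = I - P, Psi_j = P^(2^(j-1)) - P^(2^j), with P the
    lazy random walk. A: adjacency matrix (no isolated nodes)."""
    n = len(A)
    P = 0.5 * (np.eye(n) + A / A.sum(axis=1, keepdims=True))  # lazy random walk
    Pk = {k: np.linalg.matrix_power(P, k)
          for k in [0] + [2 ** j for j in range(J + 1)]}
    coeffs = []
    for j in range(J + 1):
        hi, lo = (0, 1) if j == 0 else (2 ** (j - 1), 2 ** j)
        psi_x = (Pk[hi] - Pk[lo]) @ x        # band-pass wavelet response
        coeffs.append(np.abs(psi_x).sum())   # modulus nonlinearity + pooling
    return np.array(coeffs)

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
print(scattering_coeffs(A, np.array([1.0, 0.0, -1.0])))
```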
- T-GAE: Transferable Graph Autoencoder for Network Alignment [79.89704126746204]
T-GAE is a graph autoencoder framework that leverages the transferability and stability of GNNs to achieve efficient network alignment without retraining.
Our experiments demonstrate that T-GAE outperforms the state-of-the-art optimization method and the best GNN approach by up to 38.7% and 50.8%, respectively.
arXiv Detail & Related papers (2023-10-05T02:58:29Z)
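The primitive underlying T-GAE is a graph autoencoder: encode nodes with a GNN, decode edges from embedding inner products, and align networks by matching embeddings. A generic sketch after Kipf & Welling's GAE, not the paper's implementation:

```python
import torch
import torch.nn as nn

class GAE(nn.Module):
    """Generic graph autoencoder: one-layer GCN encoder,
    inner-product edge decoder."""
    def __init__(self, in_dim, hid=32):
        super().__init__()
        self.W = nn.Linear(in_dim, hid, bias=False)

    def encode(self, A_norm, X):            # A_norm: (N, N), X: (N, in_dim)
        return torch.relu(A_norm @ self.W(X))  # node embeddings Z

    def decode(self, Z):
        return torch.sigmoid(Z @ Z.T)       # reconstructed edge probabilities

# Alignment idea: embed both graphs with the same trained encoder and match
# each node in graph 1 to its nearest embedding in graph 2:
# match = torch.cdist(Z1, Z2).argmin(dim=1)
```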
- Graph Ladling: Shockingly Simple Parallel GNN Training without Intermediate Communication [100.51884192970499]
GNNs are a powerful family of neural networks for learning over graphs.
Scaling GNNs either by deepening or widening suffers from prevalent issues of unhealthy gradients, over-smoothing, and information squashing.
We propose not to deepen or widen current GNNs, but instead present a data-centric perspective of model soups tailored for GNNs.
arXiv Detail & Related papers (2023-06-18T03:33:46Z)
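"Model soups" here means training several identical-architecture GNNs independently (hence no intermediate communication) and averaging their weights into one model; a minimal sketch of a uniform soup:

```python
import copy
import torch

def uniform_soup(models):
    """Average the weights of independently trained, identically shaped
    models into a single model (a uniform 'model soup')."""
    base = copy.deepcopy(models[0])
    avg = {k: torch.stack([m.state_dict()[k].float() for m in models]).mean(0)
           for k in base.state_dict()}
    base.load_state_dict(avg)
    return base
```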
- Efficient Large-scale Audio Tagging via Transformer-to-CNN Knowledge Distillation [6.617487928813374]
We propose a training procedure for efficient CNNs based on offline Knowledge Distillation (KD) from high-performing yet complex transformers.
We provide models of different complexity levels, scaling from low-complexity models up to a new state-of-the-art performance of 0.483 mAP on AudioSet.
arXiv Detail & Related papers (2022-11-09T09:58:22Z)
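In outline, offline KD for tagging: a frozen transformer teacher supplies per-clip tag probabilities, and the efficient CNN student fits a weighted mix of ground-truth labels and teacher predictions. A hedged sketch (the binary-target form and the weighting lam are assumptions, not the paper's exact recipe):

```python
import torch
import torch.nn.functional as F

def kd_step(student, teacher, spec, labels, lam=0.5):
    """One offline-KD training step for multi-label audio tagging.
    labels: multi-hot float tensor; lam balances hard and soft targets."""
    with torch.no_grad():
        soft = torch.sigmoid(teacher(spec))   # frozen teacher's probabilities
    logits = student(spec)
    loss = (1 - lam) * F.binary_cross_entropy_with_logits(logits, labels) \
         + lam * F.binary_cross_entropy_with_logits(logits, soft)
    return loss
```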
- Dynamic Graph Message Passing Networks for Visual Recognition [112.49513303433606]
Modelling long-range dependencies is critical for scene understanding tasks in computer vision.
A fully-connected graph is beneficial for such modelling, but its computational overhead is prohibitive.
We propose a dynamic graph message passing network that significantly reduces the computational complexity.
arXiv Detail & Related papers (2022-09-20T14:41:37Z)
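The cost saving comes from each node exchanging messages with only a small, input-dependent set of neighbours instead of all N nodes. A much-simplified top-k sketch of that idea (the paper's dynamic sampling is more involved):

```python
import torch

def topk_message_passing(X, k=8):
    """Each node aggregates features from only its k most similar nodes,
    approximating fully connected message passing at O(N*k) aggregation
    cost. (The dense affinity matrix below is itself O(N^2); the paper's
    dynamic sampling avoids materializing it.)"""
    sim = X @ X.T                                  # (N, N) node affinities
    vals, idx = sim.topk(k, dim=1)                 # k dynamic neighbours/node
    w = torch.softmax(vals, dim=1)                 # attention over neighbours
    return (w.unsqueeze(-1) * X[idx]).sum(dim=1)   # (N, d) updated features
```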
- Receptive Field Regularization Techniques for Audio Classification and Tagging with Deep Convolutional Neural Networks [7.9495796547433395]
We show that tuning the Receptive Field (RF) of CNNs is crucial to their generalization.
We propose several systematic approaches to control the RF of CNNs and systematically test the resulting architectures.
arXiv Detail & Related papers (2021-05-26T08:36:29Z)
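The receptive field being tuned here can be computed layer by layer with the standard recurrence rf += (k - 1) * jump; jump *= stride:

```python
def receptive_field(layers):
    """Receptive field of stacked convolutions along one axis.
    layers: list of (kernel_size, stride) pairs, input to output."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump   # each layer widens the RF by (k-1)*jump
        jump *= s              # stride compounds the sampling step
    return rf

# e.g. five 3x3 / stride-2 conv layers:
print(receptive_field([(3, 2)] * 5))  # -> 63
```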
- Graph Neural Networks with Adaptive Frequency Response Filter [55.626174910206046]
We develop AdaGNN, a graph neural network framework with a well-smooth adaptive frequency response filter.
We empirically validate the effectiveness of the proposed framework on various benchmark datasets.
arXiv Detail & Related papers (2021-04-26T19:31:21Z)
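One simple way to realize a learnable frequency response on a graph, sketched here as a plausible reading of the idea rather than AdaGNN's exact layer: let each feature channel learn how strongly to suppress its high-frequency (neighbour-disagreeing) component.

```python
import torch
import torch.nn as nn

class AdaptiveFilterLayer(nn.Module):
    """Learnable per-channel graph filter: X <- X - (L X) * diag(phi).
    phi near 0 passes a channel through; larger phi suppresses its
    high-frequency component (the part disagreeing with neighbours)."""
    def __init__(self, channels):
        super().__init__()
        self.phi = nn.Parameter(torch.zeros(channels))

    def forward(self, L, X):   # L: (N, N) normalized Laplacian, X: (N, C)
        return X - (L @ X) * self.phi
```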
- AST: Audio Spectrogram Transformer [21.46018186487818]
We introduce the Audio Spectrogram Transformer (AST), the first convolution-free, purely attention-based model for audio classification.
AST achieves new state-of-the-art results of 0.485 mAP on AudioSet, 95.6% accuracy on ESC-50, and 98.1% accuracy on Speech Commands V2.
arXiv Detail & Related papers (2021-04-05T05:26:29Z)
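AST's recipe in outline: embed 16x16 spectrogram patches linearly (the released model uses overlapping patches), add positional embeddings, and classify with a plain Transformer encoder. A stripped-down sketch with assumed sizes, non-overlapping patches, and mean pooling in place of the [CLS] token:

```python
import torch
import torch.nn as nn

class MiniAST(nn.Module):
    """Stripped-down AST-style tagger: 16x16 patch embedding (AST itself
    uses stride 10 for overlap) + Transformer encoder."""
    def __init__(self, n_classes=527, dim=192, n_patches=128):
        super().__init__()
        self.patch = nn.Conv2d(1, dim, kernel_size=16, stride=16)
        self.pos = nn.Parameter(torch.zeros(1, n_patches, dim))
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.enc = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, spec):                             # (B, 1, 128, 256)
        t = self.patch(spec).flatten(2).transpose(1, 2)  # (B, patches, dim)
        t = self.enc(t + self.pos)
        return self.head(t.mean(dim=1))                  # pooled tag logits

logits = MiniAST()(torch.randn(2, 1, 128, 256))
```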
- A Unified View on Graph Neural Networks as Graph Signal Denoising [49.980783124401555]
Graph Neural Networks (GNNs) have risen to prominence in learning representations for graph-structured data.
In this work, we establish mathematically that the aggregation processes in a group of representative GNN models can be regarded as solving a graph denoising problem.
We instantiate a novel GNN model, ADA-UGNN, derived from UGNN, to handle graphs with adaptive smoothness across nodes.
arXiv Detail & Related papers (2020-10-05T04:57:18Z)
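The denoising view above admits a one-line statement: the aggregation step of many GNNs approximately solves

    \min_F \; \|F - X\|_F^2 + c \, \mathrm{tr}(F^\top L F),

where X holds the input node features and L is the normalized graph Laplacian. Starting from F = X, a single gradient-descent step with step size 1/2 gives F = X - cLX = (1 - c)X + c\tilde{A}X (using L = I - \tilde{A}), i.e. the familiar GCN-style update that mixes each node's features with its neighbourhood average.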