ATGNN: Audio Tagging Graph Neural Network
- URL: http://arxiv.org/abs/2311.01526v1
- Date: Thu, 2 Nov 2023 18:19:26 GMT
- Title: ATGNN: Audio Tagging Graph Neural Network
- Authors: Shubhr Singh, Christian J. Steinmetz, Emmanouil Benetos, Huy Phan, Dan Stowell
- Abstract summary: ATGNN is a graph neural architecture that maps semantic relationships between learnable class embeddings and spectrogram regions.
We evaluate ATGNN on two audio tagging tasks, where it achieves 0.585 mAP on the FSD50K dataset and 0.335 mAP on the AudioSet-balanced dataset.
- Score: 25.78859233831268
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning models such as CNNs and Transformers have achieved impressive performance for end-to-end audio tagging. Recent works have shown that, despite stacking multiple layers, the effective receptive field of CNNs remains severely limited. Transformers, on the other hand, can map global context through self-attention, but treat the spectrogram as a sequence of patches, which is not flexible enough to capture irregular audio objects. In this work, we treat the spectrogram more flexibly by considering it as a graph structure and process it with a novel graph neural architecture called ATGNN. ATGNN not only combines the local feature-extraction capability of CNNs with the global information-sharing ability of Graph Neural Networks, but also maps semantic relationships between learnable class embeddings and corresponding spectrogram regions. We evaluate ATGNN on two audio tagging tasks, where it achieves 0.585 mAP on the FSD50K dataset and 0.335 mAP on the AudioSet-balanced dataset, comparable to Transformer-based models with a significantly lower number of learnable parameters.
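As a rough, self-contained illustration of the core idea (not the authors' exact architecture; every layer choice and size below is an assumption): CNN feature-map cells become graph nodes, learnable class embeddings are appended as extra nodes, message passing shares information globally, and tag logits are read off the class nodes.

```python
import torch
import torch.nn as nn

class ToyATGNN(nn.Module):
    """Illustrative sketch only: spectrogram regions and learnable class
    embeddings live in one graph; dense attention stands in for message
    passing on a learned graph. Sizes and wiring are assumptions, not the
    paper's exact architecture."""
    def __init__(self, n_classes=200, dim=64):
        super().__init__()
        # Local feature extraction: a tiny CNN over the log-mel spectrogram.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        # One learnable embedding per tag; these become extra graph nodes.
        self.class_emb = nn.Parameter(torch.randn(n_classes, dim))
        self.mp = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.out = nn.Linear(dim, 1)

    def forward(self, spec):                       # spec: (B, 1, mel, time)
        feat = self.cnn(spec)                      # (B, dim, F', T')
        patches = feat.flatten(2).transpose(1, 2)  # (B, F'*T', dim) region nodes
        cls = self.class_emb.expand(spec.shape[0], -1, -1)  # class nodes
        nodes = torch.cat([cls, patches], dim=1)   # one graph, two node types
        nodes, _ = self.mp(nodes, nodes, nodes)    # global information sharing
        cls_out = nodes[:, : cls.shape[1]]         # read back the class nodes
        return self.out(cls_out).squeeze(-1)       # (B, n_classes) tag logits

logits = ToyATGNN()(torch.randn(2, 1, 64, 64))     # one logit per tag per clip
```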
Related papers
- Automatic speech recognition for the Nepali language using CNN, bidirectional LSTM and ResNet [0.0]
This paper presents an end-to-end deep learning model for Automatic Speech Recognition (ASR) that transcribes Nepali speech to text.
The model was trained and tested on the OpenSLR (audio, text) dataset.
A character error rate (CER) of 17.06 percent was achieved.
arXiv Detail & Related papers (2024-06-25T12:14:01Z)
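For reference, the character error rate reported above is the character-level edit distance between hypothesis and reference transcripts, normalized by reference length; a minimal sketch:

```python
def cer(ref: str, hyp: str) -> float:
    """Character error rate: Levenshtein distance between the reference and
    hypothesis strings, divided by the reference length."""
    # Classic dynamic-programming edit distance over characters.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1] / max(len(ref), 1)

assert abs(cer("nepali", "nepall") - 1 / 6) < 1e-9    # one substitution
```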
- CNN2GNN: How to Bridge CNN with GNN [59.42117676779735]
We propose a novel CNN2GNN framework to unify CNN and GNN together via distillation.
The performance of the distilled "boosted" two-layer GNN on Mini-ImageNet is much higher than that of CNNs with dozens of layers, such as ResNet152.
arXiv Detail & Related papers (2024-04-23T08:19:08Z)
- BLIS-Net: Classifying and Analyzing Signals on Graphs [20.345611294709244]
Graph neural networks (GNNs) have emerged as a powerful tool for tasks such as node classification and graph classification.
We introduce the BLIS-Net (Bi-Lipschitz Scattering Net), a novel GNN that builds on the previously introduced geometric scattering transform.
We show that BLIS-Net achieves superior performance on both synthetic and real-world data sets based on traffic flow and fMRI data.
arXiv Detail & Related papers (2023-10-26T17:03:14Z)
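For background, the geometric scattering transform that BLIS-Net builds on cascades graph wavelets with a modulus nonlinearity; a minimal first-order sketch using the common lazy random-walk wavelet construction (assumed here as background, not necessarily BLIS-Net's exact operators):

```python
import numpy as np

def scattering_coeffs(A, x, J=2):
    """First-order geometric scattering coefficients of a graph signal x.
    Wavelets: Psi_0 = I - P, Psi_j = P^(2^(j-1)) - P^(2^j), with P the
    lazy random walk. A: adjacency matrix (no isolated nodes)."""
    n = len(A)
    P = 0.5 * (np.eye(n) + A / A.sum(axis=1, keepdims=True))  # lazy random walk
    Pk = {k: np.linalg.matrix_power(P, k)
          for k in [0] + [2 ** j for j in range(J + 1)]}
    coeffs = []
    for j in range(J + 1):
        hi, lo = (0, 1) if j == 0 else (2 ** (j - 1), 2 ** j)
        psi_x = (Pk[hi] - Pk[lo]) @ x        # band-pass wavelet response
        coeffs.append(np.abs(psi_x).sum())   # modulus nonlinearity + pooling
    return np.array(coeffs)

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
print(scattering_coeffs(A, np.array([1.0, 0.0, -1.0])))
```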
- T-GAE: Transferable Graph Autoencoder for Network Alignment [79.89704126746204]
T-GAE is a graph autoencoder framework that leverages the transferability and stability of GNNs to achieve efficient network alignment without retraining.
Our experiments demonstrate that T-GAE outperforms the state-of-the-art optimization method and the best GNN approach by up to 38.7% and 50.8%, respectively.
arXiv Detail & Related papers (2023-10-05T02:58:29Z)
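The primitive underlying T-GAE is a graph autoencoder: encode nodes with a GNN, decode edges from embedding inner products, and align networks by matching embeddings. A generic sketch after Kipf & Welling's GAE, not the paper's implementation:

```python
import torch
import torch.nn as nn

class GAE(nn.Module):
    """Generic graph autoencoder: one-layer GCN encoder,
    inner-product edge decoder."""
    def __init__(self, in_dim, hid=32):
        super().__init__()
        self.W = nn.Linear(in_dim, hid, bias=False)

    def encode(self, A_norm, X):            # A_norm: (N, N), X: (N, in_dim)
        return torch.relu(A_norm @ self.W(X))  # node embeddings Z

    def decode(self, Z):
        return torch.sigmoid(Z @ Z.T)       # reconstructed edge probabilities

# Alignment idea: embed both graphs with the same trained encoder and match
# each node in graph 1 to its nearest embedding in graph 2:
# match = torch.cdist(Z1, Z2).argmin(dim=1)
```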
- Graph Ladling: Shockingly Simple Parallel GNN Training without Intermediate Communication [100.51884192970499]
GNNs are a powerful family of neural networks for learning over graphs.
Scaling GNNs either by deepening or widening suffers from prevalent issues of unhealthy gradients, over-smoothing, and information squashing.
We propose not to deepen or widen current GNNs, but instead present a data-centric perspective of model soups tailored for GNNs.
arXiv Detail & Related papers (2023-06-18T03:33:46Z)
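"Model soups" here means training several identical-architecture GNNs independently (hence no intermediate communication) and averaging their weights into one model; a minimal sketch of a uniform soup:

```python
import copy
import torch

def uniform_soup(models):
    """Average the weights of independently trained, identically shaped
    models into a single model (a uniform 'model soup')."""
    base = copy.deepcopy(models[0])
    avg = {k: torch.stack([m.state_dict()[k].float() for m in models]).mean(0)
           for k in base.state_dict()}
    base.load_state_dict(avg)
    return base
```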
- Efficient Large-scale Audio Tagging via Transformer-to-CNN Knowledge Distillation [6.617487928813374]
We propose a training procedure for efficient CNNs based on offline Knowledge Distillation (KD) from high-performing yet complex transformers.
We provide models of different complexity levels, scaling from low-complexity models up to a new state-of-the-art performance of 0.483 mAP on AudioSet.
arXiv Detail & Related papers (2022-11-09T09:58:22Z)
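In outline, offline KD for tagging: a frozen transformer teacher supplies per-clip tag probabilities, and the efficient CNN student fits a weighted mix of ground-truth labels and teacher predictions. A hedged sketch (the binary-target form and the weighting lam are assumptions, not the paper's exact recipe):

```python
import torch
import torch.nn.functional as F

def kd_step(student, teacher, spec, labels, lam=0.5):
    """One offline-KD training step for multi-label audio tagging.
    labels: multi-hot float tensor; lam balances hard and soft targets."""
    with torch.no_grad():
        soft = torch.sigmoid(teacher(spec))   # frozen teacher's probabilities
    logits = student(spec)
    loss = (1 - lam) * F.binary_cross_entropy_with_logits(logits, labels) \
         + lam * F.binary_cross_entropy_with_logits(logits, soft)
    return loss
```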
- Dynamic Graph Message Passing Networks for Visual Recognition [112.49513303433606]
Modelling long-range dependencies is critical for scene understanding tasks in computer vision.
A fully-connected graph is beneficial for such modelling, but its computational overhead is prohibitive.
We propose a dynamic graph message passing network that significantly reduces the computational complexity.
arXiv Detail & Related papers (2022-09-20T14:41:37Z)
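The cost saving comes from each node exchanging messages with only a small, input-dependent set of neighbours instead of all N nodes. A much-simplified top-k sketch of that idea (the paper's dynamic sampling is more involved):

```python
import torch

def topk_message_passing(X, k=8):
    """Each node aggregates features from only its k most similar nodes,
    approximating fully connected message passing at O(N*k) aggregation
    cost. (The dense affinity matrix below is itself O(N^2); the paper's
    dynamic sampling avoids materializing it.)"""
    sim = X @ X.T                                  # (N, N) node affinities
    vals, idx = sim.topk(k, dim=1)                 # k dynamic neighbours/node
    w = torch.softmax(vals, dim=1)                 # attention over neighbours
    return (w.unsqueeze(-1) * X[idx]).sum(dim=1)   # (N, d) updated features
```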
- Receptive Field Regularization Techniques for Audio Classification and Tagging with Deep Convolutional Neural Networks [7.9495796547433395]
We show that tuning the Receptive Field (RF) of CNNs is crucial to their generalization.
We propose several systematic approaches to control the RF of CNNs and systematically test the resulting architectures.
arXiv Detail & Related papers (2021-05-26T08:36:29Z)
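The receptive field being tuned here can be computed layer by layer with the standard recurrence rf += (k - 1) * jump; jump *= stride:

```python
def receptive_field(layers):
    """Receptive field of stacked convolutions along one axis.
    layers: list of (kernel_size, stride) pairs, input to output."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump   # each layer widens the RF by (k-1)*jump
        jump *= s              # stride compounds the sampling step
    return rf

# e.g. five 3x3 / stride-2 conv layers:
print(receptive_field([(3, 2)] * 5))  # -> 63
```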
- Graph Neural Networks with Adaptive Frequency Response Filter [55.626174910206046]
We develop AdaGNN, a graph neural network framework with a well-smooth adaptive frequency response filter.
We empirically validate the effectiveness of the proposed framework on various benchmark datasets.
arXiv Detail & Related papers (2021-04-26T19:31:21Z)
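One simple way to realize a learnable frequency response on a graph, sketched here as a plausible reading of the idea rather than AdaGNN's exact layer: let each feature channel learn how strongly to suppress its high-frequency (neighbour-disagreeing) component.

```python
import torch
import torch.nn as nn

class AdaptiveFilterLayer(nn.Module):
    """Learnable per-channel graph filter: X <- X - (L X) * diag(phi).
    phi near 0 passes a channel through; larger phi suppresses its
    high-frequency component (the part disagreeing with neighbours)."""
    def __init__(self, channels):
        super().__init__()
        self.phi = nn.Parameter(torch.zeros(channels))

    def forward(self, L, X):   # L: (N, N) normalized Laplacian, X: (N, C)
        return X - (L @ X) * self.phi
```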
- AST: Audio Spectrogram Transformer [21.46018186487818]
We introduce the Audio Spectrogram Transformer (AST), the first convolution-free, purely attention-based model for audio classification.
AST achieves new state-of-the-art results of 0.485 mAP on AudioSet, 95.6% accuracy on ESC-50, and 98.1% accuracy on Speech Commands V2.
arXiv Detail & Related papers (2021-04-05T05:26:29Z)
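AST's recipe in outline: embed 16x16 spectrogram patches linearly (the released model uses overlapping patches), add positional embeddings, and classify with a plain Transformer encoder. A stripped-down sketch with assumed sizes, non-overlapping patches, and mean pooling in place of the [CLS] token:

```python
import torch
import torch.nn as nn

class MiniAST(nn.Module):
    """Stripped-down AST-style tagger: 16x16 patch embedding (AST itself
    uses stride 10 for overlap) + Transformer encoder."""
    def __init__(self, n_classes=527, dim=192, n_patches=128):
        super().__init__()
        self.patch = nn.Conv2d(1, dim, kernel_size=16, stride=16)
        self.pos = nn.Parameter(torch.zeros(1, n_patches, dim))
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.enc = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, spec):                             # (B, 1, 128, 256)
        t = self.patch(spec).flatten(2).transpose(1, 2)  # (B, patches, dim)
        t = self.enc(t + self.pos)
        return self.head(t.mean(dim=1))                  # pooled tag logits

logits = MiniAST()(torch.randn(2, 1, 128, 256))
```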
- A Unified View on Graph Neural Networks as Graph Signal Denoising [49.980783124401555]
Graph Neural Networks (GNNs) have risen to prominence in learning representations for graph-structured data.
In this work, we establish mathematically that the aggregation processes in a group of representative GNN models can be regarded as solving a graph denoising problem.
We instantiate a novel GNN model, ADA-UGNN, derived from UGNN, to handle graphs with adaptive smoothness across nodes.
arXiv Detail & Related papers (2020-10-05T04:57:18Z)
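The denoising view above admits a one-line statement: the aggregation step of many GNNs approximately solves

    \min_F \; \|F - X\|_F^2 + c \, \mathrm{tr}(F^\top L F),

where X holds the input node features and L is the normalized graph Laplacian. Starting from F = X, a single gradient-descent step with step size 1/2 gives F = X - cLX = (1 - c)X + c\tilde{A}X (using L = I - \tilde{A}), i.e. the familiar GCN-style update that mixes each node's features with its neighbourhood average.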