A Multimodal Canonical-Correlated Graph Neural Network for
Energy-Efficient Speech Enhancement
- URL: http://arxiv.org/abs/2202.04528v1
- Date: Wed, 9 Feb 2022 15:47:07 GMT
- Title: A Multimodal Canonical-Correlated Graph Neural Network for
Energy-Efficient Speech Enhancement
- Authors: Leandro Aparecido Passos, João Paulo Papa, Amir Hussain, Ahsan Adeel
- Abstract summary: This paper proposes a novel multimodal self-supervised architecture for energy-efficient AV speech enhancement.
It integrates graph neural networks with canonical correlation analysis (CCA-GNN).
Experiments conducted with the benchmark ChiME3 dataset show that our proposed prior frame-based AV CCA-GNN achieves better feature learning in the temporal context.
- Score: 4.395837214164745
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper proposes a novel multimodal self-supervised architecture for
energy-efficient AV speech enhancement by integrating graph neural networks
with canonical correlation analysis (CCA-GNN). This builds on a
state-of-the-art CCA-GNN that aims to learn representative embeddings by
maximizing the correlation between pairs of augmented views of the same input
while decorrelating disconnected features. The key idea of the conventional
CCA-GNN is to discard augmentation-variant information and preserve
augmentation-invariant information while preventing the capture of redundant
information. Our proposed AV CCA-GNN model is designed to deal with the
challenging multimodal representation learning context. Specifically, our model
improves contextual AV speech processing by maximizing canonical correlation
from augmented views of the same channel, as well as canonical correlation from
audio and visual embeddings. In addition, we propose a positional encoding of
the nodes that considers a prior-frame sequence distance instead of a
feature-space representation while computing the node's nearest neighbors. This
serves to introduce temporal information in the embeddings through the
neighborhood's connectivity. Experiments conducted with the benchmark ChiME3
dataset show that our proposed prior frame-based AV CCA-GNN achieves better
feature learning in the temporal context, leading to more energy-efficient
speech reconstruction compared to state-of-the-art CCA-GNN and multi-layer
perceptron models. The results demonstrate the potential of our proposed
approach for exploitation in future assistive technology and energy-efficient
multimodal devices.
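The paper does not include code, but the CCA objective described above can be illustrated with a short sketch. The snippet below is a hedged illustration, not the authors' implementation: the function name, the lam trade-off weight, and the standardization details are assumptions. It maximizes agreement between two paired embedding matrices (the correlation term) while pushing each view's feature covariance toward identity (the decorrelation term that prevents redundant features).

```python
import torch

def cca_loss(z_a: torch.Tensor, z_b: torch.Tensor, lam: float = 1e-3) -> torch.Tensor:
    """CCA-style self-supervised loss between paired embeddings (sketch).

    z_a, z_b: (N, D) embeddings of two augmented views of the same input,
    or paired audio and visual embeddings for the cross-modal term.
    """
    n, d = z_a.shape
    # Standardize each feature dimension: zero mean, unit variance.
    z_a = (z_a - z_a.mean(dim=0)) / (z_a.std(dim=0) + 1e-8)
    z_b = (z_b - z_b.mean(dim=0)) / (z_b.std(dim=0) + 1e-8)

    # Invariance term: paired embeddings should agree (maximal correlation).
    invariance = ((z_a - z_b) ** 2).sum() / n

    # Decorrelation term: push each view's feature covariance toward the
    # identity, so embedding dimensions do not carry redundant information.
    eye = torch.eye(d, device=z_a.device)
    cov_a = (z_a.T @ z_a) / n
    cov_b = (z_b.T @ z_b) / n
    decorrelation = ((cov_a - eye) ** 2).sum() + ((cov_b - eye) ** 2).sum()

    return invariance + lam * decorrelation
```

Under this reading, the AV extension would apply the same loss both within a modality (two augmented views of the same channel) and across modalities, e.g. cca_loss(z_audio, z_audio_aug) + cca_loss(z_video, z_video_aug) + cca_loss(z_audio, z_video).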
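The prior frame-based positional encoding can be sketched in the same spirit: rather than computing a node's nearest neighbors in feature space, each frame node is connected to the k frames that precede it in the sequence, so temporal order enters the embeddings through the graph connectivity. The helper below is a hypothetical illustration of that neighbor rule (the (2, E) edge-index convention follows PyTorch Geometric), not the paper's code.

```python
import torch

def prior_frame_edges(num_frames: int, k: int) -> torch.Tensor:
    """Connect each frame node to its k immediately preceding frames."""
    src, dst = [], []
    for t in range(num_frames):
        for offset in range(1, k + 1):
            if t - offset >= 0:
                src.append(t - offset)  # earlier frame feeds into ...
                dst.append(t)           # ... the current frame's node
    return torch.tensor([src, dst], dtype=torch.long)

# Example: a 6-frame sequence with k = 2 prior-frame neighbors per node.
edge_index = prior_frame_edges(6, 2)  # shape (2, 9)
```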
Related papers
- Canonical Correlation Guided Deep Neural Network [14.188285111418516]
We present a canonical correlation guided learning framework that can be realized by deep neural networks (CCDNN).
In the proposed method, the optimization is not restricted to maximizing correlation; instead, canonical correlation is imposed as a constraint.
To reduce the redundancy induced by correlation, a redundancy filter is designed.
arXiv Detail & Related papers (2024-09-28T16:08:44Z)
- TCCT-Net: Two-Stream Network Architecture for Fast and Efficient Engagement Estimation via Behavioral Feature Signals [58.865901821451295]
We present a novel two-stream feature fusion "Tensor-Convolution and Convolution-Transformer Network" (TCCT-Net) architecture.
To better learn the meaningful patterns in the temporal-spatial domain, we design a "CT" stream that integrates a hybrid convolutional-transformer architecture.
In parallel, to efficiently extract rich patterns from the temporal-frequency domain, we introduce a "TC" stream that uses Continuous Wavelet Transform (CWT) to represent information in a 2D tensor form.
arXiv Detail & Related papers (2024-04-15T06:01:48Z)
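As an aside on the "TC" stream described in the TCCT-Net entry above, representing a 1-D signal as a 2-D time-frequency tensor via the Continuous Wavelet Transform is straightforward to demonstrate with PyWavelets. The signal, wavelet choice (morl), and scale range below are illustrative assumptions, not the paper's configuration.

```python
import numpy as np
import pywt

# Hypothetical 1-D behavioral signal sampled at 30 Hz for 10 seconds.
fs = 30.0
t = np.arange(0, 10, 1 / fs)
signal = np.sin(2 * np.pi * 0.5 * t) + 0.1 * np.random.randn(t.size)

# CWT: one row per scale, one column per time step, yielding a 2-D
# tensor that an image-style CNN stream can consume directly.
scales = np.arange(1, 65)
coeffs, freqs = pywt.cwt(signal, scales, "morl", sampling_period=1 / fs)
print(coeffs.shape)  # (64, 300)
```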
- Hybrid Convolutional and Attention Network for Hyperspectral Image Denoising [54.110544509099526]
Hyperspectral image (HSI) denoising is critical for the effective analysis and interpretation of hyperspectral data.
We propose a hybrid convolution and attention network (HCANet) to enhance HSI denoising.
Experimental results on mainstream HSI datasets demonstrate the rationality and effectiveness of the proposed HCANet.
arXiv Detail & Related papers (2024-03-15T07:18:43Z)
- Dynamic Semantic Compression for CNN Inference in Multi-access Edge Computing: A Graph Reinforcement Learning-based Autoencoder [82.8833476520429]
We propose a novel semantic compression method, an autoencoder-based CNN architecture (AECNN), for effective semantic extraction and compression in partial offloading.
In the semantic encoder, we introduce a feature compression module based on the channel attention mechanism in CNNs, to compress intermediate data by selecting the most informative features.
In the semantic decoder, we design a lightweight decoder to reconstruct the intermediate data through learning from the received compressed data to improve accuracy.
arXiv Detail & Related papers (2024-01-19T15:19:47Z)
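To make the channel-attention compression idea in the AECNN entry above concrete, here is a hypothetical squeeze-and-excitation-style gate that scores the channels of an intermediate feature map and keeps only the top-k most informative ones. Module and parameter names are invented for illustration; the paper's actual module will differ in detail.

```python
import torch
import torch.nn as nn

class ChannelAttentionCompressor(nn.Module):
    """Score channels with an SE-style gate and keep the top-k (sketch)."""

    def __init__(self, channels: int, keep: int, reduction: int = 4):
        super().__init__()
        self.keep = keep
        self.gate = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor):
        # x: (batch, channels, H, W) intermediate CNN feature map.
        scores = self.gate(x.mean(dim=(2, 3)))       # (B, C) channel scores
        idx = scores.topk(self.keep, dim=1).indices  # most informative channels
        batch = torch.arange(x.size(0)).unsqueeze(1)
        return x[batch, idx], idx                    # (B, keep, H, W) + indices

x = torch.randn(2, 64, 16, 16)
compressed, kept = ChannelAttentionCompressor(64, keep=16)(x)
```

The returned indices would be transmitted alongside the compressed features so the decoder knows where to scatter them during reconstruction.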
- An Efficient Speech Separation Network Based on Recurrent Fusion Dilated Convolution and Channel Attention [0.2538209532048866]
We present an efficient speech separation neural network, ARFDCN, which combines dilated convolutions, multi-scale fusion (MSF), and channel attention.
Experimental results indicate that the model achieves a decent balance between performance and computational efficiency.
arXiv Detail & Related papers (2023-06-09T13:30:27Z)
- Dynamic Kernels and Channel Attention with Multi-Layer Embedding Aggregation for Speaker Verification [28.833851817220616]
This paper proposes an approach to increase the model resolution capability using attention-based dynamic kernels in a convolutional neural network.
The proposed dynamic convolutional model achieved 1.62% EER and 0.18 minDCF on the VoxCeleb1 test set, a 17% relative improvement over ECAPA-TDNN.
arXiv Detail & Related papers (2022-11-03T17:13:28Z)
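The attention-based dynamic kernels mentioned in the speaker-verification entry above can be sketched as generic dynamic convolution: a lightweight attention head mixes K kernel experts into a single input-dependent kernel. The module below follows that standard pattern and is not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv1d(nn.Module):
    """Per-input attention over K kernel experts (dynamic convolution sketch)."""

    def __init__(self, in_ch: int, out_ch: int, kernel_size: int, k: int = 4):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(k, out_ch, in_ch, kernel_size) * 0.02)
        self.attend = nn.Linear(in_ch, k)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_ch, time), e.g. filterbank features.
        att = F.softmax(self.attend(x.mean(dim=2)), dim=1)   # (B, K)
        w = torch.einsum("bk,koit->boit", att, self.weight)  # mix experts
        b, _, t = x.shape
        # Grouped-conv trick: fold the batch into channel groups so each
        # sample is convolved with its own mixed kernel in one call.
        out = F.conv1d(x.reshape(1, -1, t),
                       w.reshape(-1, w.size(2), w.size(3)),
                       padding=w.size(3) // 2, groups=b)
        return out.reshape(b, -1, t)

y = DynamicConv1d(40, 64, kernel_size=3)(torch.randn(2, 40, 200))  # (2, 64, 200)
```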
- Canonical Cortical Graph Neural Networks and its Application for Speech Enhancement in Future Audio-Visual Hearing Aids [0.726437825413781]
This paper proposes a more biologically plausible self-supervised machine learning approach that combines multimodal information using intra-layer modulations together with canonical correlation analysis (CCA).
The approach outperformed recent state-of-the-art results in both clean audio reconstruction and energy efficiency, reflected in a reduced and smoother neuron firing rate distribution.
arXiv Detail & Related papers (2022-06-06T15:20:07Z)
- Interpolation-based Correlation Reduction Network for Semi-Supervised Graph Learning [49.94816548023729]
We propose a novel graph contrastive learning method, termed Interpolation-based Correlation Reduction Network (ICRN).
In our method, we improve the discriminative capability of the latent features by enlarging the margin of decision boundaries.
By combining the two settings, we extract rich supervision information from both the abundant unlabeled nodes and the rare yet valuable labeled nodes for discriminative representation learning.
arXiv Detail & Related papers (2022-06-06T14:26:34Z)
- Graph-based Algorithm Unfolding for Energy-aware Power Allocation in Wireless Networks [27.600081147252155]
We develop a novel graph-based trainable framework to maximize energy efficiency in wireless communication networks.
We show the permutation equivariance of the proposed architecture, a desirable property for models of wireless network data.
Results demonstrate its generalizability across different network topologies.
arXiv Detail & Related papers (2022-01-27T20:23:24Z)
- Hybrid SNN-ANN: Energy-Efficient Classification and Object Detection for Event-Based Vision [64.71260357476602]
Event-based vision sensors encode local pixel-wise brightness changes in streams of events rather than image frames.
Recent progress in object recognition from event-based sensors has come from converting deep neural networks.
We propose a hybrid architecture for end-to-end training of deep neural networks for event-based pattern recognition and object detection.
arXiv Detail & Related papers (2021-12-06T23:45:58Z)
- Video Frame Interpolation Transformer [86.20646863821908]
We propose a Transformer-based video frame interpolation framework that allows content-aware aggregation weights and considers long-range dependencies with the self-attention operations.
To avoid the high computational cost of global self-attention, we introduce the concept of local attention into video interpolation.
In addition, we develop a multi-scale frame synthesis scheme to fully realize the potential of Transformers.
arXiv Detail & Related papers (2021-11-27T05:35:10Z)