Related papers: RelUNet: Relative Channel Fusion U-Net for Multichannel Speech Enhancement

URL: http://arxiv.org/abs/2410.05019v1
Date: Mon, 7 Oct 2024 13:19:10 GMT
Title: RelUNet: Relative Channel Fusion U-Net for Multichannel Speech Enhancement
Authors: Ibrahim Aldarmaki, Thamar Solorio, Bhiksha Raj, Hanan Aldarmaki
Abstract summary: Multi-channel speech enhancement models, in particular those based on the U-Net architecture, demonstrate promising performance and generalization potential. We propose a novel modification of these models by incorporating relative information from the outset, where each channel is processed in conjunction with a reference channel through stacking. This input strategy exploits comparative differences to adaptively fuse information between channels, thereby capturing crucial spatial information and enhancing the overall performance.
Score: 25.878204820665516
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Neural multi-channel speech enhancement models, in particular those based on the U-Net architecture, demonstrate promising performance and generalization potential. These models typically encode input channels independently, and integrate the channels during later stages of the network. In this paper, we propose a novel modification of these models by incorporating relative information from the outset, where each channel is processed in conjunction with a reference channel through stacking. This input strategy exploits comparative differences to adaptively fuse information between channels, thereby capturing crucial spatial information and enhancing the overall performance. The experiments conducted on the CHiME-3 dataset demonstrate improvements in speech enhancement metrics across various architectures.
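
The stacking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `stack_with_reference`, the `ref_index` parameter, the plain-list signal representation, and the choice to pair the reference channel with itself are all assumptions made for illustration. Each microphone channel is paired with the reference channel so that a downstream encoder would see both at once.

```python
from typing import List

Signal = List[float]


def stack_with_reference(channels: List[Signal], ref_index: int = 0) -> List[List[Signal]]:
    """Pair every channel with a reference channel (hypothetical helper).

    Input shape:  (C, T)    -> C channels of T samples each.
    Output shape: (C, 2, T) -> for each channel, [channel, reference].
    Assumption: the reference channel is also paired with itself.
    """
    if not channels:
        raise ValueError("at least one channel is required")
    if not 0 <= ref_index < len(channels):
        raise IndexError("ref_index is out of range")

    reference = channels[ref_index]
    if any(len(ch) != len(reference) for ch in channels):
        raise ValueError("all channels must have the same number of samples")

    # Copy the samples so the stacked pairs do not alias the input lists.
    return [[list(ch), list(reference)] for ch in channels]


# Toy three-microphone input with three samples per channel.
mics = [
    [0.1, 0.2, 0.3],
    [0.0, 0.1, 0.2],
    [0.3, 0.2, 0.1],
]
pairs = stack_with_reference(mics, ref_index=0)
print(len(pairs), len(pairs[0]), len(pairs[0][0]))
```

In a real model each stacked pair would presumably be fed to a shared encoder as a two-channel input; how the paper orders, normalizes, or transforms the signals before stacking is not stated in the abstract.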

Related papers

RepVideo: Rethinking Cross-Layer Representation for Video Generation [53.701548524818534]
We propose RepVideo, an enhanced representation framework for text-to-video diffusion models. By accumulating features from neighboring layers to form enriched representations, this approach captures more stable semantic information. Our experiments demonstrate that our RepVideo not only significantly enhances the ability to generate accurate spatial appearances, but also improves temporal consistency in video generation.
arXiv Detail & Related papers (2025-01-15T18:20:37Z)
SOFTS: Efficient Multivariate Time Series Forecasting with Series-Core Fusion [59.96233305733875]
Time series forecasting plays a crucial role in various fields such as finance, traffic management, energy, and healthcare. Several methods use mechanisms such as attention or mixers to capture channel correlations. This paper presents an efficient MLP-based model, the Series-cOre Fused Time Series forecaster (SOFTS).
arXiv Detail & Related papers (2024-04-22T14:06:35Z)
CSformer: Combining Channel Independence and Mixing for Robust Multivariate Time Series Forecasting [3.6814181034608664]
We propose a strategy of channel independence followed by mixing in time series analysis. We introduce CSformer, a novel framework featuring a two-stage multiheaded self-attention mechanism. Our framework effectively incorporates sequence and channel adapters, significantly improving the model's ability to identify important information.
arXiv Detail & Related papers (2023-12-11T09:10:38Z)
Distributed Deep Joint Source-Channel Coding with Decoder-Only Side Information [6.411633100057159]
We consider low-latency image transmission over a noisy wireless channel when correlated side information is present only at the receiver side. We propose a novel neural network architecture that incorporates the decoder-only side information at multiple stages at the receiver side.
arXiv Detail & Related papers (2023-10-06T15:17:45Z)
Joint Channel Estimation and Feedback with Masked Token Transformers in Massive MIMO Systems [74.52117784544758]
This paper proposes an encoder-decoder based network that unveils the intrinsic frequency-domain correlation within the CSI matrix. The entire encoder-decoder network is utilized for channel compression. Our method outperforms state-of-the-art channel estimation and feedback techniques in joint tasks.
arXiv Detail & Related papers (2023-06-08T06:15:17Z)
Efficient Multi-Scale Attention Module with Cross-Spatial Learning [4.046170185945849]
A novel efficient multi-scale attention (EMA) module is proposed. We focus on retaining per-channel information while decreasing the computational overhead. We conduct extensive ablation studies and experiments on image classification and object detection tasks.
arXiv Detail & Related papers (2023-05-23T00:35:47Z)
On Neural Architectures for Deep Learning-based Source Separation of Co-Channel OFDM Signals [104.11663769306566]
We study the single-channel source separation problem involving orthogonal frequency-division multiplexing (OFDM) signals. We propose critical domain-informed modifications to the network parameterization, based on insights from OFDM structures.
arXiv Detail & Related papers (2023-03-11T16:29:13Z)
A Discriminative Channel Diversification Network for Image Classification [21.049734250642974]
We propose a light-weight and effective attention module, called channel diversification block, to enhance the global context. Unlike other channel attention mechanisms, the proposed module focuses on the most discriminative features. Experiments on CIFAR-10, SVHN, and Tiny-ImageNet datasets demonstrate that the proposed module improves the performance of the baseline networks by a margin of 3% on average.
arXiv Detail & Related papers (2021-12-10T23:00:53Z)
Adaptive Channel Encoding for Point Cloud Analysis [7.696435157444049]
In this paper, an adaptive channel encoding mechanism is proposed to capture channel relationships. It improves the quality of the representation generated by the network by explicitly encoding the interdependence between the channels of its features.
arXiv Detail & Related papers (2021-12-05T08:20:27Z)
Convolutional Neural Network optimization via Channel Reassessment Attention module [19.566271646280978]
We propose a novel network optimization module called the Channel Reassessment Attention (CRA) module. The CRA module uses channel attention with spatial information of feature maps to enhance the representational power of networks. Experiments on the ImageNet and MS COCO datasets demonstrate that embedding the CRA module in various networks effectively improves performance under different evaluation standards.
arXiv Detail & Related papers (2020-10-12T11:27:17Z)
Speaker Representation Learning using Global Context Guided Channel and Time-Frequency Transformations [67.18006078950337]
We use the global context information to enhance important channels and recalibrate salient time-frequency locations. The proposed modules, together with a popular ResNet based model, are evaluated on the VoxCeleb1 dataset.
arXiv Detail & Related papers (2020-09-02T01:07:29Z)
Operation-Aware Soft Channel Pruning using Differentiable Masks [51.04085547997066]
We propose a data-driven algorithm, which compresses deep neural networks in a differentiable way by exploiting the characteristics of operations. We perform extensive experiments and achieve outstanding performance in terms of the accuracy of output networks.
arXiv Detail & Related papers (2020-07-08T07:44:00Z)
Channel Interaction Networks for Fine-Grained Image Categorization [61.095320862647476]
Fine-grained image categorization is challenging due to subtle inter-class differences. We propose a channel interaction network (CIN), which models the channel-wise interplay both within an image and across images. Our model can be trained efficiently in an end-to-end fashion without the need for multi-stage training and testing.
arXiv Detail & Related papers (2020-03-11T11:51:51Z)

This list is automatically generated from the titles and abstracts of the papers on this site.