AttX: Attentive Cross-Connections for Fusion of Wearable Signals in
Emotion Recognition
- URL: http://arxiv.org/abs/2206.04625v1
- Date: Thu, 9 Jun 2022 17:18:33 GMT
- Title: AttX: Attentive Cross-Connections for Fusion of Wearable Signals in
Emotion Recognition
- Authors: Anubhav Bhatti, Behnam Behinaein, Paul Hungler, Ali Etemad
- Abstract summary: Cross-modal attentive connections are a new dynamic and effective technique for multimodal representation learning from wearable data.
We perform extensive experiments on three public multimodal wearable datasets, WESAD, SWELL-KW, and CASE.
Our method yields performance superior or competitive to the state of the art and outperforms a variety of uni-modal and classical multimodal baseline methods.
- Score: 15.21696076393078
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We propose cross-modal attentive connections, a new dynamic and effective
technique for multimodal representation learning from wearable data. Our
solution can be integrated into any stage of the pipeline, i.e., after any
convolutional layer or block, to create intermediate connections between
individual streams responsible for processing each modality. Additionally, our
method benefits from two properties. First, it can share information
uni-directionally (from one modality to the other) or bi-directionally. Second,
it can be integrated into multiple stages at the same time to further allow
network gradients to be exchanged at several touch-points. We perform extensive
experiments on three public multimodal wearable datasets, WESAD, SWELL-KW, and
CASE, and demonstrate that our method can effectively regulate and share
information between different modalities to learn better representations. Our
experiments further demonstrate that once integrated into simple CNN-based
multimodal solutions (2, 3, or 4 modalities), our method yields performance
superior or competitive to the state of the art and outperforms a variety of
uni-modal and classical multimodal baselines.
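To make the idea above concrete, below is a minimal sketch of one possible attentive cross-connection between two 1D-CNN modality streams. The module name, parameter names, and the exact squeeze-and-gate form are illustrative assumptions, not the authors' released implementation; it only shows how an intermediate connection can reweight each stream's feature map with a channel-attention mask computed from the other stream, either uni-directionally or bi-directionally.

```python
# Minimal sketch of a bi-directional cross-modal attentive connection between two
# 1D-CNN streams (e.g., ECG and EDA). Names and the exact gating form are
# illustrative assumptions, not the authors' released implementation.
import torch
import torch.nn as nn


class CrossModalAttentiveConnection(nn.Module):
    """Exchanges information between two intermediate feature maps.

    Each stream is updated with a channel-attention mask computed from the
    *other* stream's features, via a residual, attention-gated update.
    """

    def __init__(self, channels_a: int, channels_b: int, bidirectional: bool = True):
        super().__init__()
        self.bidirectional = bidirectional
        # Attention from stream B onto stream A's channels.
        self.attn_b_to_a = nn.Sequential(
            nn.AdaptiveAvgPool1d(1),                      # (B, C_b, T) -> (B, C_b, 1)
            nn.Conv1d(channels_b, channels_a, kernel_size=1),
            nn.Sigmoid(),
        )
        # Attention from stream A onto stream B's channels (bi-directional only).
        self.attn_a_to_b = nn.Sequential(
            nn.AdaptiveAvgPool1d(1),
            nn.Conv1d(channels_a, channels_b, kernel_size=1),
            nn.Sigmoid(),
        ) if bidirectional else None

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor):
        # feat_a: (batch, C_a, T_a), feat_b: (batch, C_b, T_b)
        mask_a = self.attn_b_to_a(feat_b)                 # (batch, C_a, 1)
        out_a = feat_a + feat_a * mask_a                  # residual, gated update
        if self.bidirectional:
            mask_b = self.attn_a_to_b(feat_a)             # (batch, C_b, 1)
            out_b = feat_b + feat_b * mask_b
        else:
            out_b = feat_b                                # uni-directional: B unchanged
        return out_a, out_b


# Usage: insert between convolutional blocks of two modality-specific streams.
if __name__ == "__main__":
    attx = CrossModalAttentiveConnection(channels_a=32, channels_b=64)
    ecg = torch.randn(8, 32, 256)   # hypothetical ECG feature map
    eda = torch.randn(8, 64, 128)   # hypothetical EDA feature map
    ecg, eda = attx(ecg, eda)
    print(ecg.shape, eda.shape)     # torch.Size([8, 32, 256]) torch.Size([8, 64, 128])
```

Because the connection returns feature maps with unchanged shapes, several such modules can be dropped in after different convolutional blocks, which matches the paper's description of integrating the connections at multiple touch-points of the pipeline.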
Related papers
- Multimodality Helps Few-Shot 3D Point Cloud Semantic Segmentation [61.91492500828508]
Few-shot 3D point cloud segmentation (FS-PCS) aims at generalizing models to segment novel categories with minimal support samples.
We introduce a cost-free multimodal FS-PCS setup, utilizing textual labels and the potentially available 2D image modality.
We propose a simple yet effective Test-time Adaptive Cross-modal Seg (TACC) technique to mitigate training bias.
arXiv Detail & Related papers (2024-10-29T19:28:41Z)
- Anchors Aweigh! Sail for Optimal Unified Multi-Modal Representations [16.036997801745905]
Multimodal learning plays a crucial role in enabling machine learning models to fuse and utilize diverse data sources.
Recent binding methods, such as ImageBind, typically use a fixed anchor modality to align multimodal data in the anchor modal embedding space.
We propose CentroBind, a simple yet powerful approach that eliminates the need for a fixed anchor.
arXiv Detail & Related papers (2024-10-02T23:19:23Z)
- Zoom and Shift are All You Need [0.0]
We propose a feature alignment approach that achieves full integration of multimodal information.
The proposed technique can reliably capture high-level interplay between features originating from distinct modalities.
arXiv Detail & Related papers (2024-06-13T07:09:41Z)
- U3M: Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation [63.31007867379312]
We introduce U3M, an unbiased multiscale modal fusion model for multimodal semantic segmentation.
We employ feature fusion at multiple scales to ensure the effective extraction and integration of both global and local features.
Experimental results demonstrate that our approach achieves superior performance across multiple datasets.
arXiv Detail & Related papers (2024-05-24T08:58:48Z)
- Multimodal Information Interaction for Medical Image Segmentation [24.024848382458767]
We introduce an innovative Multimodal Information Cross Transformer (MicFormer).
It queries features from one modality and retrieves corresponding responses from another, facilitating effective communication between bimodal features.
Our method outperforms other multimodal segmentation techniques by margins of 2.83 and 4.23, respectively.
arXiv Detail & Related papers (2024-04-25T07:21:14Z)
- Unified Multi-modal Unsupervised Representation Learning for Skeleton-based Action Understanding [62.70450216120704]
Unsupervised pre-training has shown great success in skeleton-based action understanding.
We propose a Unified Multimodal Unsupervised Representation Learning framework, called UmURL.
UmURL exploits an efficient early-fusion strategy to jointly encode the multi-modal features in a single-stream manner.
arXiv Detail & Related papers (2023-11-06T13:56:57Z)
- On Uni-Modal Feature Learning in Supervised Multi-Modal Learning [21.822251958013737]
We abstract the features (i.e. learned representations) of multi-modal data into 1) uni-modal features, which can be learned from uni-modal training, and 2) paired features, which can only be learned from cross-modal interactions.
We demonstrate that, under a simple guiding strategy, we can achieve comparable results to other complex late-fusion or intermediate-fusion methods on various multi-modal datasets.
arXiv Detail & Related papers (2023-05-02T07:15:10Z)
- Multi-scale Cooperative Multimodal Transformers for Multimodal Sentiment Analysis in Videos [58.93586436289648]
We propose a multi-scale cooperative multimodal transformer (MCMulT) architecture for multimodal sentiment analysis.
Our model outperforms existing approaches on unaligned multimodal sequences and has strong performance on aligned multimodal sequences.
arXiv Detail & Related papers (2022-06-16T07:47:57Z)
- Channel Exchanging Networks for Multimodal and Multitask Dense Image Prediction [125.18248926508045]
We propose the Channel-Exchanging-Network (CEN), which is self-adaptive, parameter-free, and, more importantly, applicable to both multimodal fusion and multitask learning.
CEN dynamically exchanges channels between sub-networks of different modalities (a generic sketch of this idea appears after this list).
For dense image prediction, the validity of CEN is tested in four different scenarios.
arXiv Detail & Related papers (2021-12-04T05:47:54Z) - Learning Deep Multimodal Feature Representation with Asymmetric
Multi-layer Fusion [63.72912507445662]
We propose a compact and effective framework to fuse multimodal features at multiple layers in a single network.
First, we verify that multimodal features can be learned within a shared single network by merely maintaining modality-specific batch normalization layers in the encoder.
Second, we propose a bidirectional multi-layer fusion scheme in which multimodal features can be exploited progressively.
arXiv Detail & Related papers (2021-08-11T03:42:13Z)
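As referenced in the Channel Exchanging Networks entry above, here is a generic, heavily simplified sketch of channel exchanging between two modality sub-networks. It assumes that channel importance is read from each sub-network's BatchNorm scaling factors and that the two feature maps share the same shape; these details, the names, and the threshold value are illustrative assumptions rather than the CEN authors' code.

```python
# Generic illustration (not the CEN authors' implementation) of exchanging
# low-importance channels between two modality sub-networks. Importance is
# assumed to be read from each sub-network's BatchNorm scaling factors (gamma).
import torch
import torch.nn as nn


def exchange_channels(feat_a: torch.Tensor, feat_b: torch.Tensor,
                      bn_a: nn.BatchNorm2d, bn_b: nn.BatchNorm2d,
                      threshold: float = 1e-2):
    """Replace channels whose |gamma| falls below `threshold` with the other
    modality's corresponding channels. feat_*: (batch, C, H, W), same shape."""
    gamma_a = bn_a.weight.detach().abs()                  # (C,)
    gamma_b = bn_b.weight.detach().abs()
    mask_a = (gamma_a < threshold).view(1, -1, 1, 1)      # channels of A to replace
    mask_b = (gamma_b < threshold).view(1, -1, 1, 1)      # channels of B to replace
    out_a = torch.where(mask_a, feat_b, feat_a)
    out_b = torch.where(mask_b, feat_a, feat_b)
    return out_a, out_b


if __name__ == "__main__":
    bn_rgb, bn_depth = nn.BatchNorm2d(16), nn.BatchNorm2d(16)
    rgb = torch.randn(4, 16, 32, 32)     # hypothetical RGB feature map
    depth = torch.randn(4, 16, 32, 32)   # hypothetical depth feature map
    rgb, depth = exchange_channels(rgb, depth, bn_rgb, bn_depth)
    print(rgb.shape, depth.shape)        # shapes are preserved
```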