Collaborative Attention Mechanism for Multi-View Action Recognition
- URL: http://arxiv.org/abs/2009.06599v2
- Date: Wed, 25 Nov 2020 20:30:54 GMT
- Title: Collaborative Attention Mechanism for Multi-View Action Recognition
- Authors: Yue Bai, Zhiqiang Tao, Lichen Wang, Sheng Li, Yu Yin and Yun Fu
- Abstract summary: We propose a collaborative attention mechanism (CAM) for solving the multi-view action recognition problem.
The proposed CAM detects attention differences among multiple views and adaptively integrates frame-level information so that the views benefit each other.
Experiments on four action datasets show that the proposed CAM achieves better results for each view and also boosts multi-view performance.
- Score: 75.33062629093054
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-view action recognition (MVAR) leverages complementary temporal
information from different views to improve the learning performance. Obtaining
informative view-specific representation plays an essential role in MVAR.
Attention has been widely adopted as an effective strategy for discovering
discriminative cues underlying temporal data. However, most existing MVAR
methods only utilize attention to extract the representation of each view
individually, ignoring the potential to mine latent patterns from
mutual-support information in the attention space. To this end, we propose a
collaborative attention mechanism (CAM) for solving the MVAR problem in this
paper. The proposed CAM detects attention differences among multiple views and
adaptively integrates frame-level information so that the views benefit each other.
Specifically, we extend the long short-term memory (LSTM) to a Mutual-Aid RNN
(MAR) to achieve the multi-view collaboration process. CAM takes advantage of
the view-specific attention pattern of one view to guide another view and
discover potential information that is hard for that view to explore on its
own. It paves a novel way to
leverage attention information and enhances the multi-view representation
learning. Extensive experiments on four action datasets show that the proposed
CAM achieves better results for each view and also boosts multi-view
performance.
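For illustration, below is a minimal PyTorch sketch of the collaborative-attention idea described in the abstract. It assumes frame-level features per view are already extracted; the class name CollaborativeAttention, the hidden sizes, and the fusion head are hypothetical, and the collaboration step is approximated by pooling each view's hidden states with the other view's attention weights rather than by reproducing the Mutual-Aid RNN (MAR) cell from the paper.

```python
# Minimal sketch, assuming two synchronized views with pre-extracted frame features.
# Not the authors' released implementation: names and the fusion head are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CollaborativeAttention(nn.Module):
    def __init__(self, feat_dim: int, hidden_dim: int, num_classes: int):
        super().__init__()
        # one recurrent encoder per view
        self.rnn_a = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.rnn_b = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        # frame-level attention scorers (one scalar score per time step)
        self.attn_a = nn.Linear(hidden_dim, 1)
        self.attn_b = nn.Linear(hidden_dim, 1)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, view_a: torch.Tensor, view_b: torch.Tensor) -> torch.Tensor:
        # view_a, view_b: (batch, time, feat_dim) frame features of each view
        h_a, _ = self.rnn_a(view_a)                   # (B, T, H)
        h_b, _ = self.rnn_b(view_b)                   # (B, T, H)

        # view-specific frame attention over the temporal axis
        alpha_a = F.softmax(self.attn_a(h_a), dim=1)  # (B, T, 1)
        alpha_b = F.softmax(self.attn_b(h_b), dim=1)  # (B, T, 1)

        # collaboration: pool each view with the *other* view's attention,
        # so frames one view under-weights can be recovered via its partner
        z_a = (alpha_b * h_a).sum(dim=1)              # (B, H)
        z_b = (alpha_a * h_b).sum(dim=1)              # (B, H)

        return self.classifier(torch.cat([z_a, z_b], dim=-1))


if __name__ == "__main__":
    model = CollaborativeAttention(feat_dim=512, hidden_dim=256, num_classes=10)
    a = torch.randn(4, 16, 512)   # 4 clips, 16 frames, 512-d features (view 1)
    b = torch.randn(4, 16, 512)   # synchronized features from view 2
    print(model(a, b).shape)      # torch.Size([4, 10])
```

The cross-view re-weighting above is only one plausible reading of the attention exchange; the paper realizes it inside the recurrent cell itself.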
Related papers
- URRL-IMVC: Unified and Robust Representation Learning for Incomplete Multi-View Clustering [28.776476995363048]
We propose a novel Unified and Robust Representation Learning framework for Incomplete Multi-View Clustering (URRL-IMVC).
URRL-IMVC directly learns a unified embedding that is robust to view missing conditions by integrating information from multiple views and neighboring samples.
We extensively evaluate the proposed URRL-IMVC framework on various benchmark datasets, demonstrating its state-of-the-art performance.
arXiv Detail & Related papers (2024-07-12T09:35:25Z) - BiVRec: Bidirectional View-based Multimodal Sequential Recommendation [55.87443627659778]
We propose an innovative framework, BivRec, that jointly trains the recommendation tasks in both ID and multimodal views.
BivRec achieves state-of-the-art performance on five datasets and showcases various practical advantages.
arXiv Detail & Related papers (2024-02-27T09:10:41Z) - RefSAM: Efficiently Adapting Segmenting Anything Model for Referring Video Object Segmentation [53.4319652364256]
This paper presents the RefSAM model, which explores the potential of SAM for referring video object segmentation.
Our proposed approach adapts the original SAM model to enhance cross-modality learning by employing a lightweight cross-modal module.
We employ a parameter-efficient tuning strategy to align and fuse the language and vision features effectively.
arXiv Detail & Related papers (2023-07-03T13:21:58Z) - Dual Representation Learning for One-Step Clustering of Multi-View Data [30.131568561100817]
We propose a novel one-step multi-view clustering method by exploiting the dual representation of both the common and specific information of different views.
With this framework, representation learning and clustering partitioning mutually benefit each other, which effectively improves the clustering performance.
arXiv Detail & Related papers (2022-08-30T14:20:26Z) - Self-Supervised Information Bottleneck for Deep Multi-View Subspace Clustering [29.27475285925792]
We establish a new framework called Self-supervised Information Bottleneck based Multi-view Subspace Clustering (SIB-MSC)
Inheriting the advantages of the information bottleneck, SIB-MSC can learn a latent space for each view to capture common information among the latent representations of different views.
Our method achieves superior performance over the related state-of-the-art methods.
arXiv Detail & Related papers (2022-04-26T15:49:59Z) - A Variational Information Bottleneck Approach to Multi-Omics Data Integration [98.6475134630792]
We propose a deep variational information bottleneck (IB) approach for incomplete multi-view observations.
Our method applies the IB framework on marginal and joint representations of the observed views to focus on intra-view and inter-view interactions that are relevant for the target.
Experiments on real-world datasets show that our method consistently achieves gain from data integration and outperforms state-of-the-art benchmarks.
arXiv Detail & Related papers (2021-02-05T06:05:39Z) - Generative Partial Multi-View Clustering [133.36721417531734]
We propose a generative partial multi-view clustering model, named as GP-MVC, to address the incomplete multi-view problem.
First, multi-view encoder networks are trained to learn common low-dimensional representations, followed by a clustering layer to capture the consistent cluster structure across multiple views.
Second, view-specific generative adversarial networks are developed to generate the missing data of one view conditioning on the shared representation given by other views.
arXiv Detail & Related papers (2020-03-29T17:48:27Z) - Multi-Granularity Reference-Aided Attentive Feature Aggregation for Video-based Person Re-identification [98.7585431239291]
Video-based person re-identification aims at matching the same person across video clips.
In this paper, we propose an attentive feature aggregation module, namely the Multi-Granularity Reference-Aided Attentive Feature Aggregation module (MG-RAFA).
Our framework achieves state-of-the-art performance on three benchmark datasets.
arXiv Detail & Related papers (2020-03-27T03:49:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.