Related papers: Analyzing Unaligned Multimodal Sequence via Graph Convolution and Graph Pooling Fusion

Analyzing Unaligned Multimodal Sequence via Graph Convolution and Graph Pooling Fusion

URL: http://arxiv.org/abs/2011.13572v3
Date: Fri, 23 Apr 2021 17:09:39 GMT
Title: Analyzing Unaligned Multimodal Sequence via Graph Convolution and Graph Pooling Fusion
Authors: Sijie Mai, Songlong Xing, Jiaxuan He, Ying Zeng, Haifeng Hu
Abstract summary: We propose a novel model, termed Multimodal Graph, to investigate the effectiveness of graph neural networks (GNN) on modeling multimodal sequential data. Our graph-based model reaches state-of-the-art performance on two benchmark datasets.
Score: 28.077474663199062
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this paper, we study the task of multimodal sequence analysis which aims to draw inferences from visual, language and acoustic sequences. A majority of existing works generally focus on aligned fusion, mostly at word level, of the three modalities to accomplish this task, which is impractical in real-world scenarios. To overcome this issue, we seek to address the task of multimodal sequence analysis on unaligned modality sequences which is still relatively underexplored and also more challenging. Recurrent neural network (RNN) and its variants are widely used in multimodal sequence analysis, but they are susceptible to the issues of gradient vanishing/explosion and high time complexity due to its recurrent nature. Therefore, we propose a novel model, termed Multimodal Graph, to investigate the effectiveness of graph neural networks (GNN) on modeling multimodal sequential data. The graph-based structure enables parallel computation in time dimension and can learn longer temporal dependency in long unaligned sequences. Specifically, our Multimodal Graph is hierarchically structured to cater to two stages, i.e., intra- and inter-modal dynamics learning. For the first stage, a graph convolutional network is employed for each modality to learn intra-modal dynamics. In the second stage, given that the multimodal sequences are unaligned, the commonly considered word-level fusion does not pertain. To this end, we devise a graph pooling fusion network to automatically learn the associations between various nodes from different modalities. Additionally, we define multiple ways to construct the adjacency matrix for sequential data. Experimental results suggest that our graph-based model reaches state-of-the-art performance on two benchmark datasets.

Related papers

Hybrid Hypergraph Networks for Multimodal Sequence Data Classification [9.688069013427057]
We propose the hybrid hypergraph network (HHN), a novel framework that models temporal multimodal data via a segmentation-first, graph-later strategy.<n>HHN achieves state-of-the-art results on four multimodal datasets, demonstrating its effectiveness in complex classification tasks.
arXiv Detail & Related papers (2025-07-30T12:13:05Z)
Best of Both Worlds: Advantages of Hybrid Graph Sequence Models [20.564009321626198]
We present a unifying framework for adopting graph sequence models for learning on graphs. We evaluate the representation power of Transformers and modern recurrent models through the lens of global and local graph tasks. We present GSM++, a fast hybrid model that uses the Hierarchical Affinity Clustering (HAC) algorithm to tokenize the graph into hierarchical sequences.
arXiv Detail & Related papers (2024-11-23T23:24:42Z)
Discrete Graph Auto-Encoder [52.50288418639075]
We introduce a new framework named Discrete Graph Auto-Encoder (DGAE) We first use a permutation-equivariant auto-encoder to convert graphs into sets of discrete latent node representations. In the second step, we sort the sets of discrete latent representations and learn their distribution with a specifically designed auto-regressive model.
arXiv Detail & Related papers (2023-06-13T12:40:39Z)
MTS2Graph: Interpretable Multivariate Time Series Classification with Temporal Evolving Graphs [1.1756822700775666]
We introduce a new framework for interpreting time series data by extracting and clustering the input representative patterns. We run experiments on eight datasets of the UCR/UEA archive, along with HAR and PAM datasets.
arXiv Detail & Related papers (2023-06-06T16:24:27Z)
Dynamic Graph Message Passing Networks for Visual Recognition [112.49513303433606]
Modelling long-range dependencies is critical for scene understanding tasks in computer vision. A fully-connected graph is beneficial for such modelling, but its computational overhead is prohibitive. We propose a dynamic graph message passing network, that significantly reduces the computational complexity.
arXiv Detail & Related papers (2022-09-20T14:41:37Z)
Learning the Evolutionary and Multi-scale Graph Structure for Multivariate Time Series Forecasting [50.901984244738806]
We show how to model the evolutionary and multi-scale interactions of time series. In particular, we first provide a hierarchical graph structure cooperated with the dilated convolution to capture the scale-specific correlations. A unified neural network is provided to integrate the components above to get the final prediction.
arXiv Detail & Related papers (2022-06-28T08:11:12Z)
Graph Capsule Aggregation for Unaligned Multimodal Sequences [16.679793708015534]
We introduce Graph Capsule Aggregation (GraphCAGE) to model unaligned multimodal sequences with graph-based neural model and Capsule Network. By converting sequence data into graph, the previously mentioned problems of RNN are avoided. In addition, the aggregation capability of Capsule Network and the graph-based structure enable our model to be interpretable and better solve the problem of long-range dependency.
arXiv Detail & Related papers (2021-08-17T10:04:23Z)
Graph Gamma Process Generalized Linear Dynamical Systems [60.467040479276704]
We introduce graph gamma process (GGP) linear dynamical systems to model real multivariate time series. For temporal pattern discovery, the latent representation under the model is used to decompose the time series into a parsimonious set of multivariate sub-sequences. We use the generated random graph, whose number of nonzero-degree nodes is finite, to define both the sparsity pattern and dimension of the latent state transition matrix.
arXiv Detail & Related papers (2020-07-25T04:16:34Z)
A Novel Graph-based Multi-modal Fusion Encoder for Neural Machine Translation [131.33610549540043]
We propose a novel graph-based multi-modal fusion encoder for NMT. We first represent the input sentence and image using a unified multi-modal graph. We then stack multiple graph-based multi-modal fusion layers that iteratively perform semantic interactions to learn node representations.
arXiv Detail & Related papers (2020-07-17T04:06:09Z)
Connecting the Dots: Multivariate Time Series Forecasting with Graph Neural Networks [91.65637773358347]
We propose a general graph neural network framework designed specifically for multivariate time series data. Our approach automatically extracts the uni-directed relations among variables through a graph learning module. Our proposed model outperforms the state-of-the-art baseline methods on 3 of 4 benchmark datasets.
arXiv Detail & Related papers (2020-05-24T04:02:18Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.