Analyzing Unaligned Multimodal Sequence via Graph Convolution and Graph
Pooling Fusion
- URL: http://arxiv.org/abs/2011.13572v3
- Date: Fri, 23 Apr 2021 17:09:39 GMT
- Authors: Sijie Mai, Songlong Xing, Jiaxuan He, Ying Zeng, Haifeng Hu
- Abstract summary: We propose a novel model, termed Multimodal Graph, to investigate the effectiveness of graph neural networks (GNN) on modeling multimodal sequential data.
Our graph-based model reaches state-of-the-art performance on two benchmark datasets.
- Score: 28.077474663199062
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we study the task of multimodal sequence analysis, which aims
to draw inferences from visual, language and acoustic sequences. Most
existing works focus on aligned fusion, mostly at word level, of the
three modalities to accomplish this task, which is impractical in real-world
scenarios. To overcome this issue, we seek to address the task of multimodal
sequence analysis on unaligned modality sequences, which is still relatively
underexplored and also more challenging. Recurrent neural networks (RNNs) and their
variants are widely used in multimodal sequence analysis, but they are
susceptible to the issues of gradient vanishing/explosion and high time
complexity due to their recurrent nature. Therefore, we propose a novel model,
termed Multimodal Graph, to investigate the effectiveness of graph neural
networks (GNN) on modeling multimodal sequential data. The graph-based
structure enables parallel computation in time dimension and can learn longer
temporal dependency in long unaligned sequences. Specifically, our Multimodal
Graph is hierarchically structured to cater to two stages, i.e., intra- and
inter-modal dynamics learning. For the first stage, a graph convolutional
network is employed for each modality to learn intra-modal dynamics. In the
second stage, given that the multimodal sequences are unaligned, the commonly
considered word-level fusion does not apply. To this end, we devise a graph
pooling fusion network to automatically learn the associations between various
nodes from different modalities. Additionally, we define multiple ways to
construct the adjacency matrix for sequential data. Experimental results
suggest that our graph-based model reaches state-of-the-art performance on two
benchmark datasets.
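
The first stage described in the abstract (a graph convolutional network applied to each modality's sequence) can be illustrated with a minimal sketch. The abstract does not say how the adjacency matrix is built or which propagation rule is used, so everything below is an assumption for illustration: `window_adjacency` is a hypothetical helper that links each time step to its neighbours within a fixed window (self-loops included), and `gcn_layer` applies the generic rule ReLU(D^{-1/2} A D^{-1/2} X W) rather than the paper's own layer. The toy features and weights are made up.

```python
import math


def window_adjacency(num_steps, window=1):
    """Hypothetical adjacency: link time steps at most `window` apart (self-loops included)."""
    return [[1.0 if abs(i - j) <= window else 0.0 for j in range(num_steps)]
            for i in range(num_steps)]


def normalize(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2}; degrees are nonzero thanks to self-loops."""
    deg = [sum(row) for row in adj]
    size = len(adj)
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(size)]
            for i in range(size)]


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj_norm, features, weight):
    """One generic graph-convolution step: ReLU(A_norm @ X @ W)."""
    hidden = matmul(matmul(adj_norm, features), weight)
    return [[max(0.0, value) for value in row] for row in hidden]


# Toy single-modality sequence: 4 time steps (nodes), 2 features each (made-up values).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
weight = [[1.0, -1.0], [0.5, 1.0]]  # made-up 2x2 projection

adj_norm = normalize(window_adjacency(len(features), window=1))
output = gcn_layer(adj_norm, features, weight)
```

Because every node is updated from the same matrix product, all time steps are processed at once rather than one after another, which is the parallelism in the time dimension that the abstract contrasts with recurrent models. Widening `window` lets a single layer connect more distant time steps.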
Related papers
- Best of Both Worlds: Advantages of Hybrid Graph Sequence Models [20.564009321626198]
We present a unifying framework for adopting graph sequence models for learning on graphs.
We evaluate the representation power of Transformers and modern recurrent models through the lens of global and local graph tasks.
We present GSM++, a fast hybrid model that uses the Hierarchical Affinity Clustering (HAC) algorithm to tokenize the graph into hierarchical sequences.
arXiv Detail & Related papers (2024-11-23T23:24:42Z)
- Discrete Graph Auto-Encoder [52.50288418639075]
We introduce a new framework named Discrete Graph Auto-Encoder (DGAE).
We first use a permutation-equivariant auto-encoder to convert graphs into sets of discrete latent node representations.
In the second step, we sort the sets of discrete latent representations and learn their distribution with a specifically designed auto-regressive model.
arXiv Detail & Related papers (2023-06-13T12:40:39Z)
- MTS2Graph: Interpretable Multivariate Time Series Classification with Temporal Evolving Graphs [1.1756822700775666]
We introduce a new framework for interpreting time series data by extracting and clustering the input representative patterns.
We run experiments on eight datasets of the UCR/UEA archive, along with HAR and PAM datasets.
arXiv Detail & Related papers (2023-06-06T16:24:27Z)
- Dynamic Graph Message Passing Networks for Visual Recognition [112.49513303433606]
Modelling long-range dependencies is critical for scene understanding tasks in computer vision.
A fully-connected graph is beneficial for such modelling, but its computational overhead is prohibitive.
We propose a dynamic graph message passing network that significantly reduces the computational complexity.
arXiv Detail & Related papers (2022-09-20T14:41:37Z)
- Learning the Evolutionary and Multi-scale Graph Structure for Multivariate Time Series Forecasting [50.901984244738806]
We show how to model the evolutionary and multi-scale interactions of time series.
In particular, we first provide a hierarchical graph structure combined with dilated convolution to capture the scale-specific correlations.
A unified neural network is provided to integrate the components above to get the final prediction.
arXiv Detail & Related papers (2022-06-28T08:11:12Z)
- Graph Capsule Aggregation for Unaligned Multimodal Sequences [16.679793708015534]
We introduce Graph Capsule Aggregation (GraphCAGE) to model unaligned multimodal sequences with a graph-based neural model and a Capsule Network.
By converting sequence data into a graph, the previously mentioned problems of RNNs are avoided.
In addition, the aggregation capability of the Capsule Network and the graph-based structure make our model interpretable and better able to handle long-range dependencies.
arXiv Detail & Related papers (2021-08-17T10:04:23Z)
- Graph Gamma Process Generalized Linear Dynamical Systems [60.467040479276704]
We introduce graph gamma process (GGP) linear dynamical systems to model real multivariate time series.
For temporal pattern discovery, the latent representation under the model is used to decompose the time series into a parsimonious set of multivariate sub-sequences.
We use the generated random graph, whose number of nonzero-degree nodes is finite, to define both the sparsity pattern and dimension of the latent state transition matrix.
arXiv Detail & Related papers (2020-07-25T04:16:34Z)
- A Novel Graph-based Multi-modal Fusion Encoder for Neural Machine Translation [131.33610549540043]
We propose a novel graph-based multi-modal fusion encoder for NMT.
We first represent the input sentence and image using a unified multi-modal graph.
We then stack multiple graph-based multi-modal fusion layers that iteratively perform semantic interactions to learn node representations.
arXiv Detail & Related papers (2020-07-17T04:06:09Z)
- Connecting the Dots: Multivariate Time Series Forecasting with Graph Neural Networks [91.65637773358347]
We propose a general graph neural network framework designed specifically for multivariate time series data.
Our approach automatically extracts the uni-directed relations among variables through a graph learning module.
Our proposed model outperforms the state-of-the-art baseline methods on 3 of 4 benchmark datasets.
arXiv Detail & Related papers (2020-05-24T04:02:18Z)