Stock Movement Prediction with Multimodal Stable Fusion via Gated Cross-Attention Mechanism
- URL: http://arxiv.org/abs/2406.06594v1
- Date: Thu, 6 Jun 2024 03:13:34 GMT
- Title: Stock Movement Prediction with Multimodal Stable Fusion via Gated Cross-Attention Mechanism
- Authors: Chang Zong, Jian Shao, Weiming Lu, Yueting Zhuang,
- Abstract summary: This study introduces a novel architecture, named Multimodal Stable Fusion with Gated Cross-Attention (MSGCA), designed to robustly integrate multimodal input for stock movement prediction.
MSGCA framework consists of three integral components: (1) a trimodal encoding module, responsible for processing indicator sequences, dynamic documents, and a relational graph, and standardizing their feature representations; (2) a cross-feature fusion module, where primary and consistent features guide the multimodal fusion of the three modalities via a pair of gated cross-attention networks; and (3) a prediction module, which refines the fused features through temporal and dimensional reduction to execute precise
- Score: 41.16574023720132
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The accurate prediction of stock movements is crucial for investment strategies. Stock prices are subject to the influence of various forms of information, including financial indicators, sentiment analysis, news documents, and relational structures. Predominant analytical approaches, however, tend to address only unimodal or bimodal sources, neglecting the complexity of multimodal data. Further complicating the landscape are the issues of data sparsity and semantic conflicts between these modalities, which are frequently overlooked by current models, leading to unstable performance and limiting practical applicability. To address these shortcomings, this study introduces a novel architecture, named Multimodal Stable Fusion with Gated Cross-Attention (MSGCA), designed to robustly integrate multimodal input for stock movement prediction. The MSGCA framework consists of three integral components: (1) a trimodal encoding module, responsible for processing indicator sequences, dynamic documents, and a relational graph, and standardizing their feature representations; (2) a cross-feature fusion module, where primary and consistent features guide the multimodal fusion of the three modalities via a pair of gated cross-attention networks; and (3) a prediction module, which refines the fused features through temporal and dimensional reduction to execute precise movement forecasting. Empirical evaluations demonstrate that the MSGCA framework exceeds current leading methods, achieving performance gains of 8.1%, 6.1%, 21.7% and 31.6% on four multimodal datasets, respectively, attributed to its enhanced multimodal fusion stability.
Related papers
- Asynchronous Multimodal Video Sequence Fusion via Learning Modality-Exclusive and -Agnostic Representations [19.731611716111566]
We propose a Multimodal fusion approach for learning modality-Exclusive and modality-Agnostic representations.
We introduce a predictive self-attention module to capture reliable context dynamics within modalities.
A hierarchical cross-modal attention module is designed to explore valuable element correlations among modalities.
A double-discriminator strategy is presented to ensure the production of distinct representations in an adversarial manner.
arXiv Detail & Related papers (2024-07-06T04:36:48Z) - How Intermodal Interaction Affects the Performance of Deep Multimodal Fusion for Mixed-Type Time Series [3.6958071416494414]
Mixed-type time series (MTTS) is a bimodal data type common in many domains, such as healthcare, finance, environmental monitoring, and social media.
The integration of both modalities through multimodal fusion is a promising approach for processing MTTS.
We present a comprehensive evaluation of several deep multimodal fusion approaches for MTTS forecasting.
arXiv Detail & Related papers (2024-06-21T12:26:48Z) - Unified Multi-modal Unsupervised Representation Learning for
Skeleton-based Action Understanding [62.70450216120704]
Unsupervised pre-training has shown great success in skeleton-based action understanding.
We propose a Unified Multimodal Unsupervised Representation Learning framework, called UmURL.
UmURL exploits an efficient early-fusion strategy to jointly encode the multi-modal features in a single-stream manner.
arXiv Detail & Related papers (2023-11-06T13:56:57Z) - Provable Dynamic Fusion for Low-Quality Multimodal Data [94.39538027450948]
Dynamic multimodal fusion emerges as a promising learning paradigm.
Despite its widespread use, theoretical justifications in this field are still notably lacking.
This paper provides theoretical understandings to answer this question under a most popular multimodal fusion framework from the generalization perspective.
A novel multimodal fusion framework termed Quality-aware Multimodal Fusion (QMF) is proposed, which can improve the performance in terms of classification accuracy and model robustness.
arXiv Detail & Related papers (2023-06-03T08:32:35Z) - Cross-Attention is Not Enough: Incongruity-Aware Dynamic Hierarchical
Fusion for Multimodal Affect Recognition [69.32305810128994]
Incongruity between modalities poses a challenge for multimodal fusion, especially in affect recognition.
We propose the Hierarchical Crossmodal Transformer with Dynamic Modality Gating (HCT-DMG), a lightweight incongruity-aware model.
HCT-DMG: 1) outperforms previous multimodal models with a reduced size of approximately 0.8M parameters; 2) recognizes hard samples where incongruity makes affect recognition difficult; 3) mitigates the incongruity at the latent level in crossmodal attention.
arXiv Detail & Related papers (2023-05-23T01:24:15Z) - Quantifying & Modeling Multimodal Interactions: An Information
Decomposition Framework [89.8609061423685]
We propose an information-theoretic approach to quantify the degree of redundancy, uniqueness, and synergy relating input modalities with an output task.
To validate PID estimation, we conduct extensive experiments on both synthetic datasets where the PID is known and on large-scale multimodal benchmarks.
We demonstrate their usefulness in (1) quantifying interactions within multimodal datasets, (2) quantifying interactions captured by multimodal models, (3) principled approaches for model selection, and (4) three real-world case studies.
arXiv Detail & Related papers (2023-02-23T18:59:05Z) - Efficient Multimodal Transformer with Dual-Level Feature Restoration for
Robust Multimodal Sentiment Analysis [47.29528724322795]
Multimodal Sentiment Analysis (MSA) has attracted increasing attention recently.
Despite significant progress, there are still two major challenges on the way towards robust MSA.
We propose a generic and unified framework to address them, named Efficient Multimodal Transformer with Dual-Level Feature Restoration (EMT-DLFR)
arXiv Detail & Related papers (2022-08-16T08:02:30Z) - Improving Multimodal fusion via Mutual Dependency Maximisation [5.73995120847626]
Multimodal sentiment analysis is a trending area of research, and the multimodal fusion is one of its most active topic.
In this work, we investigate unexplored penalties and propose a set of new objectives that measure the dependency between modalities.
We demonstrate that our new penalties lead to a consistent improvement (up to $4.3$ on accuracy) across a large variety of state-of-the-art models.
arXiv Detail & Related papers (2021-08-31T06:26:26Z) - Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal
Sentiment Analysis [96.46952672172021]
Bi-Bimodal Fusion Network (BBFN) is a novel end-to-end network that performs fusion on pairwise modality representations.
Model takes two bimodal pairs as input due to known information imbalance among modalities.
arXiv Detail & Related papers (2021-07-28T23:33:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.