Related papers: H3M-SSMoEs: Hypergraph-based Multimodal Learning with LLM Reasoning and Style-Structured Mixture of Experts

H3M-SSMoEs: Hypergraph-based Multimodal Learning with LLM Reasoning and Style-Structured Mixture of Experts

URL: http://arxiv.org/abs/2510.25091v1
Date: Wed, 29 Oct 2025 01:54:52 GMT
Title: H3M-SSMoEs: Hypergraph-based Multimodal Learning with LLM Reasoning and Style-Structured Mixture of Experts
Authors: Peilin Tan, Liang Xie, Churan Zhi, Dian Tu, Chuanqi Shi,
Abstract summary: H3M-SSMoEs is a novel Hypergraph-based MultiModal architecture with LLM reasoning and Style-Structured Mixture Experts.<n> experiments on three major stock markets demonstrate that H3M-SSMoEs surpasses state-of-the-art methods in both superior predictive accuracy and investment performance, while exhibiting effective risk control.
Score: 4.041490867852946
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Stock movement prediction remains fundamentally challenging due to complex temporal dependencies, heterogeneous modalities, and dynamically evolving inter-stock relationships. Existing approaches often fail to unify structural, semantic, and regime-adaptive modeling within a scalable framework. This work introduces H3M-SSMoEs, a novel Hypergraph-based MultiModal architecture with LLM reasoning and Style-Structured Mixture of Experts, integrating three key innovations: (1) a Multi-Context Multimodal Hypergraph that hierarchically captures fine-grained spatiotemporal dynamics via a Local Context Hypergraph (LCH) and persistent inter-stock dependencies through a Global Context Hypergraph (GCH), employing shared cross-modal hyperedges and Jensen-Shannon Divergence weighting mechanism for adaptive relational learning and cross-modal alignment; (2) a LLM-enhanced reasoning module, which leverages a frozen large language model with lightweight adapters to semantically fuse and align quantitative and textual modalities, enriching representations with domain-specific financial knowledge; and (3) a Style-Structured Mixture of Experts (SSMoEs) that combines shared market experts and industry-specialized experts, each parameterized by learnable style vectors enabling regime-aware specialization under sparse activation. Extensive experiments on three major stock markets demonstrate that H3M-SSMoEs surpasses state-of-the-art methods in both superior predictive accuracy and investment performance, while exhibiting effective risk control. Datasets, source code, and model weights are available at our GitHub repository: https://github.com/PeilinTime/H3M-SSMoEs.

Related papers

Gated Fusion Enhanced Multi-Scale Hierarchical Graph Convolutional Network for Stock Movement Prediction [12.806794559293289]
We introduce MS-HGFN (Multi-Scale Hierarchical Graph Fusion Network)<n>The model features a hierarchical GNN module that forms dynamic graphs by learning patterns from intra-attributes and features from inter-attributes over different time scales.<n> Experiments utilizing real-world datasets from U.S. and Chinese stock markets demonstrate that MS-HGFN outperforms both traditional and advanced models.
arXiv Detail & Related papers (2025-11-03T13:39:23Z)
MaGNet: A Mamba Dual-Hypergraph Network for Stock Prediction via Temporal-Causal and Global Relational Learning [3.2859360081297715]
This work introduces MaGNet, a novel Mamba dual-hyperGraph Network for stock prediction.<n>MaGNet integrates a MAGE block, Feature-wise and Stock-wise 2D Spatiotemporal Attention modules, and a dual hypergraph framework.<n>Experiments on six major stock indices demonstrate MaGNet outperforms state-of-the-art methods in both superior predictive performance and exceptional investment returns.
arXiv Detail & Related papers (2025-10-29T20:47:16Z)
NExT-OMNI: Towards Any-to-Any Omnimodal Foundation Models with Discrete Flow Matching [64.10695425442164]
We introduce NExT-OMNI, an open-source omnimodal foundation model that achieves unified modeling through discrete flow paradigms.<n>Trained on large-scale interleaved text, image, video, and audio data, NExT-OMNI delivers competitive performance on multimodal generation and understanding benchmarks.<n>To advance further research, we release training details, data protocols, and open-source both the code and model checkpoints.
arXiv Detail & Related papers (2025-10-15T16:25:18Z)
Beyond Benchmarks: Understanding Mixture-of-Experts Models through Internal Mechanisms [55.1784306456972]
Mixture-of-Experts (MoE) architectures have emerged as a promising direction, offering efficiency and scalability by activating only a subset of parameters during inference.<n>We use an internal metric to investigate the mechanisms of MoE architecture by explicitly incorporating routing mechanisms and analyzing expert-level behaviors.<n>We uncover several findings: (1) neuron utilization decreases as models evolve, reflecting stronger generalization; (2) training exhibits a dynamic trajectory, where benchmark performance alone provides limited signal; (3) task completion emerges from collaborative contributions of multiple experts, with shared experts driving concentration; and (4) activation patterns at the neuron level provide a fine-grained proxy for data diversity.
arXiv Detail & Related papers (2025-09-28T15:13:38Z)
MSD-KMamba: Bidirectional Spatial-Aware Multi-Modal 3D Brain Segmentation via Multi-scale Self-Distilled Fusion Strategy [15.270952880303533]
We propose a novel 3D multi-modal image segmentation framework, MSD-KMamba.<n>It integrates bidirectional spatial perception with multi-scale self-distillation.<n>Our framework consistently outperforms state-of-the-art methods in segmentation accuracy, robustness, and generalization.
arXiv Detail & Related papers (2025-09-28T06:34:01Z)
Dynamic Adaptive Shared Experts with Grouped Multi-Head Attention Mixture of Experts [10.204413386807564]
We propose a Dynamic Adaptive Shared Expert and Grouped Multi-Head Attention Hybrid Model (DASG-MoE) to enhance long-sequence modeling capabilities.<n>First, we employ the Grouped Multi-Head Attention (GMHA) mechanism to effectively reduce the computational complexity of long sequences.<n>Second, we design a Dual-Scale Shared Expert Structure (DSSE), where shallow experts use lightweight computations to quickly respond to low-dimensional features.<n>Third, we propose a hierarchical Adaptive Dynamic Routing (ADR) mechanism that dynamically selects expert levels based on feature complexity and task requirements.
arXiv Detail & Related papers (2025-09-05T02:49:15Z)
MoIIE: Mixture of Intra- and Inter-Modality Experts for Large Vision Language Models [52.876185634349575]
We propose to incorporate Mixture of Intra- and Inter-Modality Experts (MoIIE) to Large Vision-Language Models (LVLMs)<n>For each token, expert routing is guided by its modality, directing tokens to their respective intra-modality experts as well as a shared pool of inter-modality experts.<n>Our MoIIE models with 5.5B and 11.3B activated parameters match or even surpass the performance of existing advanced open-source MoE-LLMs based multi-modal models.
arXiv Detail & Related papers (2025-08-13T13:00:05Z)
MMaDA: Multimodal Large Diffusion Language Models [61.13527224215318]
We introduce MMaDA, a novel class of multimodal diffusion foundation models.<n>It is designed to achieve superior performance across diverse domains such as textual reasoning, multimodal understanding, and text-to-image generation.
arXiv Detail & Related papers (2025-05-21T17:59:05Z)
Astrea: A MOE-based Visual Understanding Model with Progressive Alignment [10.943104653307294]
Vision-Language Models (VLMs) based on Mixture-of-Experts (MoE) architectures have emerged as a pivotal paradigm in multimodal understanding.<n>We propose Astrea, a novel multi-expert collaborative VLM architecture based on progressive pre-alignment.
arXiv Detail & Related papers (2025-03-12T14:44:52Z)
M3-JEPA: Multimodal Alignment via Multi-gate MoE based on the Joint-Embedding Predictive Architecture [6.928469290518152]
We introduce the Joint-Embedding Predictive Architecture (JEPA) on the multimodal tasks.<n>It converts the input embedding into the output embedding space by a predictor and then conducts the cross-modal alignment on the latent space.<n>We show that M3-JEPA can obtain state-of-the-art performance on different modalities and tasks, generalize to unseen datasets and domains, and is computationally efficient in both training and inference.
arXiv Detail & Related papers (2024-09-09T10:40:50Z)
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts [54.529880848937104]
We develop a unified MLLM with the MoE architecture, named Uni-MoE, that can handle a wide array of modalities. Specifically, it features modality-specific encoders with connectors for a unified multimodal representation. We evaluate the instruction-tuned Uni-MoE on a comprehensive set of multimodal datasets.
arXiv Detail & Related papers (2024-05-18T12:16:01Z)
Parameter-Efficient Mixture-of-Experts Architecture for Pre-trained Language Models [68.9288651177564]
We present a novel MoE architecture based on matrix product operators (MPO) from quantum many-body physics. With the decomposed MPO structure, we can reduce the parameters of the original MoE architecture. Experiments on the three well-known downstream natural language datasets based on GPT2 show improved performance and efficiency in increasing model capacity.
arXiv Detail & Related papers (2022-03-02T13:44:49Z)

This list is automatically generated from the titles and abstracts of the papers in this site.