HyperFusion: Hierarchical Multimodal Ensemble Learning for Social Media Popularity Prediction
- URL: http://arxiv.org/abs/2507.00926v1
- Date: Tue, 01 Jul 2025 16:31:50 GMT
- Title: HyperFusion: Hierarchical Multimodal Ensemble Learning for Social Media Popularity Prediction
- Authors: Liliang Ye, Yunyao Zhang, Yafeng Wu, Yi-Ping Phoebe Chen, Junqing Yu, Wei Yang, Zikai Song
- Abstract summary: Social media popularity prediction plays a crucial role in content optimization, marketing strategies, and user engagement enhancement across digital platforms. This paper presents HyperFusion, a hierarchical multimodal ensemble learning framework for social media popularity prediction.
- Score: 16.78634288864967
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Social media popularity prediction plays a crucial role in content optimization, marketing strategies, and user engagement enhancement across digital platforms. However, predicting post popularity remains challenging due to the complex interplay between visual, textual, temporal, and user behavioral factors. This paper presents HyperFusion, a hierarchical multimodal ensemble learning framework for social media popularity prediction. Our approach employs a three-tier fusion architecture that progressively integrates features across abstraction levels: visual representations from CLIP encoders, textual embeddings from transformer models, and temporal-spatial metadata with user characteristics. The framework implements a hierarchical ensemble strategy combining CatBoost, TabNet, and custom multi-layer perceptrons. To address limited labeled data, we propose a two-stage training methodology with pseudo-labeling and iterative refinement. We introduce novel cross-modal similarity measures and hierarchical clustering features that capture inter-modal dependencies. Experimental results demonstrate that HyperFusion achieves competitive performance on the SMP challenge dataset. Our team achieved third place in the SMP Challenge 2025 (Image Track). The source code is available at https://anonymous.4open.science/r/SMPDImage.
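The abstract sketches a concrete pipeline: modality features (CLIP image embeddings, transformer text embeddings, temporal-spatial and user metadata) are augmented with cross-modal similarity features, and heterogeneous regressors are combined in a hierarchical ensemble. The snippet below is a minimal sketch of that idea under stated assumptions, not the paper's implementation: the feature shapes, hyperparameters, and Ridge blender are invented for illustration, and the TabNet tier is omitted for brevity.

```python
# Minimal sketch of a two-tier multimodal ensemble in the spirit of
# HyperFusion. All names, shapes, and hyperparameters are illustrative
# assumptions; the paper's exact tiers and features are not reproduced.
import numpy as np
from catboost import CatBoostRegressor
from sklearn.base import clone
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Stand-ins for pre-extracted modality features of n posts.
n = 1000
img_emb = rng.normal(size=(n, 512))   # CLIP-style image embeddings (assumed)
txt_emb = rng.normal(size=(n, 512))   # transformer text embeddings (assumed)
meta = rng.normal(size=(n, 16))       # temporal-spatial/user metadata (assumed)
y = rng.normal(size=n)                # popularity target (e.g., log views)

# One cross-modal similarity feature: cosine between image and text vectors.
cos_sim = np.sum(img_emb * txt_emb, axis=1) / (
    np.linalg.norm(img_emb, axis=1) * np.linalg.norm(txt_emb, axis=1)
)
X = np.hstack([img_emb, txt_emb, meta, cos_sim[:, None]])

# Tier 1: heterogeneous base regressors, trained out-of-fold so the
# blender only ever sees predictions made on held-out rows.
base_models = [
    CatBoostRegressor(iterations=300, depth=6, verbose=False, random_state=0),
    MLPRegressor(hidden_layer_sizes=(128, 64), max_iter=300, random_state=0),
]
oof = np.zeros((n, len(base_models)))
for tr, va in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    for j, proto in enumerate(base_models):
        model = clone(proto).fit(X[tr], y[tr])
        oof[va, j] = model.predict(X[va])

# Tier 2: a simple linear blender over the base predictions.
blender = Ridge(alpha=1.0).fit(oof, y)
print("blend weights:", blender.coef_)
```

Under the same assumptions, the two-stage training described in the abstract would add a step that predicts pseudo-labels for unlabeled posts with the fitted blender and refits the tier-1 models on the enlarged set.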
Related papers
- MIM: Multi-modal Content Interest Modeling Paradigm for User Behavior Modeling [27.32474950026696]
We propose a novel Multi-modal Content Interest Modeling paradigm (MIM). MIM consists of three key stages: Pre-training, Content-Interest-Aware Supervised Fine-Tuning, and Content-Interest-Aware UBM. The method has been successfully deployed online, achieving a significant increase of +14.14% in CTR and +4.12% in RPM.
arXiv Detail & Related papers (2025-02-01T05:06:21Z)
- TriMod Fusion for Multimodal Named Entity Recognition in Social Media [0.0]
We propose a novel approach that integrates textual, visual, and hashtag features (TriMod) for effective modality fusion. We demonstrate the superiority of our approach over existing state-of-the-art methods, achieving significant improvements in precision, recall, and F1 score.
arXiv Detail & Related papers (2025-01-14T17:29:41Z)
- Multimodality Helps Few-shot 3D Point Cloud Semantic Segmentation [61.91492500828508]
Few-shot 3D point cloud segmentation (FS-PCS) aims at generalizing models to segment novel categories with minimal support samples. We introduce a multimodal FS-PCS setup, utilizing textual labels and the potentially available 2D image modality. We propose a simple yet effective Test-time Adaptive Cross-modal (TACC) technique to mitigate training bias.
arXiv Detail & Related papers (2024-10-29T19:28:41Z)
- DiffMM: Multi-Modal Diffusion Model for Recommendation [19.43775593283657]
We propose a novel multi-modal graph diffusion model for recommendation called DiffMM.
Our framework integrates a modality-aware graph diffusion model with a cross-modal contrastive learning paradigm to improve modality-aware user representation learning.
arXiv Detail & Related papers (2024-06-17T17:35:54Z)
- Hierarchical Information Enhancement Network for Cascade Prediction in Social Networks [51.54002032659713]
We propose a novel Hierarchical Information Enhancement Network (HIENet) for cascade prediction.
Our approach integrates the fundamental cascade sequence, user social graphs, and a sub-cascade graph into a unified framework.
arXiv Detail & Related papers (2024-03-22T14:57:27Z)
- Contrastive Transformer Learning with Proximity Data Generation for Text-Based Person Search [60.626459715780605]
Given a descriptive text query, text-based person search aims to retrieve the best-matched target person from an image gallery.
Such a cross-modal retrieval task is quite challenging due to the significant modality gap, fine-grained differences, and the insufficiency of annotated data.
In this paper, we propose a simple yet effective dual Transformer model for text-based person search.
arXiv Detail & Related papers (2023-11-15T16:26:49Z)
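As a toy illustration of the cross-modal retrieval task described in the entry above (not the paper's dual-Transformer model or its proximity data generation), the sketch below ranks an image gallery by cosine similarity to a text-query embedding; the embeddings are random stand-ins and every name is an assumption.

```python
# Toy cross-modal retrieval: rank an image gallery against a text query
# by cosine similarity in a shared embedding space. The "encoders" here
# are random stand-ins, not the dual Transformer from the paper.
import numpy as np

rng = np.random.default_rng(1)
dim = 256

gallery = rng.normal(size=(5000, dim))   # assumed image embeddings
query = rng.normal(size=dim)             # assumed text-query embedding

# L2-normalize so dot products equal cosine similarities.
gallery /= np.linalg.norm(gallery, axis=1, keepdims=True)
query /= np.linalg.norm(query)

scores = gallery @ query                 # cosine similarity per image
top10 = np.argsort(-scores)[:10]         # best-matched gallery indices
print(top10, scores[top10])
```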
- Hierarchical Audio-Visual Information Fusion with Multi-label Joint Decoding for MER 2023 [51.95161901441527]
In this paper, we propose a novel framework for recognizing both discrete and dimensional emotions.
Deep features extracted from foundation models are used as robust acoustic and visual representations of raw video.
Our final system achieves state-of-the-art performance and ranks third on the leaderboard on MER-MULTI sub-challenge.
arXiv Detail & Related papers (2023-09-11T03:19:10Z)
- Multi-channel Attentive Graph Convolutional Network With Sentiment Fusion For Multimodal Sentiment Analysis [10.625579004828733]
This paper proposes a Multi-channel Attentive Graph Convolutional Network (MAGCN).
It consists of two main components: cross-modality interactive learning and sentimental feature fusion.
Experiments are conducted on three widely-used datasets.
arXiv Detail & Related papers (2022-01-25T12:38:33Z)
- Routing with Self-Attention for Multimodal Capsule Networks [108.85007719132618]
We present a new multimodal capsule network that allows us to leverage the strength of capsules in the context of a multimodal learning framework.
To adapt the capsules to large-scale input data, we propose a novel routing by self-attention mechanism that selects relevant capsules.
This allows not only for robust training with noisy video data but also for scaling up the capsule network compared to traditional routing methods.
arXiv Detail & Related papers (2021-12-01T19:01:26Z)
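The routing-by-self-attention idea in the capsule entry above can be pictured as attention-weighted selection over candidate capsule vectors. The sketch below is a loose illustration under assumed shapes and a single-query readout, not the paper's mechanism.

```python
# Toy attention-style routing: weight candidate "capsule" vectors by
# softmax-scaled dot products with a query, then aggregate.
# Shapes and the single-query readout are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)
num_capsules, dim = 32, 64

capsules = rng.normal(size=(num_capsules, dim))  # candidate capsule outputs
query = rng.normal(size=dim)                     # stand-in routing query

logits = capsules @ query / np.sqrt(dim)         # scaled dot-product scores
weights = np.exp(logits - logits.max())
weights /= weights.sum()                         # softmax attention weights

routed = weights @ capsules                      # weighted capsule aggregate
print(routed.shape, weights.round(3)[:5])
```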
- CoADNet: Collaborative Aggregation-and-Distribution Networks for Co-Salient Object Detection [91.91911418421086]
Co-Salient Object Detection (CoSOD) aims at discovering salient objects that repeatedly appear in a given query group containing two or more relevant images.
One challenging issue is how to effectively capture co-saliency cues by modeling and exploiting inter-image relationships.
We present an end-to-end collaborative aggregation-and-distribution network (CoADNet) to capture both salient and repetitive visual patterns from multiple images.
arXiv Detail & Related papers (2020-11-10T04:28:11Z)