Related papers: Multimodal Multi-loss Fusion Network for Sentiment Analysis

Multimodal Multi-loss Fusion Network for Sentiment Analysis

URL: http://arxiv.org/abs/2308.00264v4
Date: Sun, 2 Jun 2024 19:12:57 GMT
Title: Multimodal Multi-loss Fusion Network for Sentiment Analysis
Authors: Zehui Wu, Ziwei Gong, Jaywon Koo, Julia Hirschberg,
Abstract summary: This paper investigates the optimal selection and fusion of feature encoders across multiple modalities to improve sentiment detection. We compare different fusion methods and examine the impact of multi-loss training within the multi-modality fusion network. We have found that integrating context significantly enhances model performance.
Score: 3.8611070161950902
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper investigates the optimal selection and fusion of feature encoders across multiple modalities and combines these in one neural network to improve sentiment detection. We compare different fusion methods and examine the impact of multi-loss training within the multi-modality fusion network, identifying surprisingly important findings relating to subnet performance. We have also found that integrating context significantly enhances model performance. Our best model achieves state-of-the-art performance for three datasets (CMU-MOSI, CMU-MOSEI and CH-SIMS). These results suggest a roadmap toward an optimized feature selection and fusion approach for enhancing sentiment detection in neural networks.

Related papers

MTPareto: A MultiModal Targeted Pareto Framework for Fake News Detection [34.09249215878179]
Multimodal fake news detection is essential for maintaining the authenticity of Internet multimedia information. To address this problem, we propose the MTPareto framework to optimize multimodal fusion. Experiment results on FakeSV and FVC datasets show that the proposed framework outperforms baselines.
arXiv Detail & Related papers (2025-01-12T10:14:29Z)
Diffusion Models as Network Optimizers: Explorations and Analysis [71.69869025878856]
generative diffusion models (GDMs) have emerged as a promising new approach to network optimization. In this study, we first explore the intrinsic characteristics of generative models. We provide a concise theoretical and intuitive demonstration of the advantages of generative models over discriminative network optimization.
arXiv Detail & Related papers (2024-11-01T09:05:47Z)
GCM-Net: Graph-enhanced Cross-Modal Infusion with a Metaheuristic-Driven Network for Video Sentiment and Emotion Analysis [2.012311338995539]
This paper presents a novel framework that leverages the multi-modal contextual information from utterances and applies metaheuristic algorithms to learn for utterance-level sentiment and emotion prediction. To show the effectiveness of our approach, we have conducted extensive evaluations on three prominent multimodal benchmark datasets.
arXiv Detail & Related papers (2024-10-02T10:07:48Z)
Multi-Objective Neural Architecture Search for In-Memory Computing [0.5892638927736115]
We employ neural architecture search (NAS) to enhance the efficiency of deploying diverse machine learning (ML) tasks on in-memory computing architectures. Our evaluation of this NAS approach for IMC architecture deployment spans three distinct image classification datasets.
arXiv Detail & Related papers (2024-06-10T19:17:09Z)
MLF-DET: Multi-Level Fusion for Cross-Modal 3D Object Detection [54.52102265418295]
We propose a novel and effective Multi-Level Fusion network, named as MLF-DET, for high-performance cross-modal 3D object DETection. For the feature-level fusion, we present the Multi-scale Voxel Image fusion (MVI) module, which densely aligns multi-scale voxel features with image features. For the decision-level fusion, we propose the lightweight Feature-cued Confidence Rectification (FCR) module, which exploits image semantics to rectify the confidence of detection candidates.
arXiv Detail & Related papers (2023-07-18T11:26:02Z)
An Empirical Study of Multimodal Model Merging [148.48412442848795]
Model merging is a technique that fuses multiple models trained on different tasks to generate a multi-task solution. We conduct our study for a novel goal where we can merge vision, language, and cross-modal transformers of a modality-specific architecture. We propose two metrics that assess the distance between weights to be merged and can serve as an indicator of the merging outcomes.
arXiv Detail & Related papers (2023-04-28T15:43:21Z)
Sparse Interaction Additive Networks via Feature Interaction Detection and Sparse Selection [10.191597755296163]
We develop a tractable selection algorithm to efficiently identify the necessary feature combinations. Our proposed Sparse Interaction Additive Networks (SIAN) construct a bridge from simple and interpretable models to fully connected neural networks.
arXiv Detail & Related papers (2022-09-19T19:57:17Z)
Multimodal E-Commerce Product Classification Using Hierarchical Fusion [0.0]
The proposed method significantly outperformed the unimodal models' performance and the reported performance of similar models on our specific task. We did experiments with multiple fusing techniques and found, that the best performing technique to combine the individual embedding of the unimodal network is based on combining concatenation and averaging the feature vectors.
arXiv Detail & Related papers (2022-07-07T14:04:42Z)
Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis [96.46952672172021]
Bi-Bimodal Fusion Network (BBFN) is a novel end-to-end network that performs fusion on pairwise modality representations. Model takes two bimodal pairs as input due to known information imbalance among modalities.
arXiv Detail & Related papers (2021-07-28T23:33:42Z)
Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems. Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for MFC. We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z)
A novel multimodal fusion network based on a joint coding model for lane line segmentation [22.89466867866239]
We introduce a novel multimodal fusion architecture from an information theory perspective. We demonstrate its practical utility using LiDAR camera fusion networks. Our optimal fusion network achieves 85%+ lane line accuracy and 98.7%+ overall.
arXiv Detail & Related papers (2021-03-20T06:47:58Z)
Recursive Multi-model Complementary Deep Fusion forRobust Salient Object Detection via Parallel Sub Networks [62.26677215668959]
Fully convolutional networks have shown outstanding performance in the salient object detection (SOD) field. This paper proposes a wider'' network architecture which consists of parallel sub networks with totally different network architectures. Experiments on several famous benchmarks clearly demonstrate the superior performance, good generalization, and powerful learning ability of the proposed wider framework.
arXiv Detail & Related papers (2020-08-07T10:39:11Z)

This list is automatically generated from the titles and abstracts of the papers in this site.