RethinkingTMSC: An Empirical Study for Target-Oriented Multimodal
Sentiment Classification
- URL: http://arxiv.org/abs/2310.09596v2
- Date: Sat, 23 Dec 2023 12:48:58 GMT
- Title: RethinkingTMSC: An Empirical Study for Target-Oriented Multimodal
Sentiment Classification
- Authors: Junjie Ye, Jie Zhou, Junfeng Tian, Rui Wang, Qi Zhang, Tao Gui,
Xuanjing Huang
- Abstract summary: Target-oriented Multimodal Sentiment Classification (TMSC) has gained significant attention among scholars.
To investigate the causes of the current performance bottleneck, we perform an extensive empirical evaluation and in-depth analysis of the datasets.
- Score: 70.9087014537896
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, Target-oriented Multimodal Sentiment Classification (TMSC) has
gained significant attention among scholars. However, current multimodal models
have reached a performance bottleneck. To investigate the causes of this
problem, we perform extensive empirical evaluation and in-depth analysis of the
datasets to answer the following questions: Q1: Are the modalities equally
important for TMSC? Q2: Which multimodal fusion modules are more effective? Q3:
Do existing datasets adequately support the research? Our experiments and
analyses reveal that the current TMSC systems primarily rely on the textual
modality, as the sentiments of most targets can be determined solely by the text.
Consequently, we point out several directions for future work on the TMSC task in
terms of model design and dataset construction. The code and data are available
at https://github.com/Junjie-Ye/RethinkingTMSC.
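To make Q1 concrete, the sketch below shows how a text-only TMSC baseline could be set up: the sentence and its target are encoded as a BERT sentence pair and classified into three sentiment classes. This is a hedged illustration rather than the authors' evaluated models; the backbone name, label order, and the classify_target helper are assumptions, and the classification head would need fine-tuning on TMSC data (e.g., the Twitter benchmarks) before its predictions are meaningful.

# Minimal sketch of a text-only TMSC baseline (assumed setup, not the paper's exact models)
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "bert-base-uncased"               # assumed backbone
LABELS = ["negative", "neutral", "positive"]   # assumed label order

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=len(LABELS)
)  # classification head is randomly initialized; fine-tune on TMSC data before use

def classify_target(sentence: str, target: str) -> str:
    """Predict the sentiment of `target` using only the textual modality."""
    # Encode (sentence, target) as a BERT sentence pair.
    inputs = tokenizer(sentence, target, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return LABELS[int(logits.argmax(dim=-1))]

print(classify_target("The burger at @JoesDiner was fantastic!", "JoesDiner"))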
Related papers
- Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent [102.31558123570437]
Multimodal Retrieval Augmented Generation (mRAG) plays an important role in mitigating the "hallucination" issue inherent in multimodal large language models (MLLMs).
We propose the first self-adaptive planning agent for multimodal retrieval, OmniSearch.
arXiv Detail & Related papers (2024-11-05T09:27:21Z)
- SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers [43.18330795060871]
SPIQA is a dataset specifically designed to interpret complex figures and tables within the context of scientific research articles.
We employ automatic and manual curation to create the dataset.
SPIQA comprises 270K questions divided into training, validation, and three different evaluation splits.
arXiv Detail & Related papers (2024-07-12T16:37:59Z)
- MMSci: A Dataset for Graduate-Level Multi-Discipline Multimodal Scientific Understanding [59.41495657570397]
This dataset includes figures such as schematic diagrams, simulated images, macroscopic/microscopic photos, and experimental visualizations.
We developed benchmarks for scientific figure captioning and multiple-choice questions, evaluating six proprietary and over ten open-source models.
The dataset and benchmarks will be released to support further research.
arXiv Detail & Related papers (2024-07-06T00:40:53Z)
- MCD: Diverse Large-Scale Multi-Campus Dataset for Robot Perception [24.67682960590225]
We introduce a comprehensive dataset named MCD, featuring a wide range of sensing modalities, high-accuracy ground truth, and diverse challenging environments.
MCD comprises both CCS (Classical Cylindrical Spinning) and NRE (Non-Repetitive Epicyclic) lidars, high-quality IMUs (Inertial Measurement Units), cameras, and UWB (Ultra-WideBand) sensors.
In a pioneering effort, we introduce semantic annotations of 29 classes over 59k sparse NRE lidar scans across three domains, thus providing a novel challenge to existing semantic segmentation research.
arXiv Detail & Related papers (2024-03-18T06:00:38Z)
- Enhancing Human-like Multi-Modal Reasoning: A New Challenging Dataset and Comprehensive Framework [51.44863255495668]
Multimodal reasoning is a critical component in the pursuit of artificial intelligence systems that exhibit human-like intelligence.
We present the Multi-Modal Reasoning (COCO-MMR) dataset, which encompasses an extensive collection of open-ended questions.
We propose innovative techniques, including multi-hop cross-modal attention and sentence-level contrastive learning, to enhance the image and text encoders.
arXiv Detail & Related papers (2023-07-24T08:58:25Z)
- Multimodal Chain-of-Thought Reasoning in Language Models [94.70184390935661]
We propose Multimodal-CoT that incorporates language (text) and vision (images) modalities into a two-stage framework.
Experimental results on ScienceQA and A-OKVQA benchmark datasets show the effectiveness of our proposed approach.
arXiv Detail & Related papers (2023-02-02T07:51:19Z)
- Does a Technique for Building Multimodal Representation Matter? -- Comparative Analysis [0.0]
We show that the choice of technique for building the multimodal representation is crucial for obtaining the highest possible model performance.
Experiments are conducted on three datasets: Amazon Reviews, MovieLens25M, and MovieLens1M.
arXiv Detail & Related papers (2022-06-09T21:30:10Z)
- Exploring Neural Models for Query-Focused Summarization [74.41256438059256]
We conduct a systematic exploration of neural approaches to query-focused summarization (QFS).
We present two model extensions that achieve state-of-the-art performance on the QMSum dataset by a margin of up to 3.38 ROUGE-1, 3.72 ROUGE-2, and 3.28 ROUGE-L.
arXiv Detail & Related papers (2021-12-14T18:33:29Z)
- Multi-Task Hierarchical Learning Based Network Traffic Analytics [18.04195092141071]
We present three open datasets containing nearly 1.3M labeled flows in total.
We focus on broad aspects in network traffic analysis, including both malware detection and application classification.
As we continue to grow them, we expect the datasets to serve as a common ground for AI-driven, reproducible research on network flow analytics.
arXiv Detail & Related papers (2021-06-05T02:25:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.