MACSA: A Multimodal Aspect-Category Sentiment Analysis Dataset with
Multimodal Fine-grained Aligned Annotations
- URL: http://arxiv.org/abs/2206.13969v1
- Date: Tue, 28 Jun 2022 12:49:16 GMT
- Title: MACSA: A Multimodal Aspect-Category Sentiment Analysis Dataset with
Multimodal Fine-grained Aligned Annotations
- Authors: Hao Yang, Yanyan Zhao, Jianwei Liu, Yang Wu and Bing Qin
- Abstract summary: We propose a new dataset, the Multimodal Aspect-Category Sentiment Analysis (MACSA) dataset, which contains more than 21K text-image pairs.
Based on our dataset, we propose the Multimodal ACSA task and a multimodal graph-based aligned model (MGAM), which adopts a fine-grained cross-modal fusion method.
- Score: 31.972103262426877
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal fine-grained sentiment analysis has recently attracted increasing
attention due to its broad applications. However, the existing multimodal
fine-grained sentiment datasets mostly focus on annotating the fine-grained
elements in text but ignore those in images, which leads to the fine-grained
elements in visual content not receiving the full attention they deserve. In
this paper, we propose a new dataset, the Multimodal Aspect-Category Sentiment
Analysis (MACSA) dataset, which contains more than 21K text-image pairs. The
dataset provides fine-grained annotations for both textual and visual content
and is the first to use the aspect category as the pivot to align the fine-grained
elements between the two modalities. Based on our dataset, we propose the
Multimodal ACSA task and a multimodal graph-based aligned model (MGAM), which
adopts a fine-grained cross-modal fusion method. Experimental results show that
our method can facilitate the baseline comparison for future research on this
corpus. We will make the dataset and code publicly available.
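The abstract describes MGAM only at a high level, so the following is a minimal, hypothetical PyTorch sketch of the general idea of using the aspect category as a pivot for fine-grained cross-modal fusion: learnable category embeddings query the text tokens and the image regions separately, and the two category-conditioned views are fused to predict one sentiment per category. This is not the authors' MGAM; the category count, feature sizes, single attention layer, and label set are illustrative assumptions.

import torch
import torch.nn as nn

class AspectPivotFusion(nn.Module):
    def __init__(self, num_categories=6, dim=256, num_sentiments=4):
        super().__init__()
        # Pivot: one learnable query vector per aspect category.
        self.category_emb = nn.Embedding(num_categories, dim)
        self.text_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.image_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        # e.g. {positive, neutral, negative, not-mentioned} per category (assumed label set).
        self.classifier = nn.Linear(2 * dim, num_sentiments)

    def forward(self, text_feats, image_feats):
        # text_feats: (B, T, dim) token features; image_feats: (B, R, dim) region features.
        batch = text_feats.size(0)
        queries = self.category_emb.weight.unsqueeze(0).expand(batch, -1, -1)   # (B, C, dim)
        text_view, _ = self.text_attn(queries, text_feats, text_feats)          # category-conditioned text view
        image_view, _ = self.image_attn(queries, image_feats, image_feats)      # category-conditioned image view
        fused = torch.cat([text_view, image_view], dim=-1)                      # align modalities via the shared pivot
        return self.classifier(fused)                                           # (B, C, num_sentiments)

# Toy usage: 2 samples, 20 text tokens, 9 image regions, 256-d features.
logits = AspectPivotFusion()(torch.randn(2, 20, 256), torch.randn(2, 9, 256))
print(logits.shape)  # torch.Size([2, 6, 4])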
Related papers
- AMPLE: Emotion-Aware Multimodal Fusion Prompt Learning for Fake News Detection [0.1499944454332829]
This paper introduces the Emotion-Aware Multimodal Fusion Prompt Learning (AMPLE) framework to address the above issue.
This framework extracts emotional elements from texts by leveraging sentiment analysis tools.
It then employs Multi-Head Cross-Attention (MCA) mechanisms and similarity-aware fusion methods to integrate multimodal data (a minimal sketch of this kind of cross-attention fusion appears after this list).
arXiv Detail & Related papers (2024-10-21T02:19:24Z)
- Anno-incomplete Multi-dataset Detection [67.69438032767613]
We propose a novel problem, "Anno-incomplete Multi-dataset Detection".
We develop an end-to-end multi-task learning architecture which can accurately detect all the object categories with multiple partially annotated datasets.
arXiv Detail & Related papers (2024-08-29T03:58:21Z)
- MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine [53.01393667775077]
This paper introduces MedTrinity-25M, a comprehensive, large-scale multimodal dataset for medicine.
It covers over 25 million images across 10 modalities, with multigranular annotations for more than 65 diseases.
Unlike existing approaches, which are limited by the availability of image-text pairs, we have developed the first automated pipeline.
arXiv Detail & Related papers (2024-08-06T02:09:35Z)
- New Benchmark Dataset and Fine-Grained Cross-Modal Fusion Framework for Vietnamese Multimodal Aspect-Category Sentiment Analysis [1.053698976085779]
We introduce a new Vietnamese multimodal dataset, named ViMACSA, which consists of 4,876 text-image pairs with 14,618 fine-grained annotations for both text and image in the hotel domain.
We propose a Fine-Grained Cross-Modal Fusion Framework (FCMF) that effectively learns both intra- and inter-modality interactions and then fuses this information to produce a unified multimodal representation.
Experimental results show that our framework outperforms SOTA models on the ViMACSA dataset, achieving the highest F1 score of 79.73%.
arXiv Detail & Related papers (2024-05-01T14:29:03Z)
- DRIN: Dynamic Relation Interactive Network for Multimodal Entity Linking [31.15972952813689]
We propose a novel framework called Dynamic Relation Interactive Network (DRIN) for MEL tasks.
DRIN explicitly models four different types of alignment between a mention and entity and builds a dynamic Graph Convolutional Network (GCN) to dynamically select the corresponding alignment relations for different input samples.
Experiments on two datasets show that DRIN outperforms state-of-the-art methods by a large margin, demonstrating the effectiveness of our approach.
arXiv Detail & Related papers (2023-10-09T10:21:42Z)
- M$^3$Net: Multi-view Encoding, Matching, and Fusion for Few-shot Fine-grained Action Recognition [80.21796574234287]
M$^3$Net is a matching-based framework for few-shot fine-grained (FS-FG) action recognition.
It incorporates multi-view encoding, multi-view matching, and multi-view fusion to facilitate embedding encoding, similarity matching, and decision making.
Explainable visualizations and experimental results demonstrate the superiority of M$3$Net in capturing fine-grained action details.
arXiv Detail & Related papers (2023-08-06T09:15:14Z)
- MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos [106.06278332186106]
Multimodal summarization with multimodal output (MSMO) has emerged as a promising research direction.
Numerous limitations exist within existing public MSMO datasets.
We have meticulously curated the MMSum dataset.
arXiv Detail & Related papers (2023-06-07T07:43:11Z)
- Align and Attend: Multimodal Summarization with Dual Contrastive Losses [57.83012574678091]
The goal of multimodal summarization is to extract the most important information from different modalities to form output summaries.
Existing methods fail to leverage the temporal correspondence between different modalities and ignore the intrinsic correlation between different samples.
We introduce Align and Attend Multimodal Summarization (A2Summ), a unified multimodal transformer-based model that effectively aligns and attends to the multimodal input.
arXiv Detail & Related papers (2023-03-13T17:01:42Z)
- Transformer-based Multi-Aspect Modeling for Multi-Aspect Multi-Sentiment Analysis [56.893393134328996]
We propose a novel Transformer-based Multi-aspect Modeling scheme (TMM), which can capture potential relations between multiple aspects and simultaneously detect the sentiment of all aspects in a sentence.
Our method achieves noticeable improvements compared with strong baselines such as BERT and RoBERTa.
arXiv Detail & Related papers (2020-11-01T11:06:31Z)
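Several of the entries above (AMPLE's Multi-Head Cross-Attention with similarity-aware fusion, FCMF's fine-grained cross-modal fusion, A2Summ's align-and-attend design) rely on cross-attention between text and image sequences. The following is a minimal, hypothetical PyTorch sketch of that general pattern, not any of those papers' implementations; the bidirectional layout, the cosine-similarity gate, and all dimensions are assumptions made for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalFusion(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.t2i = nn.MultiheadAttention(dim, heads, batch_first=True)  # text queries attend over image regions
        self.i2t = nn.MultiheadAttention(dim, heads, batch_first=True)  # image queries attend over text tokens

    def forward(self, text, image):
        # text: (B, T, dim) token features; image: (B, R, dim) region features.
        text_enriched, _ = self.t2i(text, image, image)
        image_enriched, _ = self.i2t(image, text, text)
        # Pool each enriched sequence, then gate the visual contribution by the
        # cosine similarity of the two pooled views (a simple "similarity-aware" fusion).
        t = text_enriched.mean(dim=1)
        v = image_enriched.mean(dim=1)
        gate = F.cosine_similarity(t, v, dim=-1).clamp(min=0.0).unsqueeze(-1)   # (B, 1)
        return t + gate * v   # down-weight the image signal when the modalities disagree

# Toy usage: fuse 20 token features with 9 region features into one 256-d vector per sample.
fused = CrossModalFusion()(torch.randn(2, 20, 256), torch.randn(2, 9, 256))
print(fused.shape)  # torch.Size([2, 256])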