A Novel Energy based Model Mechanism for Multi-modal Aspect-Based
Sentiment Analysis
- URL: http://arxiv.org/abs/2312.08084v2
- Date: Fri, 15 Dec 2023 13:00:27 GMT
- Title: A Novel Energy based Model Mechanism for Multi-modal Aspect-Based
Sentiment Analysis
- Authors: Tianshuo Peng, Zuchao Li, Ping Wang, Lefei Zhang, Hai Zhao
- Abstract summary: We propose a novel framework called DQPSA for multi-modal sentiment analysis.
The PDQ module uses the prompt as both a visual query and a language query to extract prompt-aware visual information.
The EPE module models the boundary pairing of the analysis target from the perspective of an energy-based model.
- Score: 85.77557381023617
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-modal aspect-based sentiment analysis (MABSA) has recently attracted
increasing attention. The span-based extraction methods, such as FSUIE,
demonstrate strong performance in sentiment analysis due to their joint
modeling of input sequences and target labels. However, previous methods still
have certain limitations: (i) They ignore the difference in the focus of visual
information between different analysis targets (aspect or sentiment). (ii)
Combining features from uni-modal encoders directly may not be sufficient to
eliminate the modal gap and can cause difficulties in capturing the image-text
pairwise relevance. (iii) Existing span-based methods for MABSA ignore the
pairwise relevance of target span boundaries. To tackle these limitations, we
propose a novel framework called DQPSA for multi-modal sentiment analysis.
Specifically, our model contains a Prompt as Dual Query (PDQ) module that uses
the prompt as both a visual query and a language query to extract prompt-aware
visual information and strengthen the pairwise relevance between visual
information and the analysis target. Additionally, we introduce an Energy-based
Pairwise Expert (EPE) module that models the boundary pairing of the analysis
target from the perspective of an energy-based model. This expert predicts the
aspect or sentiment span based on pairwise stability. Experiments on three
widely used benchmarks demonstrate that DQPSA outperforms previous approaches
and achieves new state-of-the-art performance.
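To make the abstract's two mechanisms more concrete, the sketch below illustrates one plausible reading of them: a "prompt as dual query" step in which the same prompt embedding serves as the cross-attention query over both visual patch features and text token features, and an energy-based pairwise scorer that assigns an energy to every (start, end) boundary pair and treats the lowest-energy pair as the most stable span. The class names, tensor shapes, and the bilinear energy head are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class PromptDualQuery(nn.Module):
    """Prompt as Dual Query (PDQ), sketched: the prompt embedding is the query in
    two cross-attention passes, one over visual patches and one over text tokens."""

    def __init__(self, hidden_size: int, num_heads: int = 8):
        super().__init__()
        self.visual_attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.text_attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)

    def forward(self, prompt, visual, text):
        # prompt: (B, P, H); visual: (B, V, H) patch features; text: (B, T, H) tokens.
        prompt_aware_visual, _ = self.visual_attn(prompt, visual, visual)
        prompt_aware_text, _ = self.text_attn(prompt, text, text)
        return prompt_aware_visual, prompt_aware_text


class EnergyPairwiseExpert(nn.Module):
    """Energy-based Pairwise Expert (EPE), sketched: assign an energy to every
    (start, end) boundary pair; the lowest-energy pair is the most stable span."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.start_proj = nn.Linear(hidden_size, hidden_size)
        self.end_proj = nn.Linear(hidden_size, hidden_size)
        self.energy_head = nn.Bilinear(hidden_size, hidden_size, 1)

    def forward(self, token_states):
        # token_states: (B, L, H) fused multi-modal token representations.
        b, l, h = token_states.shape
        s = self.start_proj(token_states).unsqueeze(2).expand(b, l, l, h)
        e = self.end_proj(token_states).unsqueeze(1).expand(b, l, l, h)
        energy = self.energy_head(s.reshape(-1, h), e.reshape(-1, h)).view(b, l, l)
        # Forbid spans whose end precedes their start.
        valid = torch.ones(l, l, device=energy.device).triu().bool()
        return energy.masked_fill(~valid, float("inf"))


if __name__ == "__main__":
    pdq, epe = PromptDualQuery(64), EnergyPairwiseExpert(64)
    prompt = torch.randn(2, 4, 64)   # prompt token embeddings
    visual = torch.randn(2, 49, 64)  # 7x7 grid of image patch features
    text = torch.randn(2, 16, 64)    # sentence token features
    pv, pt = pdq(prompt, visual, text)
    print(pv.shape, pt.shape)        # both (2, 4, 64): prompt-aware visual and text views
    energies = epe(text)             # (2, 16, 16) energy matrix over boundary pairs
    idx = energies.view(2, -1).argmin(dim=-1)
    print([(int(i) // 16, int(i) % 16) for i in idx])  # most stable (start, end) per example
```

In this reading, training would push the energy of gold boundary pairs below that of all other pairs, so span prediction reduces to an argmin over the masked energy matrix.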
Related papers
- PanoSent: A Panoptic Sextuple Extraction Benchmark for Multimodal Conversational Aspect-based Sentiment Analysis [74.41260927676747]
This paper bridges the gaps by introducing a multimodal conversational Aspect-based Sentiment Analysis (ABSA) setting.
To benchmark the tasks, we construct PanoSent, a dataset annotated both manually and automatically, featuring high quality, large scale, multimodality, multilingualism, multi-scenarios, and covering both implicit and explicit sentiment elements.
To effectively address the tasks, we devise a novel Chain-of-Sentiment reasoning framework, together with a novel multimodal large language model (namely Sentica) and a paraphrase-based verification mechanism.
arXiv Detail & Related papers (2024-08-18T13:51:01Z) - Explore In-Context Segmentation via Latent Diffusion Models [132.26274147026854]
The latent diffusion model (LDM) is an effective minimalist framework for in-context segmentation.
We build a new and fair in-context segmentation benchmark that includes both image and video datasets.
arXiv Detail & Related papers (2024-03-14T17:52:31Z) - Cross-target Stance Detection by Exploiting Target Analytical
Perspectives [22.320628580895164]
Cross-target stance detection (CTSD) is an important task that infers the attitude toward a destination target by utilizing annotated data from a source target.
One important approach in CTSD is to extract domain-invariant features to bridge the knowledge gap between multiple targets.
We propose a Multi-Perspective Prompt-Tuning (MPPT) model for CTSD that uses the analysis perspective as a bridge to transfer knowledge.
arXiv Detail & Related papers (2024-01-03T14:28:55Z) - Cross-Modal Reasoning with Event Correlation for Video Question
Answering [32.332251488360185]
We introduce the dense caption modality as a new auxiliary and distill event-correlated information from it to infer the correct answer.
We employ cross-modal reasoning modules for explicitly modeling inter-modal relationships and aggregating relevant information across different modalities.
We propose a question-guided self-adaptive multi-modal fusion module to collect the question-oriented and event-correlated evidence through multi-step reasoning.
arXiv Detail & Related papers (2023-12-20T02:30:39Z) - Rethinking Uncertainly Missing and Ambiguous Visual Modality in
Multi-Modal Entity Alignment [38.574204922793626]
We present a further analysis of visual modality incompleteness, benchmarking the latest multi-modal entity alignment (MMEA) models on our proposed dataset MMEA-UMVM.
Our research indicates that, in the face of modality incompleteness, models tend to overfit the modality noise and exhibit performance oscillations or declines at high missing-modality rates.
We introduce UMAEA, a robust multi-modal entity alignment approach designed to tackle uncertainly missing and ambiguous visual modalities.
arXiv Detail & Related papers (2023-07-30T12:16:49Z) - LOIS: Looking Out of Instance Semantics for Visual Question Answering [17.076621453814926]
We propose a model framework without bounding boxes to understand the causal nexus of object semantics in images.
We implement a mutual relation attention module to model sophisticated and deeper visual semantic relations between instance objects and background information.
Our proposed attention model can further analyze salient image regions by focusing on important word-related questions.
arXiv Detail & Related papers (2023-07-26T12:13:00Z) - Multi-Grained Multimodal Interaction Network for Entity Linking [65.30260033700338]
The multimodal entity linking (MEL) task aims at resolving ambiguous mentions to a multimodal knowledge graph.
We propose a novel Multi-GraIned Multimodal InteraCtion Network (MIMIC) framework for solving the MEL task.
arXiv Detail & Related papers (2023-07-19T02:11:19Z) - Multimodal Relation Extraction with Cross-Modal Retrieval and Synthesis [89.04041100520881]
This research proposes to retrieve textual and visual evidence based on the object, sentence, and whole image.
We develop a novel approach to synthesize the object-level, image-level, and sentence-level information for better reasoning between the same and different modalities.
arXiv Detail & Related papers (2023-05-25T15:26:13Z) - Fusion with Hierarchical Graphs for Mulitmodal Emotion Recognition [7.147235324895931]
This paper proposes a novel hierarchical fusion graph convolutional network (HFGCN) model that learns more informative multimodal representations.
Specifically, the proposed model fuses multimodality inputs using a two-stage graph construction approach and encodes the modality dependencies into the conversation representation.
Experiments showed the effectiveness of our proposed model for more accurate AER, which yielded state-of-the-art results on two public datasets.
arXiv Detail & Related papers (2021-09-15T08:21:01Z) - Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal
Sentiment Analysis [96.46952672172021]
The Bi-Bimodal Fusion Network (BBFN) is a novel end-to-end network that performs fusion on pairwise modality representations.
The model takes two bimodal pairs as input due to the known information imbalance among modalities.
arXiv Detail & Related papers (2021-07-28T23:33:42Z)