Syntax-aware Hybrid prompt model for Few-shot multi-modal sentiment
analysis
- URL: http://arxiv.org/abs/2306.01312v2
- Date: Mon, 31 Jul 2023 09:03:40 GMT
- Title: Syntax-aware Hybrid prompt model for Few-shot multi-modal sentiment
analysis
- Authors: Zikai Zhou, Haisong Feng, Baiyou Qiao, Gang Wu, Donghong Han
- Abstract summary: Multimodal Sentiment Analysis (MSA) is a popular topic in natural language processing.
It is therefore practical to explore methods for few-shot sentiment analysis across modalities.
- Score: 0.7693465097015469
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal Sentiment Analysis (MSA) has become a popular topic in natural
language processing, at both the sentence and the aspect level. However, most
existing approaches require large labeled datasets, which consume considerable
time and resources. It is therefore practical to explore methods for few-shot
sentiment analysis across modalities. Previous work generally operates on the
textual modality using prompt-based methods of two main types: hand-crafted
prompts and learnable prompts. The existing approach to the few-shot
multi-modal sentiment analysis task has used both methods, but separately. We
further design a hybrid pattern that combines one or more fixed hand-crafted
prompts with learnable prompts and uses attention mechanisms to optimize the
prompt encoder. Experiments on both sentence-level and aspect-level datasets
show that our model significantly outperforms existing approaches.
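To make the hybrid pattern concrete, the minimal PyTorch sketch below embeds a fixed hand-crafted template, concatenates learnable soft-prompt vectors to it, and refines the combined prompt with an attention layer playing the role of the prompt encoder. The class and parameter names (HybridPrompt, n_learnable, and so on) are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class HybridPrompt(nn.Module):
    """Sketch of a hybrid prompt: frozen hand-crafted prompt embeddings plus
    learnable soft-prompt vectors, refined by an attention-based prompt encoder."""

    def __init__(self, hand_crafted_emb: torch.Tensor, n_learnable: int = 4,
                 d_model: int = 768, n_heads: int = 8):
        super().__init__()
        # Embeddings of a tokenized hand-crafted template, kept frozen.
        self.register_buffer("fixed", hand_crafted_emb)          # (n_fixed, d_model)
        # Learnable soft-prompt vectors, trained on the few-shot data.
        self.soft = nn.Parameter(torch.randn(n_learnable, d_model) * 0.02)
        # Attention layer acting as the prompt encoder.
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, batch_size: int) -> torch.Tensor:
        prompt = torch.cat([self.fixed, self.soft], dim=0)       # (n_fixed + n_learnable, d_model)
        prompt = prompt.unsqueeze(0).expand(batch_size, -1, -1)
        # Self-attention lets fixed and learnable prompt tokens condition each other.
        refined, _ = self.attn(prompt, prompt, prompt)
        return refined   # prepended to the multimodal input sequence downstream
```

In a few-shot setting only the soft prompts and the attention encoder would be updated, while the backbone and the hand-crafted template embeddings stay frozen.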
Related papers
- Adaptive Prompting: Ad-hoc Prompt Composition for Social Bias Detection [30.836788377666]
We propose an adaptive prompting approach that predicts the optimal prompt composition ad-hoc for a given input.
We apply our approach to social bias detection, a highly context-dependent task that requires semantic understanding.
Our approach robustly ensures high detection performance and performs best in several settings.
arXiv Detail & Related papers (2025-02-10T14:06:19Z)
- A Comprehensive Framework for Semantic Similarity Analysis of Human and AI-Generated Text Using Transformer Architectures and Ensemble Techniques [40.704014941800594]
Traditional methods fail to capture nuanced semantic differences between human and machine-generated content.
We propose a novel approach that combines a pre-trained DeBERTa-v3-large model, Bi-directional LSTMs, and linear attention pooling to capture both local and global semantic patterns.
Experimental results show that this approach works better than traditional methods, proving its usefulness for AI-generated text detection and other text comparison tasks.
arXiv Detail & Related papers (2025-01-24T07:07:37Z)
- VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation [100.06122876025063]
This paper introduces VisDoMBench, the first comprehensive benchmark designed to evaluate QA systems in multi-document settings.
We propose VisDoMRAG, a novel multimodal Retrieval Augmented Generation (RAG) approach that simultaneously utilizes visual and textual RAG.
arXiv Detail & Related papers (2024-12-14T06:24:55Z)
- AMPLE: Emotion-Aware Multimodal Fusion Prompt Learning for Fake News Detection [0.1499944454332829]
This paper introduces the Emotion-Aware Multimodal Fusion Prompt Learning (AMPLE) framework to address the above issue.
This framework extracts emotional elements from texts by leveraging sentiment analysis tools.
It then employs Multi-Head Cross-Attention (MCA) mechanisms and similarity-aware fusion methods to integrate multimodal data.
arXiv Detail & Related papers (2024-10-21T02:19:24Z)
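As a rough illustration of the Multi-Head Cross-Attention fusion named in the AMPLE summary above, the sketch below lets emotion-aware text features attend to image features and gates the result by a simple cosine similarity between the two modalities; the shapes, the gate, and the class name CrossModalFusion are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalFusion(nn.Module):
    """Illustrative multi-head cross-attention fusion of text and image features."""

    def __init__(self, d_model: int = 768, n_heads: int = 8):
        super().__init__()
        self.mca = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, text_feats: torch.Tensor, img_feats: torch.Tensor) -> torch.Tensor:
        # Text tokens attend to image regions (queries = text, keys/values = image).
        attended, _ = self.mca(text_feats, img_feats, img_feats)
        # Similarity-aware gate: scale the attended features by how well the
        # pooled modalities agree (a simple stand-in for similarity-aware fusion).
        sim = F.cosine_similarity(text_feats.mean(dim=1), img_feats.mean(dim=1), dim=-1)
        gate = sim.clamp(min=0).unsqueeze(-1).unsqueeze(-1)      # (B, 1, 1)
        return text_feats + gate * attended
```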
- GSSF: Generalized Structural Sparse Function for Deep Cross-modal Metric Learning [51.677086019209554]
We propose a Generalized Structural Sparse Function to capture powerful relationships across modalities for pair-wise similarity learning.
The distance metric delicately encapsulates two formats of diagonal and block-diagonal terms.
Experiments on cross-modal and two extra uni-modal retrieval tasks have validated its superiority and flexibility.
arXiv Detail & Related papers (2024-10-20T03:45:50Z)
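One plausible reading of the "diagonal and block-diagonal terms" in the GSSF summary above is a learned, structured Mahalanobis-style distance between paired cross-modal embeddings; the sketch below follows that reading and is not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class StructuredSparseDistance(nn.Module):
    """Illustrative distance with learnable diagonal and block-diagonal weights."""

    def __init__(self, dim: int = 512, n_blocks: int = 8):
        super().__init__()
        assert dim % n_blocks == 0
        self.block = dim // n_blocks
        self.diag = nn.Parameter(torch.ones(dim))                                 # diagonal term
        self.blocks = nn.Parameter(torch.eye(self.block).repeat(n_blocks, 1, 1))  # block-diagonal term

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        d = x - y                                                  # (B, dim)
        diag_term = (self.diag * d * d).sum(-1)
        db = d.view(d.size(0), -1, self.block)                     # (B, n_blocks, block)
        block_term = torch.einsum("bnd,nde,bne->b", db, self.blocks, db)
        return diag_term + block_term
```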
- PanoSent: A Panoptic Sextuple Extraction Benchmark for Multimodal Conversational Aspect-based Sentiment Analysis [74.41260927676747]
This paper bridges the gaps by introducing multimodal conversational Aspect-based Sentiment Analysis (ABSA).
To benchmark the tasks, we construct PanoSent, a dataset annotated both manually and automatically, featuring high quality, large scale, multimodality, multilingualism, multi-scenarios, and covering both implicit and explicit sentiment elements.
To effectively address the tasks, we devise a novel Chain-of-Sentiment reasoning framework, together with a novel multimodal large language model (namely Sentica) and a paraphrase-based verification mechanism.
arXiv Detail & Related papers (2024-08-18T13:51:01Z)
- Multimodal Prompt Learning with Missing Modalities for Sentiment Analysis and Emotion Recognition [52.522244807811894]
We propose a novel multimodal Transformer framework using prompt learning to address the issue of missing modalities.
Our method introduces three types of prompts: generative prompts, missing-signal prompts, and missing-type prompts.
Through prompt learning, we achieve a substantial reduction in the number of trainable parameters.
arXiv Detail & Related papers (2024-07-07T13:55:56Z)
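The missing-modality prompt idea summarized above can be pictured as selecting a small learnable prompt bank per missing-modality pattern and prepending it to whatever features are available; the toy below assumes text and image modalities and does not reproduce the paper's three specific prompt types (generative, missing-signal, missing-type).

```python
import torch
import torch.nn as nn

class MissingModalityPrompts(nn.Module):
    """Toy prompt selection keyed on the sample's missing-modality pattern."""

    def __init__(self, n_tokens: int = 4, d_model: int = 768):
        super().__init__()
        # One small learnable prompt bank per missing-modality pattern.
        self.prompts = nn.ParameterDict({
            "none":     nn.Parameter(torch.randn(n_tokens, d_model) * 0.02),
            "no_text":  nn.Parameter(torch.randn(n_tokens, d_model) * 0.02),
            "no_image": nn.Parameter(torch.randn(n_tokens, d_model) * 0.02),
        })

    def forward(self, features: torch.Tensor, missing: str) -> torch.Tensor:
        # Prepend the prompt matching the missing-modality pattern; only these
        # small prompt tensors are trained, so trainable parameters stay few.
        prompt = self.prompts[missing].unsqueeze(0).expand(features.size(0), -1, -1)
        return torch.cat([prompt, features], dim=1)
```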
- Mixture-of-Prompt-Experts for Multi-modal Semantic Understanding [7.329728566839757]
We propose Mixture-of-Prompt-Experts with Block-Aware Prompt Fusion (MoPE-BAF), a novel multi-modal soft prompt framework based on the unified vision-language model (VLM).
arXiv Detail & Related papers (2024-03-17T19:12:26Z)
- Multi-Prompt with Depth Partitioned Cross-Modal Learning [25.239388488952375]
Partitioned Multi-modal Prompt (PMPO) is a multi-modal prompting technique that extends the soft prompt from a single learnable prompt to multiple prompts.
Our method divides the visual encoder depths and connects learnable prompts to the separated visual depths, enabling different prompts to capture hierarchical contextual depths.
We evaluate the effectiveness of our approach on three challenging tasks: new class generalization, cross-dataset evaluation, and domain generalization.
arXiv Detail & Related papers (2023-05-10T14:54:29Z)
- Towards Unifying Medical Vision-and-Language Pre-training via Soft Prompts [63.84720380390935]
There exist two typical types, i.e., the fusion-encoder type and the dual-encoder type, depending on whether a heavy fusion module is used.
We propose an effective yet straightforward scheme named PTUnifier to unify the two types.
We first unify the input format by introducing visual and textual prompts, which serve as a feature bank that stores the most representative images/texts.
arXiv Detail & Related papers (2023-02-17T15:43:42Z)
- Multi-Modal Few-Shot Object Detection with Meta-Learning-Based Cross-Modal Prompting [77.69172089359606]
We study multi-modal few-shot object detection (FSOD) in this paper, using both few-shot visual examples and class semantic information for detection.
Our approach is motivated by the high-level conceptual similarity of (metric-based) meta-learning and prompt-based learning.
We comprehensively evaluate the proposed multi-modal FSOD models on multiple few-shot object detection benchmarks, achieving promising results.
arXiv Detail & Related papers (2022-04-16T16:45:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.