Image Matters: A New Dataset and Empirical Study for Multimodal
Hyperbole Detection
- URL: http://arxiv.org/abs/2307.00209v3
- Date: Sat, 9 Mar 2024 02:30:11 GMT
- Title: Image Matters: A New Dataset and Empirical Study for Multimodal
Hyperbole Detection
- Authors: Huixuan Zhang, Xiaojun Wan
- Abstract summary: We create a multimodal detection dataset from Weibo (a Chinese social media platform).
We treat the text and image from a Weibo post as two modalities and explore the role of text and image in hyperbole detection.
Different pre-trained multimodal encoders are also evaluated on this downstream task to show their performance.
- Score: 52.04083398850383
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Hyperbole, or exaggeration, is a common linguistic phenomenon. The detection
of hyperbole is an important part of understanding human expression. There have
been several studies on hyperbole detection, most of which focus on the text
modality only. However, with the development of social media, people can create
hyperbolic expressions with various modalities, including text, images, videos,
etc. In this paper, we focus on multimodal hyperbole detection. We create a
multimodal detection dataset from Weibo (a Chinese social media platform) and carry out
several studies on it. We treat the text and image from a Weibo post as two
modalities and explore the role of text and image in hyperbole detection.
Different pre-trained multimodal encoders are also evaluated on this downstream
task to show their performance. In addition, since this dataset is constructed from
five different topics, we also evaluate the cross-domain performance of
different models. These studies can serve as a benchmark and point out
directions for further study on multimodal hyperbole detection.
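To make the setup concrete, here is a minimal sketch of the kind of late-fusion baseline such a study might compare against: features from frozen pre-trained text and image encoders are concatenated and fed to a small classifier. This is an illustrative assumption, not the paper's actual model; the encoder choices, feature dimensions, and fusion strategy are placeholders.

```python
# Hypothetical late-fusion baseline for multimodal hyperbole detection.
# The text/image features are assumed to come from frozen pre-trained
# encoders (e.g. a BERT-style text encoder and a ViT-style image encoder);
# all dimensions below are illustrative.
import torch
import torch.nn as nn


class LateFusionHyperboleClassifier(nn.Module):
    def __init__(self, text_dim: int = 768, image_dim: int = 768, hidden: int = 256):
        super().__init__()
        self.fusion = nn.Sequential(
            nn.Linear(text_dim + image_dim, hidden),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(hidden, 2),  # binary label: hyperbole vs. non-hyperbole
        )

    def forward(self, text_feat: torch.Tensor, image_feat: torch.Tensor) -> torch.Tensor:
        # Concatenate the two modality embeddings and classify the fused vector.
        fused = torch.cat([text_feat, image_feat], dim=-1)
        return self.fusion(fused)  # unnormalized logits


# Random tensors stand in for encoder outputs of a batch of 4 Weibo posts.
model = LateFusionHyperboleClassifier()
logits = model(torch.randn(4, 768), torch.randn(4, 768))
print(logits.shape)  # torch.Size([4, 2])
```

Under a setup like this, the cross-domain experiments described above would amount to training the classifier on posts from four topics and evaluating on the held-out fifth topic.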
Related papers
- Contrastive Learning-based Multi Modal Architecture for Emoticon Prediction by Employing Image-Text Pairs [13.922091192207718]
This research aims to analyze the relationship among sentences, visuals, and emoticons.
We have proposed a novel contrastive learning based multimodal architecture.
The proposed model attained an accuracy of 91% and an MCC-score of 90% while assessing emoticons.
arXiv Detail & Related papers (2024-08-05T15:45:59Z)
- M2SA: Multimodal and Multilingual Model for Sentiment Analysis of Tweets [4.478789600295492]
This paper transforms an existing textual Twitter sentiment dataset into a multimodal format through a straightforward curation process.
Our work opens up new avenues for sentiment-related research within the research community.
arXiv Detail & Related papers (2024-04-02T09:11:58Z)
- Contextual Object Detection with Multimodal Large Language Models [66.15566719178327]
We introduce a novel research problem of contextual object detection.
Three representative scenarios are investigated, including the language cloze test, visual captioning, and question answering.
We present ContextDET, a unified multimodal model that is capable of end-to-end differentiable modeling of visual-language contexts.
arXiv Detail & Related papers (2023-05-29T17:50:33Z)
- A Match Made in Heaven: A Multi-task Framework for Hyperbole and Metaphor Detection [27.85834441076481]
Hyperbole and metaphor are common in day-to-day communication.
Existing approaches to automatically detect metaphor and hyperbole have studied these language phenomena independently.
We propose a multi-task deep learning framework to detect hyperbole and metaphor simultaneously (a minimal joint-detection sketch appears after this list).
arXiv Detail & Related papers (2023-05-27T14:17:59Z)
- IRFL: Image Recognition of Figurative Language [20.472997304393413]
Figurative forms are often conveyed through multiple modalities (e.g., both text and images).
We develop the Image Recognition of Figurative Language dataset.
We introduce two novel tasks as a benchmark for multimodal figurative language understanding.
arXiv Detail & Related papers (2023-03-27T17:59:55Z)
- TextMI: Textualize Multimodal Information for Integrating Non-verbal Cues in Pre-trained Language Models [5.668457303716451]
We propose TextMI as a general, competitive baseline for multimodal behavioral analysis tasks.
Our approach significantly reduces model complexity, adds interpretability to the model's decision, and can be applied for a diverse set of tasks.
arXiv Detail & Related papers (2023-03-27T17:54:32Z)
- Towards Unifying Medical Vision-and-Language Pre-training via Soft Prompts [63.84720380390935]
There exist two typical types, i.e., the fusion-encoder type and the dual-encoder type, depending on whether a heavy fusion module is used.
We propose an effective yet straightforward scheme named PTUnifier to unify the two types.
We first unify the input format by introducing visual and textual prompts, which serve as a feature bank that stores the most representative images/texts.
arXiv Detail & Related papers (2023-02-17T15:43:42Z)
- On Advances in Text Generation from Images Beyond Captioning: A Case Study in Self-Rationalization [89.94078728495423]
We show that recent advances in each modality, CLIP image representations and scaling of language models, do not consistently improve multimodal self-rationalization of tasks with multimodal inputs.
Our findings call for a backbone modelling approach that can be built on to advance text generation from images and text beyond image captioning.
arXiv Detail & Related papers (2022-05-24T00:52:40Z)
- Self-Supervised Speech Representation Learning: A Review [105.1545308184483]
Self-supervised representation learning methods promise a single universal model that would benefit a wide variety of tasks and domains.
Speech representation learning is experiencing similar progress in three main categories: generative, contrastive, and predictive methods.
This review presents approaches for self-supervised speech representation learning and their connection to other research areas.
arXiv Detail & Related papers (2022-05-21T16:52:57Z)
- Exploiting BERT For Multimodal Target Sentiment Classification Through Input Space Translation [75.82110684355979]
We introduce a two-stream model that translates images in input space using an object-aware transformer.
We then leverage the translation to construct an auxiliary sentence that provides multimodal information to a language model.
We achieve state-of-the-art performance on two multimodal Twitter datasets.
arXiv Detail & Related papers (2021-08-03T18:02:38Z)
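As a companion to the multi-task hyperbole and metaphor paper listed above, the following sketch shows one plausible form of joint detection: a shared projection over any pre-trained sentence embedding, with two task-specific heads trained by a summed cross-entropy loss. It is an assumption for illustration, not the cited framework.

```python
# Hypothetical multi-task head for joint hyperbole and metaphor detection.
# `sentence_emb` is assumed to come from any pre-trained text encoder;
# dimensions and labels below are made up for the toy example.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiTaskFigurativeDetector(nn.Module):
    def __init__(self, encoder_dim: int = 768, hidden: int = 256):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(encoder_dim, hidden), nn.ReLU())
        self.hyperbole_head = nn.Linear(hidden, 2)
        self.metaphor_head = nn.Linear(hidden, 2)

    def forward(self, sentence_emb: torch.Tensor):
        h = self.shared(sentence_emb)
        return self.hyperbole_head(h), self.metaphor_head(h)


# Toy batch of 4 sentence embeddings with made-up labels for both tasks.
model = MultiTaskFigurativeDetector()
hyp_logits, met_logits = model(torch.randn(4, 768))
loss = F.cross_entropy(hyp_logits, torch.tensor([0, 1, 1, 0])) + \
       F.cross_entropy(met_logits, torch.tensor([1, 0, 1, 0]))
loss.backward()  # both heads and the shared layer receive gradients
```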