Detecting Euphemisms with Literal Descriptions and Visual Imagery
- URL: http://arxiv.org/abs/2211.04576v1
- Date: Tue, 8 Nov 2022 21:50:05 GMT
- Title: Detecting Euphemisms with Literal Descriptions and Visual Imagery
- Authors: İlker Kesen, Aykut Erdem, Erkut Erdem and Iacer Calixto
- Abstract summary: This paper describes our two-stage system for the Euphemism Detection shared task hosted by the 3rd Workshop on Figurative Language Processing in conjunction with EMNLP 2022.
In the first stage, we mitigate the ambiguity of euphemistic expressions by incorporating literal descriptions into the input text prompts of our baseline model; this direct supervision yields a remarkable performance improvement.
In the second stage, we integrate visual supervision into our system using visual imagery: two sets of images generated by a text-to-image model from the terms and their descriptions. Our experiments demonstrate that visual supervision also gives a statistically significant performance boost.
- Score: 18.510509701709054
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper describes our two-stage system for the Euphemism Detection shared
task hosted by the 3rd Workshop on Figurative Language Processing in
conjunction with EMNLP 2022. Euphemisms tone down expressions about sensitive
or unpleasant issues like addiction and death. The ambiguous nature of
euphemistic words or expressions makes it challenging to detect their actual
meaning within a context. In the first stage, we seek to mitigate this
ambiguity by incorporating literal descriptions into the input text prompts of
our baseline model. This kind of direct supervision yields a remarkable
performance improvement. In the second stage, we integrate visual supervision
into our system using visual imagery: two sets of images generated by a
text-to-image model from the terms and their descriptions. Our experiments
demonstrate that visual supervision also gives a statistically significant
performance boost. Our system achieved second place with an F1 score of 87.2%,
about 0.9% below the best submission.
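To make the two stages concrete, here is a minimal sketch (not the authors' released code) of how a literal description can be folded into the text prompt and how visual imagery can be generated from a term and its description. The checkpoints, the description lookup, and the prompt template are illustrative assumptions.

```python
# Minimal sketch, not the authors' released code. Checkpoint names, the
# description lookup, and the prompt template are assumptions for illustration.
from transformers import pipeline
from diffusers import StableDiffusionPipeline

# Stage 1: add the literal description to the text prompt as direct supervision.
descriptions = {"let go": "dismissed from a job"}  # hypothetical lookup table

def build_prompt(sentence: str, term: str) -> str:
    # The classifier sees the potentially euphemistic sentence together with
    # the literal meaning of the marked term.
    return f"{sentence} Here, '{term}' literally means '{descriptions[term]}'."

classifier = pipeline("text-classification", model="roberta-base")  # placeholder checkpoint
print(classifier(build_prompt("After the merger, half the team was let go.", "let go")))

# Stage 2: visual imagery, i.e. two sets of images generated by a
# text-to-image model, one from the term and one from its literal description.
t2i = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
term_image = t2i("let go").images[0]
description_image = t2i("dismissed from a job").images[0]
# These image sets would then serve as additional visual supervision for the
# classifier, which is beyond the scope of this sketch.
```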
Related papers
- Scene Graph as Pivoting: Inference-time Image-free Unsupervised
Multimodal Machine Translation with Visual Scene Hallucination [88.74459704391214]
In this work, we investigate a more realistic unsupervised multimodal machine translation (UMMT) setup.
We represent the input images and texts with the visual and language scene graphs (SG), where such fine-grained vision-language features ensure a holistic understanding of the semantics.
Several SG-pivoting based learning objectives are introduced for unsupervised translation training.
Our method outperforms the best-performing baseline by significant BLEU margins on this task and setup.
arXiv Detail & Related papers (2023-05-20T18:17:20Z)
- OPI at SemEval 2023 Task 1: Image-Text Embeddings and Multimodal Information Retrieval for Visual Word Sense Disambiguation [0.0]
We present our submission to SemEval 2023 visual word sense disambiguation shared task.
The proposed system integrates multimodal embeddings, learning to rank methods, and knowledge-based approaches.
Our solution was ranked third in the multilingual task and won in the Persian track, one of the three language subtasks.
arXiv Detail & Related papers (2023-04-14T13:45:59Z)
- Multimodal Neural Machine Translation with Search Engine Based Image Retrieval [4.662583832063716]
We propose an open-vocabulary image retrieval method to collect descriptive images for bilingual parallel corpus.
Our proposed method achieves significant improvements over strong baselines.
arXiv Detail & Related papers (2022-07-26T08:42:06Z)
- Image Retrieval from Contextual Descriptions [22.084939474881796]
Image Retrieval from Contextual Descriptions (ImageCoDe)
Models are tasked with retrieving the correct image from a set of 10 minimally contrastive candidates based on a contextual description.
The best variant achieves an accuracy of 20.9 on video frames and 59.4 on static pictures, compared with 90.8 for humans.
arXiv Detail & Related papers (2022-03-29T19:18:12Z)
- Word2Pix: Word to Pixel Cross Attention Transformer in Visual Grounding [59.8167502322261]
We propose Word2Pix: a one-stage visual grounding network based on encoder-decoder transformer architecture.
The embedding of each word from the query sentence is treated alike by attending to visual pixels individually.
The proposed Word2Pix outperforms existing one-stage methods by a notable margin.
arXiv Detail & Related papers (2021-07-31T10:20:15Z)
- Connecting What to Say With Where to Look by Modeling Human Attention Traces [30.8226861256742]
We introduce a unified framework to jointly model images, text, and human attention traces.
We propose two novel tasks: (1) predict a trace given an image and caption (i.e., visual grounding), and (2) predict a caption and a trace given only an image.
arXiv Detail & Related papers (2021-05-12T20:53:30Z)
- This is not the Texture you are looking for! Introducing Novel Counterfactual Explanations for Non-Experts using Generative Adversarial Learning [59.17685450892182]
Counterfactual explanation systems try to enable counterfactual reasoning by modifying the input image.
We present a novel approach to generate such counterfactual image explanations based on adversarial image-to-image translation techniques.
Our results show that our approach leads to significantly better results regarding mental models, explanation satisfaction, trust, emotions, and self-efficacy than two state-of-the-art systems.
arXiv Detail & Related papers (2020-12-22T10:08:05Z)
- Visually Grounded Compound PCFGs [65.04669567781634]
Exploiting visual groundings for language understanding has recently been drawing much attention.
We study visually grounded grammar induction and learn a constituency parser from both unlabeled text and its visual captions.
arXiv Detail & Related papers (2020-09-25T19:07:00Z)
- Grounded and Controllable Image Completion by Incorporating Lexical Semantics [111.47374576372813]
Lexical Semantic Image Completion (LSIC) may have potential applications in art, design, and heritage conservation.
We advocate generating results faithful to both visual and lexical semantic context.
One major challenge for LSIC comes from modeling and aligning the structure of visual-semantic context.
arXiv Detail & Related papers (2020-02-29T16:54:21Z)
- Learning Representations by Predicting Bags of Visual Words [55.332200948110895]
Self-supervised representation learning aims to learn convnet-based image representations from unlabeled data.
Inspired by the success of NLP methods in this area, in this work we propose a self-supervised approach based on spatially dense image descriptions.
arXiv Detail & Related papers (2020-02-27T16:45:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.