The Case for Perspective in Multimodal Datasets
- URL: http://arxiv.org/abs/2205.10902v1
- Date: Sun, 22 May 2022 18:37:05 GMT
- Title: The Case for Perspective in Multimodal Datasets
- Authors: Marcelo Viridiano, Tiago Timponi Torrent, Oliver Czulo, Arthur Lorenzi Almeida, Ely Edison da Silva Matos, Frederico Belcavello
- Abstract summary: We present a set of experiments in which FrameNet annotation is applied to the Multi30k and the Flickr 30k Entities datasets.
We assess the cosine similarity between the frame-semantic representations derived from the annotation of both pictures and captions.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper argues in favor of the adoption of annotation practices for
multimodal datasets that recognize and represent the inherently perspectivized
nature of multimodal communication. To support our claim, we present a set of
annotation experiments in which FrameNet annotation is applied to the Multi30k
and the Flickr 30k Entities datasets. We assess the cosine similarity between
the semantic representations derived from the annotation of both pictures and
captions for frames. Our findings indicate that: (i) frame semantic similarity
between captions of the same picture produced in different languages is
sensitive to whether the caption is a translation of another caption or not,
and (ii) picture annotation for semantic frames is sensitive to whether the
image is annotated in presence of a caption or not.
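As a minimal sketch of the similarity computation described in the abstract, the snippet below compares two captions by the semantic frames evoked in their annotation. The bag-of-frames representation and the specific frame names are illustrative assumptions, not the paper's actual annotation scheme:

```python
from collections import Counter
from math import sqrt

def frame_vector(frames):
    """Hypothetical representation: a bag-of-frames count vector."""
    return Counter(frames)

def cosine_similarity(v1, v2):
    """Cosine similarity between two sparse count vectors."""
    common = set(v1) & set(v2)
    dot = sum(v1[f] * v2[f] for f in common)
    n1 = sqrt(sum(c * c for c in v1.values()))
    n2 = sqrt(sum(c * c for c in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

# Illustrative frame annotations for two captions of the same picture
caption_en = frame_vector(["Motion", "Self_motion", "Body_parts"])
caption_de = frame_vector(["Motion", "Self_motion", "Clothing"])
sim = cosine_similarity(caption_en, caption_de)  # ≈ 0.667
```

A similarity near 1 would indicate that the two captions perspectivize the scene through largely the same frames; lower values suggest the annotators (or languages) foreground different aspects of the picture.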
Related papers
- What Makes for Good Image Captions? [50.48589893443939]
Our framework posits that good image captions should balance three key properties: being informationally sufficient, minimally redundant, and readily comprehensible to humans.
We introduce the Pyramid of Captions (PoCa) method, which generates enriched captions by integrating local and global visual information.
arXiv Detail & Related papers (2024-05-01T12:49:57Z)
- Transform, Contrast and Tell: Coherent Entity-Aware Multi-Image Captioning [0.65268245109828]
Coherent entity-aware multi-image captioning aims to generate coherent captions for neighboring images in a news document.
This paper proposes a coherent entity-aware multi-image captioning model by making use of coherence relationships.
arXiv Detail & Related papers (2023-02-04T07:50:31Z) - Paraphrase Acquisition from Image Captions [36.94459555199183]
We propose to use captions from the Web as a previously underutilized resource for paraphrases.
We analyze captions in the English Wikipedia, where editors frequently relabel the same image for different articles.
We introduce characteristic maps along the two similarity dimensions to identify the style of paraphrases coming from different sources.
arXiv Detail & Related papers (2023-01-26T10:54:51Z)
- Guiding Attention using Partial-Order Relationships for Image Captioning [2.620091916172863]
A guided attention network mechanism exploits the relationship between the visual scene and text descriptions.
A pairwise ranking objective trains this embedding space so that similar images, topics, and captions lie close together in the shared semantic space.
The experimental results on the MSCOCO dataset show the competitiveness of our approach.
arXiv Detail & Related papers (2022-04-15T14:22:09Z)
- Is An Image Worth Five Sentences? A New Look into Semantics for Image-Text Matching [10.992151305603267]
We propose two metrics that evaluate the degree of semantic relevance of retrieved items, independently of their annotated binary relevance.
We incorporate a novel strategy that uses an image captioning metric, CIDEr, to define a Semantic Adaptive Margin (SAM) to be optimized in a standard triplet loss.
arXiv Detail & Related papers (2021-10-06T09:54:28Z)
- Matching Visual Features to Hierarchical Semantic Topics for Image Paragraph Captioning [50.08729005865331]
This paper develops a plug-and-play hierarchical-topic-guided image paragraph generation framework.
To capture the correlations between the image and text at multiple levels of abstraction, we design a variational inference network.
To guide the paragraph generation, the learned hierarchical topics and visual features are integrated into the language model.
arXiv Detail & Related papers (2021-05-10T06:55:39Z)
- Intrinsic Image Captioning Evaluation [53.51379676690971]
We propose a learned metric for image captioning, which we call Intrinsic Image Captioning Evaluation (I2CE).
Experimental results show that our proposed method maintains robust performance and gives more flexible scores to candidate captions when encountering semantically similar expressions or less aligned semantics.
arXiv Detail & Related papers (2020-12-14T08:36:05Z)
- Dense Relational Image Captioning via Multi-task Triple-Stream Networks [95.0476489266988]
We introduce dense relational captioning, a novel task which aims to generate captions with respect to the relational information between objects in a visual scene.
This framework is advantageous in both diversity and amount of information, leading to a comprehensive image understanding.
arXiv Detail & Related papers (2020-10-08T09:17:55Z)
- Improving Image Captioning with Better Use of Captions [65.39641077768488]
We present a novel image captioning architecture to better explore semantics available in captions and leverage that to enhance both image representation and caption generation.
Our models first construct caption-guided visual relationship graphs that introduce beneficial inductive bias using weakly supervised multi-instance learning.
During generation, the model further incorporates visual relationships using multi-task learning for jointly predicting word and object/predicate tag sequences.
arXiv Detail & Related papers (2020-06-21T14:10:47Z)
- Cross-domain Correspondence Learning for Exemplar-based Image Translation [59.35767271091425]
We present a framework for exemplar-based image translation, which synthesizes a photo-realistic image from the input in a distinct domain.
The output has the style (e.g., color, texture) in consistency with the semantically corresponding objects in the exemplar.
We show that our method is significantly superior to state-of-the-art methods in terms of image quality.
arXiv Detail & Related papers (2020-04-12T09:10:57Z)
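Among the related papers above, the Semantic Adaptive Margin idea (using an image captioning metric such as CIDEr to set the margin of a standard triplet loss for image-text matching) can be sketched as follows. The function names, the `alpha` scaling, and the exact form of the margin are illustrative assumptions rather than that paper's actual formulation:

```python
import numpy as np

def semantic_adaptive_margin(base_margin, cider_pos, cider_neg, alpha=0.5):
    """Hypothetical SAM: widen the margin when the CIDEr metric says the
    positive caption is semantically much closer than the negative one."""
    return base_margin + alpha * max(0.0, cider_pos - cider_neg)

def cosine(a, b):
    """Cosine similarity between two dense embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def triplet_loss(anchor, positive, negative, margin):
    """Standard hinge-style triplet loss over cosine similarities."""
    return max(0.0, margin - cosine(anchor, positive) + cosine(anchor, negative))

# Illustrative usage: a CIDEr gap of 0.6 widens a base margin of 0.2 to 0.5
margin = semantic_adaptive_margin(0.2, cider_pos=1.0, cider_neg=0.4)
loss = triplet_loss(np.array([1.0, 0.0]), np.array([1.0, 0.0]),
                    np.array([0.0, 1.0]), margin)
```

The design intuition is that hard negatives which are nonetheless semantically close to the query (high CIDEr) should be pushed away less aggressively than clearly unrelated ones.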
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the generated list (including all information) and is not responsible for any consequences of its use.