A Baseline for Detecting Out-of-Distribution Examples in Image
Captioning
- URL: http://arxiv.org/abs/2207.05418v1
- Date: Tue, 12 Jul 2022 09:29:57 GMT
- Title: A Baseline for Detecting Out-of-Distribution Examples in Image
Captioning
- Authors: Gabi Shalev, Gal-Lev Shalev, Joseph Keshet
- Abstract summary: We consider the problem of OOD detection in image captioning.
We show the effectiveness of the caption's likelihood score at detecting and rejecting OOD images.
- Score: 12.953517767147998
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Image captioning research has achieved breakthroughs in recent years by
developing neural models that can generate diverse and high-quality
descriptions for images drawn from the same distribution as the training images.
However, when facing out-of-distribution (OOD) images, such as corrupted
images or images containing unknown objects, the models fail to generate
relevant captions.
In this paper, we consider the problem of OOD detection in image captioning.
We formulate the problem and suggest an evaluation setup for assessing the
model's performance on the task. Then, we analyze and show the effectiveness of
the caption's likelihood score at detecting and rejecting OOD images, which
implies that the relatedness between the input image and the generated caption
is encapsulated within the score.
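As a rough illustration of the idea above, the caption's likelihood score can be computed as the length-normalized log-probability the captioning model assigns to its own generated caption, and in-distribution versus OOD images can then be separated by thresholding or ranking that score. The sketch below is a minimal illustration of this baseline, not the paper's implementation: the captioning_model object, its generate_with_logits interface, and the AUROC-based comparison are assumptions introduced here for concreteness.

import torch
import torch.nn.functional as F
from sklearn.metrics import roc_auc_score

@torch.no_grad()
def caption_likelihood_score(captioning_model, image):
    # Assumed interface: greedy decoding that returns the generated token ids
    # and the per-step vocabulary logits (both names are hypothetical).
    token_ids, logits = captioning_model.generate_with_logits(image)  # (T,), (T, V)
    log_probs = F.log_softmax(logits, dim=-1)
    token_log_probs = log_probs.gather(1, token_ids.unsqueeze(-1)).squeeze(-1)
    # Length-normalized log-likelihood of the generated caption; in-distribution
    # images are expected to receive higher scores than OOD images.
    return token_log_probs.mean().item()

def ood_detection_auroc(in_dist_scores, ood_scores):
    # One possible evaluation setup: AUROC of separating in-distribution images
    # (label 1) from OOD images (label 0) using the likelihood score alone.
    labels = [1] * len(in_dist_scores) + [0] * len(ood_scores)
    return roc_auc_score(labels, list(in_dist_scores) + list(ood_scores))

Under this kind of setup, images whose caption likelihood falls below a threshold chosen on held-out in-distribution data can be rejected rather than captioned.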
Related papers
- Image2Text2Image: A Novel Framework for Label-Free Evaluation of Image-to-Text Generation with Text-to-Image Diffusion Models [16.00576040281808]
We propose a novel framework called Image2Text2Image to evaluate image captioning models.
A high similarity score suggests that the model has produced a faithful textual description, while a low score highlights discrepancies.
Our framework does not rely on human-annotated reference captions, making it a valuable tool for assessing image captioning models.
arXiv Detail & Related papers (2024-11-08T17:07:01Z)
- A Novel Evaluation Framework for Image2Text Generation [15.10524860121122]
We propose an evaluation framework rooted in a modern large language model (LLM) capable of image generation.
A high similarity score suggests that the image captioning model has accurately generated textual descriptions.
A low similarity score indicates discrepancies, revealing potential shortcomings in the model's performance.
arXiv Detail & Related papers (2024-08-03T09:27:57Z)
- BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues [47.213906345208315]
We propose BRIDGE, a new learnable and reference-free image captioning metric.
Our proposal achieves state-of-the-art results, outperforming existing reference-free evaluation scores.
arXiv Detail & Related papers (2024-07-29T18:00:17Z)
- Towards Retrieval-Augmented Architectures for Image Captioning [81.11529834508424]
This work presents a novel approach towards developing image captioning models that utilize an external kNN memory to improve the generation process.
Specifically, we propose two model variants that incorporate a knowledge retriever component that is based on visual similarities.
We experimentally validate our approach on COCO and nocaps datasets and demonstrate that incorporating an explicit external memory can significantly enhance the quality of captions.
arXiv Detail & Related papers (2024-05-21T18:02:07Z)
- Anomaly Score: Evaluating Generative Models and Individual Generated Images based on Complexity and Vulnerability [21.355484227864466]
We investigate the relationship between the representation space and input space around generated images.
We introduce a new metric for evaluating image-generative models, called the anomaly score (AS).
arXiv Detail & Related papers (2023-12-17T07:33:06Z)
- Image Quality Assessment using Contrastive Learning [50.265638572116984]
We train a deep Convolutional Neural Network (CNN) with a contrastive pairwise objective to solve an auxiliary problem.
We show through extensive experiments that CONTRIQUE achieves competitive performance when compared to state-of-the-art NR image quality models.
Our results suggest that powerful quality representations with perceptual relevance can be obtained without requiring large labeled subjective image quality datasets.
arXiv Detail & Related papers (2021-10-25T21:01:00Z)
- Learning Conditional Knowledge Distillation for Degraded-Reference Image Quality Assessment [157.1292674649519]
We propose a practical solution named degraded-reference IQA (DR-IQA).
DR-IQA exploits the inputs of image restoration (IR) models, i.e., degraded images, as references.
Our results can even be close to the performance of full-reference settings.
arXiv Detail & Related papers (2021-08-18T02:35:08Z)
- Detection and Captioning with Unseen Object Classes [12.894104422808242]
Test images may contain visual objects with no corresponding visual or textual training examples.
We propose a detection-driven approach based on a generalized zero-shot detection model and a template-based sentence generation model.
Our experiments show that the proposed zero-shot detection model obtains state-of-the-art performance on the MS-COCO dataset.
arXiv Detail & Related papers (2021-08-13T10:43:20Z)
- An Unsupervised Sampling Approach for Image-Sentence Matching Using Document-Level Structural Information [64.66785523187845]
We focus on the problem of unsupervised image-sentence matching.
Existing research explores utilizing document-level structural information to sample positive and negative instances for model training.
We propose a new sampling strategy to select additional intra-document image-sentence pairs as positive or negative samples.
arXiv Detail & Related papers (2021-03-21T05:43:29Z)
- Intrinsic Image Captioning Evaluation [53.51379676690971]
We propose a learning-based metric for image captioning, which we call Intrinsic Image Captioning Evaluation (I2CE).
Experimental results show that the proposed method maintains robust performance and gives more flexible scores to candidate captions when encountering semantically similar expressions or less aligned semantics.
arXiv Detail & Related papers (2020-12-14T08:36:05Z)
- Comprehensive Image Captioning via Scene Graph Decomposition [51.660090468384375]
We address the challenging problem of image captioning by revisiting the representation of image scene graphs.
At the core of our method lies the decomposition of a scene graph into a set of sub-graphs.
We design a deep model to select important sub-graphs, and to decode each selected sub-graph into a single target sentence.
arXiv Detail & Related papers (2020-07-23T00:59:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.