NICE: CVPR 2023 Challenge on Zero-shot Image Captioning
- URL: http://arxiv.org/abs/2309.01961v3
- Date: Mon, 11 Sep 2023 02:15:30 GMT
- Title: NICE: CVPR 2023 Challenge on Zero-shot Image Captioning
- Authors: Taehoon Kim, Pyunghwan Ahn, Sangyun Kim, Sihaeng Lee, Mark Marsden,
Alessandra Sala, Seung Hwan Kim, Bohyung Han, Kyoung Mu Lee, Honglak Lee,
Kyounghoon Bae, Xiangyu Wu, Yi Gao, Hailiang Zhang, Yang Yang, Weili Guo,
Jianfeng Lu, Youngtaek Oh, Jae Won Cho, Dong-jin Kim, In So Kweon, Junmo Kim,
Wooyoung Kang, Won Young Jhoo, Byungseok Roh, Jonghwan Mun, Solgil Oh, Kenan
Emir Ak, Gwang-Gook Lee, Yan Xu, Mingwei Shen, Kyomin Hwang, Wonsik Shin,
Kamin Lee, Wonhark Park, Dongkwan Lee, Nojun Kwak, Yujin Wang, Yimu Wang,
Tiancheng Gu, Xingchang Lv, Mingmao Sun
- Abstract summary: The NICE project is designed to challenge the computer vision community to develop robust image captioning models.
Report includes information on the newly proposed NICE dataset, evaluation methods, challenge results, and technical details of top-ranking entries.
- Score: 149.28330263581012
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this report, we introduce the NICE (New frontiers for zero-shot Image
Captioning Evaluation) project and share the results and outcomes of the 2023
challenge. This project is designed to challenge the computer vision community
to develop robust image captioning models that advance the state-of-the-art
both in terms of accuracy and fairness. Through the challenge, the image
captioning models were tested using a new evaluation dataset that includes a
large variety of visual concepts from many domains. There was no specific
training data provided for the challenge, and therefore the challenge entries
were required to adapt to new types of image descriptions that had not been
seen during training. This report includes information on the newly proposed
NICE dataset, evaluation methods, challenge results, and technical details of
top-ranking entries. We expect that the outcomes of the challenge will
contribute to the improvement of AI models on various vision-language tasks.
Related papers
- NTIRE 2024 Quality Assessment of AI-Generated Content Challenge [141.37864527005226]
The challenge is divided into the image track and the video track.
The winning methods in both tracks have demonstrated superior prediction performance on AIGC.
arXiv Detail & Related papers (2024-04-25T15:36:18Z)
- NTIRE 2024 Challenge on Image Super-Resolution ($\times$4): Methods and Results [126.78130602974319]
This paper reviews the NTIRE 2024 challenge on image super-resolution ($\times$4).
The challenge involves generating corresponding high-resolution (HR) images, magnified by a factor of four, from low-resolution (LR) inputs.
The aim of the challenge is to obtain designs/solutions with the most advanced SR performance.
arXiv Detail & Related papers (2024-04-15T13:45:48Z)
- Out-of-Vocabulary Challenge Report [15.827931962904115]
The Out-Of-Vocabulary 2022 (OOV) challenge introduces the recognition of scene text instances unseen at training time.
The competition compiles a collection of public scene text datasets comprising 326,385 images with 4,864,405 scene text instances.
A thorough analysis of results from baselines and different participants is presented.
arXiv Detail & Related papers (2022-09-14T15:25:54Z)
- NTIRE 2022 Challenge on Perceptual Image Quality Assessment [90.04931572825859]
This paper reports on the NTIRE 2022 challenge on perceptual image quality assessment (IQA).
The challenge was held to address the emerging difficulty of assessing images produced by perceptual image processing algorithms.
The winning method can demonstrate state-of-the-art performance.
arXiv Detail & Related papers (2022-06-23T13:36:49Z)
- ChaLearn Looking at People: Inpainting and Denoising challenges [41.481257371694284]
This chapter describes the design of an academic competition focusing on inpainting of images and video sequences.
The ChaLearn Looking at People Inpainting Challenge aimed at advancing the state of the art on visual inpainting.
Three tracks were proposed in which visual inpainting might be helpful but still challenging: human body pose estimation, text overlay removal, and fingerprint denoising.
arXiv Detail & Related papers (2021-06-24T14:57:21Z)
- Learning to Select: A Fully Attentive Approach for Novel Object Captioning [48.497478154384105]
Novel object captioning (NOC) has recently emerged as a paradigm to test captioning models on objects which are unseen during the training phase.
We present a novel approach for NOC that learns to select the most relevant objects of an image, regardless of their adherence to the training set.
Our architecture is fully attentive and end-to-end trainable, even when incorporating constraints.
arXiv Detail & Related papers (2021-06-02T19:11:21Z)
- NTIRE 2021 Challenge on Perceptual Image Quality Assessment [128.83256694901726]
This paper reports on the NTIRE 2021 challenge on perceptual image quality assessment (IQA).
It was held in conjunction with the New Trends in Image Restoration and Enhancement workshop (NTIRE) at CVPR 2021.
As a new type of image processing technology, perceptual image processing algorithms based on Generative Adversarial Networks (GAN) have produced images with more realistic textures.
arXiv Detail & Related papers (2021-05-07T05:36:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.