Non-Autoregressive Image Captioning with Counterfactuals-Critical
Multi-Agent Learning
- URL: http://arxiv.org/abs/2005.04690v1
- Date: Sun, 10 May 2020 15:09:44 GMT
- Title: Non-Autoregressive Image Captioning with Counterfactuals-Critical
Multi-Agent Learning
- Authors: Longteng Guo, Jing Liu, Xinxin Zhu, Xingjian He, Jie Jiang, Hanqing Lu
- Abstract summary: We propose a Non-Autoregressive Image Captioning model with a novel training paradigm: Counterfactuals-critical Multi-Agent Learning (CMAL).
Our NAIC model achieves performance comparable to state-of-the-art autoregressive models, while bringing a 13.9x decoding speedup.
- Score: 46.060954649681385
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most image captioning models are autoregressive, i.e. they generate each word
by conditioning on previously generated words, which leads to heavy latency
during inference. Recently, non-autoregressive decoding has been proposed in
machine translation to speed up the inference time by generating all words in
parallel. Typically, these models use the word-level cross-entropy loss to
optimize each word independently. However, such a learning process fails to
consider the sentence-level consistency, thus resulting in inferior generation
quality of these non-autoregressive models. In this paper, we propose a
Non-Autoregressive Image Captioning (NAIC) model with a novel training
paradigm: Counterfactuals-critical Multi-Agent Learning (CMAL). CMAL formulates
NAIC as a multi-agent reinforcement learning system where positions in the
target sequence are viewed as agents that learn to cooperatively maximize a
sentence-level reward. Besides, we propose to utilize massive unlabeled images
to boost captioning performance. Extensive experiments on MSCOCO image
captioning benchmark show that our NAIC model achieves performance comparable
to state-of-the-art autoregressive models while bringing a 13.9x decoding speedup.
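The counterfactuals-critical idea can be illustrated with a minimal sketch (not the authors' code): each target-sequence position acts as an agent whose words are generated in parallel, and each agent's learning signal is the sentence-level reward minus a counterfactual baseline obtained by replacing only that agent's word. The vocabulary, the match-based reward (standing in for a metric like CIDEr), and the fixed `default_word` counterfactual action here are all illustrative assumptions.

```python
# Sketch of per-position counterfactual advantages in a non-autoregressive
# setting. All positions are "decoded" in parallel (sampled independently),
# and each position's credit is isolated by a single-word counterfactual swap.
import random

VOCAB = ["a", "dog", "runs", "on", "the", "grass"]
REFERENCE = ["a", "dog", "runs", "on", "the", "grass"]

def sentence_reward(caption):
    """Toy sentence-level reward: fraction of positions matching the
    reference (a real system would use CIDEr or a similar metric)."""
    return sum(w == r for w, r in zip(caption, REFERENCE)) / len(REFERENCE)

def counterfactual_advantages(caption, default_word="the"):
    """For each position (agent), return reward(actual) - reward(counterfactual),
    where the counterfactual replaces only that position's word."""
    base = sentence_reward(caption)
    advantages = []
    for i in range(len(caption)):
        cf = list(caption)
        cf[i] = default_word          # change only agent i's action
        advantages.append(base - sentence_reward(cf))
    return advantages

# Parallel (order-independent) generation of one caption, one word per agent.
caption = [random.choice(VOCAB) for _ in range(len(REFERENCE))]
print(caption, counterfactual_advantages(caption))
```

Because every position is evaluated against the same shared sentence-level reward, the agents are pushed toward globally consistent captions rather than locally optimal words, which is the gap word-level cross-entropy leaves open.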
Related papers
- Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis [62.06970466554273]
We present Meissonic, which elevates non-autoregressive masked image modeling (MIM) text-to-image generation to a level comparable with state-of-the-art diffusion models like SDXL.
We leverage high-quality training data, integrate micro-conditions informed by human preference scores, and employ feature compression layers to further enhance image fidelity and resolution.
Our model not only matches but often exceeds the performance of existing models like SDXL in generating high-quality, high-resolution images.
arXiv Detail & Related papers (2024-10-10T17:59:17Z) - Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding [60.188309982690335]
We propose a training-free probabilistic parallel decoding algorithm, Speculative Jacobi Decoding (SJD), to accelerate auto-regressive text-to-image generation.
By introducing a probabilistic convergence criterion, our SJD accelerates the inference of auto-regressive text-to-image generation while maintaining the randomness in sampling-based token decoding.
arXiv Detail & Related papers (2024-10-02T16:05:27Z) - Efficient Modeling of Future Context for Image Captioning [38.52032153180971]
Non-Autoregressive Image Captioning (NAIC) can leverage two-sided relations via a modified mask operation.
Our proposed approach clearly surpasses the state-of-the-art baselines in both automatic metrics and human evaluations.
arXiv Detail & Related papers (2022-07-22T06:21:43Z) - Prompt-based Learning for Unpaired Image Captioning [86.44188293709307]
Unpaired Image Captioning (UIC) has been developed to learn image descriptions from unaligned vision-language sample pairs.
Recent successes of Vision-Language Pre-Trained Models (VL-PTMs) have triggered the development of prompt-based learning.
We present in this paper a novel prompt-based scheme to train the UIC model, making the best use of its powerful generalization ability.
arXiv Detail & Related papers (2022-05-26T03:13:43Z) - Semi-Autoregressive Image Captioning [153.9658053662605]
Current state-of-the-art approaches for image captioning typically adopt an autoregressive manner.
Non-autoregressive image captioning with continuous iterative refinement can achieve comparable performance to the autoregressive counterparts with a considerable acceleration.
We propose a novel two-stage framework, referred to as Semi-Autoregressive Image Captioning (SAIC) to make a better trade-off between performance and speed.
arXiv Detail & Related papers (2021-10-11T15:11:54Z) - Semi-Autoregressive Transformer for Image Captioning [17.533503295862808]
We introduce a semi-autoregressive model for image captioning (dubbed SATIC).
It keeps the autoregressive property globally but generates words in parallel locally.
Experiments on the MSCOCO image captioning benchmark show that SATIC can achieve a better trade-off without bells and whistles.
arXiv Detail & Related papers (2021-06-17T12:36:33Z) - Fast Sequence Generation with Multi-Agent Reinforcement Learning [40.75211414663022]
Non-autoregressive decoding has been proposed in machine translation to speed up the inference time by generating all words in parallel.
We propose a simple and efficient model for Non-Autoregressive sequence Generation (NAG) with a novel training paradigm: Counterfactuals-critical Multi-Agent Learning (CMAL).
On the MSCOCO image captioning benchmark, our NAG method achieves performance comparable to state-of-the-art autoregressive models, while bringing a 13.9x decoding speedup.
arXiv Detail & Related papers (2021-01-24T12:16:45Z) - Length-Controllable Image Captioning [67.2079793803317]
Due to their autoregressive nature, the computational complexity of existing models increases linearly as the length of the generated captions grows.
We propose to use a simple length level embedding to endow them with the ability to control caption length.
We further devise a non-autoregressive image captioning approach that can generate captions with length-independent complexity.
arXiv Detail & Related papers (2020-07-19T03:40:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.