ImaginE: An Imagination-Based Automatic Evaluation Metric for Natural
Language Generation
- URL: http://arxiv.org/abs/2106.05970v1
- Date: Thu, 10 Jun 2021 17:59:52 GMT
- Title: ImaginE: An Imagination-Based Automatic Evaluation Metric for Natural
Language Generation
- Authors: Wanrong Zhu, Xin Eric Wang, An Yan, Miguel Eckstein, William Yang Wang
- Abstract summary: We propose ImaginE, an imagination-based automatic evaluation metric for natural language generation.
With the help of CLIP and DALL-E, two cross-modal models pre-trained on large-scale image-text pairs, we automatically generate an image as the embodied imagination for the text snippet.
Experiments spanning several text generation tasks demonstrate that adding imagination with our ImaginE displays great potential in introducing multi-modal information into NLG evaluation.
- Score: 53.56628907030751
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic evaluations for natural language generation (NLG) conventionally
rely on token-level or embedding-level comparisons with the text references.
This is different from human language processing, for which visual imaginations
often improve comprehension. In this work, we propose ImaginE, an
imagination-based automatic evaluation metric for natural language generation.
With the help of CLIP and DALL-E, two cross-modal models pre-trained on
large-scale image-text pairs, we automatically generate an image as the
embodied imagination for the text snippet and compute the imagination
similarity using contextual embeddings. Experiments spanning several text
generation tasks demonstrate that adding imagination with our ImaginE displays
great potential in introducing multi-modal information into NLG evaluation, and
improves existing automatic metrics' correlations with human similarity
judgments in many circumstances.
Related papers
- ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling [35.098725056881655]
Recent large vision language models (LVLMs) have shown unprecedented visual reasoning capabilities.
The generated text often suffers from inaccurate grounding in the visual input, resulting in errors such as hallucination of nonexistent scene elements.
We introduce a novel framework, ViGoR, that utilizes fine-grained reward modeling to significantly enhance the visual grounding of LVLMs over pre-trained baselines.
arXiv Detail & Related papers (2024-02-09T01:00:14Z) - Learning to Imagine: Visually-Augmented Natural Language Generation [73.65760028876943]
We propose a method to make pre-trained language models (PLMs) Learn to Imagine for Visuallyaugmented natural language gEneration.
We use a diffusion model to synthesize high-quality images conditioned on the input texts.
We conduct synthesis for each sentence rather than generate only one image for an entire paragraph.
arXiv Detail & Related papers (2023-05-26T13:59:45Z) - Learning the Visualness of Text Using Large Vision-Language Models [42.75864384249245]
Visual text evokes an image in a person's mind, while non-visual text fails to do so.
A method to automatically detect visualness in text will enable text-to-image retrieval and generation models to augment text with relevant images.
We curate a dataset of 3,620 English sentences and their visualness scores provided by multiple human annotators.
arXiv Detail & Related papers (2023-05-11T17:45:16Z) - ImaginaryNet: Learning Object Detectors without Real Images and
Annotations [66.30908705345973]
We propose a framework to synthesize images by combining pretrained language model and text-to-image model.
With the synthesized images and class labels, weakly supervised object detection can then be leveraged to accomplish Imaginary-Supervised Object Detection.
Experiments show that ImaginaryNet can (i) obtain about 70% performance in ISOD compared with the weakly supervised counterpart of the same backbone trained on real data.
arXiv Detail & Related papers (2022-10-13T10:25:22Z) - Visualize Before You Write: Imagination-Guided Open-Ended Text
Generation [68.96699389728964]
We propose iNLG that uses machine-generated images to guide language models in open-ended text generation.
Experiments and analyses demonstrate the effectiveness of iNLG on open-ended text generation tasks.
arXiv Detail & Related papers (2022-10-07T18:01:09Z) - Visually-Augmented Language Modeling [137.36789885105642]
We propose a novel pre-training framework, named VaLM, to Visually-augment text tokens with retrieved relevant images for Language Modeling.
With the visually-augmented context, VaLM uses a visual knowledge fusion layer to enable multimodal grounded language modeling.
We evaluate the proposed model on various multimodal commonsense reasoning tasks, which require visual information to excel.
arXiv Detail & Related papers (2022-05-20T13:41:12Z) - Improving Generation and Evaluation of Visual Stories via Semantic
Consistency [72.00815192668193]
Given a series of natural language captions, an agent must generate a sequence of images that correspond to the captions.
Prior work has introduced recurrent generative models which outperform synthesis text-to-image models on this task.
We present a number of improvements to prior modeling approaches, including the addition of a dual learning framework.
arXiv Detail & Related papers (2021-05-20T20:42:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.