Visualize Before You Write: Imagination-Guided Open-Ended Text
Generation
- URL: http://arxiv.org/abs/2210.03765v1
- Date: Fri, 7 Oct 2022 18:01:09 GMT
- Title: Visualize Before You Write: Imagination-Guided Open-Ended Text
Generation
- Authors: Wanrong Zhu, An Yan, Yujie Lu, Wenda Xu, Xin Eric Wang, Miguel
Eckstein, William Yang Wang
- Abstract summary: We propose iNLG, which uses machine-generated images to guide language models in open-ended text generation.
Experiments and analyses demonstrate the effectiveness of iNLG on open-ended text generation tasks.
- Score: 68.96699389728964
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in text-to-image synthesis make it possible to visualize
machine imaginations for a given context. On the other hand, when generating
text, human writers are gifted at creative visualization, which enhances their
writings by forming imaginations as blueprints before putting down the stories
in words. Inspired by such a cognitive process, we ask the natural question of
whether we can endow machines with the same ability to utilize visual
information and construct a general picture of the context to guide text
generation. In this work, we propose iNLG that uses machine-generated images to
guide language models (LM) in open-ended text generation. The experiments and
analyses demonstrate the effectiveness of iNLG on open-ended text generation
tasks, including text completion, story generation, and concept-to-text
generation in few-shot scenarios. Both automatic metrics and human evaluations
verify that the text snippets generated by our iNLG are coherent and
informative while displaying minor degeneration.
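As a rough illustration of the recipe the abstract describes (render a machine "imagination" of the context, then let its visual features steer a language model), here is a minimal sketch. It is not the authors' released iNLG code: the model names, the single-vector CLIP prefix, and the randomly initialized projection are illustrative assumptions, and the projection would have to be trained before the visual prefix actually steers generation.

```python
# Minimal, illustrative sketch of imagination-guided generation (not the
# authors' iNLG implementation). Assumes the diffusers and transformers
# libraries; model names are examples only.
import torch
from diffusers import StableDiffusionPipeline
from transformers import CLIPModel, CLIPProcessor, GPT2LMHeadModel, GPT2Tokenizer

context = "The hikers reached the ridge just as the storm rolled in."

# 1) "Visualize": render a machine imagination of the context.
t2i = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
image = t2i(context).images[0]

# 2) Encode the imagined image into a visual feature vector.
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
pixels = clip_proc(images=image, return_tensors="pt")
visual = clip.get_image_features(**pixels)          # shape (1, 512)

# 3) Project the visual feature into the LM's embedding space and prepend it
#    as a prefix. The projection is randomly initialized here; in practice it
#    would be learned so the prefix carries useful signal.
lm = GPT2LMHeadModel.from_pretrained("gpt2")
tok = GPT2Tokenizer.from_pretrained("gpt2")
project = torch.nn.Linear(visual.shape[-1], lm.config.n_embd)
prefix = project(visual).unsqueeze(1)               # shape (1, 1, 768)

text_ids = tok(context, return_tensors="pt").input_ids
text_emb = lm.transformer.wte(text_ids)
inputs_embeds = torch.cat([prefix, text_emb], dim=1)

# 4) "Write": continue the text conditioned on both the context and the prefix.
out = lm.generate(inputs_embeds=inputs_embeds, max_new_tokens=40,
                  do_sample=True, top_p=0.9)
print(tok.decode(out[0], skip_special_tokens=True))
```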
Related papers
- The Art of Storytelling: Multi-Agent Generative AI for Dynamic Multimodal Narratives [3.5001789247699535]
This paper introduces the concept of an education tool that utilizes Generative Artificial Intelligence (GenAI) to enhance storytelling for children.
The system combines GenAI-driven narrative co-creation, text-to-speech conversion, and text-to-video generation to produce an engaging experience for learners.
arXiv Detail & Related papers (2024-09-17T15:10:23Z)
- Text-Animator: Controllable Visual Text Video Generation [149.940821790235]
We propose an innovative approach termed Text-Animator for visual text video generation.
Text-Animator contains a text embedding injection module to precisely depict the structures of visual text in generated videos.
We also develop a camera control module and a text refinement module to improve the stability of generated visual text.
arXiv Detail & Related papers (2024-06-25T17:59:41Z)
- The Lost Melody: Empirical Observations on Text-to-Video Generation From A Storytelling Perspective [4.471962177124311]
We examine text-to-video generation from a storytelling perspective, which has hardly been investigated.
We propose an evaluation framework for storytelling aspects of videos, and discuss the potential future directions.
arXiv Detail & Related papers (2024-05-13T02:25:08Z)
- Intelligent Grimm -- Open-ended Visual Storytelling via Latent Diffusion Models [70.86603627188519]
We focus on a novel, yet challenging task of generating a coherent image sequence based on a given storyline, denoted as open-ended visual storytelling.
We propose a learning-based auto-regressive image generation model, termed as StoryGen, with a novel vision-language context module.
We show StoryGen can generalize to unseen characters without any optimization, and generate image sequences with coherent content and consistent characters.
arXiv Detail & Related papers (2023-06-01T17:58:50Z)
- Learning to Imagine: Visually-Augmented Natural Language Generation [73.65760028876943]
We propose a method to make pre-trained language models (PLMs) Learn to Imagine for Visually-augmented natural language gEneration.
We use a diffusion model to synthesize high-quality images conditioned on the input texts.
We conduct synthesis for each sentence rather than generate only one image for an entire paragraph.
arXiv Detail & Related papers (2023-05-26T13:59:45Z)
- Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors [58.71128866226768]
Recent text-to-image generation methods have incrementally improved the generated image fidelity and text relevancy.
We propose a novel text-to-image method that addresses these gaps by enabling a simple control mechanism, complementary to text, in the form of a scene.
Our model achieves state-of-the-art FID and human evaluation results, unlocking the ability to generate high-fidelity images at a resolution of 512x512 pixels.
arXiv Detail & Related papers (2022-03-24T15:44:50Z)
- FairyTailor: A Multimodal Generative Framework for Storytelling [33.39639788612019]
We introduce a system and a demo, FairyTailor, for human-in-the-loop visual story co-creation.
Users can create a cohesive children's fairytale by weaving generated texts and retrieved images with their input.
To our knowledge, this is the first dynamic tool for multimodal story generation that allows interactive co-formation of both texts and images.
arXiv Detail & Related papers (2021-07-13T02:45:08Z)
- ImaginE: An Imagination-Based Automatic Evaluation Metric for Natural Language Generation [53.56628907030751]
We propose ImaginE, an imagination-based automatic evaluation metric for natural language generation.
With the help of CLIP and DALL-E, two cross-modal models pre-trained on large-scale image-text pairs, we automatically generate an image as the embodied imagination for the text snippet.
Experiments spanning several text generation tasks demonstrate that adding imagination with ImaginE shows great potential for introducing multi-modal information into NLG evaluation (a minimal scoring sketch follows this list).
arXiv Detail & Related papers (2021-06-10T17:59:52Z)
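For the ImaginE entry above, the following is a hedged sketch of how an imagination-based score of this kind could be computed: synthesize an image from the text snippet and score it with CLIP image-text similarity. Stable Diffusion stands in here for DALL-E, and the plain cosine similarity is an illustrative stand-in rather than the paper's actual metric.

```python
# Illustrative imagination-based score in the spirit of ImaginE (not the
# authors' implementation). Assumes diffusers and transformers; model names
# are examples only.
import torch
from diffusers import StableDiffusionPipeline
from transformers import CLIPModel, CLIPProcessor

def imagination_score(text: str) -> float:
    # Render the "embodied imagination" of the text snippet.
    t2i = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
    image = t2i(text).images[0]

    # Score how well the imagined image matches the text via CLIP similarity.
    clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
    batch = proc(text=[text], images=image, return_tensors="pt",
                 padding=True, truncation=True)
    with torch.no_grad():
        img = clip.get_image_features(pixel_values=batch["pixel_values"])
        txt = clip.get_text_features(input_ids=batch["input_ids"],
                                     attention_mask=batch["attention_mask"])
    return torch.nn.functional.cosine_similarity(img, txt).item()

print(imagination_score("A dog chases a red ball across the park."))
```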