Related papers: The Art of Storytelling: Multi-Agent Generative AI for Dynamic Multimodal Narratives

URL: http://arxiv.org/abs/2409.11261v3
Date: Thu, 19 Sep 2024 09:50:58 GMT
Title: The Art of Storytelling: Multi-Agent Generative AI for Dynamic Multimodal Narratives
Authors: Samee Arif, Taimoor Arif, Muhammad Saad Haroon, Aamina Jamal Khan, Agha Ali Raza, Awais Athar
Score: 3.5001789247699535
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper introduces the concept of an education tool that utilizes Generative Artificial Intelligence (GenAI) to enhance storytelling for children. The system combines GenAI-driven narrative co-creation, text-to-speech conversion, and text-to-video generation to produce an engaging experience for learners. We describe the co-creation process, the adaptation of narratives into spoken words using text-to-speech models, and the transformation of these narratives into contextually relevant visuals through text-to-video technology. Our evaluation covers the linguistics of the generated stories, the text-to-speech conversion quality, and the accuracy of the generated visuals.

Related papers

MoCha: Towards Movie-Grade Talking Character Synthesis [62.007000023747445]
We introduce Talking Characters, a more realistic task to generate talking character animations directly from speech and text. Unlike talking head, Talking Characters aims at generating the full portrait of one or more characters beyond the facial region. We propose MoCha, the first of its kind to generate talking characters.
arXiv Detail & Related papers (2025-03-30T04:22:09Z)
From Words to Worlds: Transforming One-line Prompt into Immersive Multi-modal Digital Stories with Communicative LLM Agent [11.553884271082127]
This paper introduces the StoryAgent framework to automate and refine digital storytelling. StoryAgent tackles key issues such as manual intervention, interactive scene orchestration, and narrative consistency. Results demonstrate the framework's capability to produce coherent digital stories without reference videos.
arXiv Detail & Related papers (2024-06-15T03:03:43Z)
Improved Child Text-to-Speech Synthesis through Fastpitch-based Transfer Learning [3.5032870024762386]
This paper presents a novel approach that leverages the Fastpitch text-to-speech (TTS) model for generating high-quality synthetic child speech. The approach involved finetuning a multi-speaker TTS model to work with child speech. We conducted an objective assessment that showed a significant correlation between real and synthetic child voices.
arXiv Detail & Related papers (2023-11-07T19:31:44Z)
Text-Only Training for Visual Storytelling [107.19873669536523]
We formulate visual storytelling as a visual-conditioned story generation problem. We propose a text-only training method that separates the learning of cross-modality alignment and story generation.
arXiv Detail & Related papers (2023-08-17T09:32:17Z)
Channel-aware Decoupling Network for Multi-turn Dialogue Comprehension [81.47133615169203]
We propose compositional learning for holistic interaction across utterances beyond the sequential contextualization from PrLMs. We employ domain-adaptive training strategies to help the model adapt to the dialogue domains. Experimental results show that our method substantially boosts the strong PrLM baselines in four public benchmark datasets.
arXiv Detail & Related papers (2023-01-10T13:18:25Z)
Visualize Before You Write: Imagination-Guided Open-Ended Text Generation [68.96699389728964]
We propose iNLG that uses machine-generated images to guide language models in open-ended text generation. Experiments and analyses demonstrate the effectiveness of iNLG on open-ended text generation tasks.
arXiv Detail & Related papers (2022-10-07T18:01:09Z)
A Benchmark for Understanding and Generating Dialogue between Characters in Stories [75.29466820496913]
We present the first study to explore whether machines can understand and generate dialogue in stories. We propose two new tasks including Masked Dialogue Generation and Dialogue Speaker Recognition. We show the difficulty of the proposed tasks by testing existing models with automatic and manual evaluation on DialStory.
arXiv Detail & Related papers (2022-09-18T10:19:04Z)
ViNTER: Image Narrative Generation with Emotion-Arc-Aware Transformer [59.05857591535986]
We propose a model called ViNTER to generate image narratives that focus on time series representing varying emotions as "emotion arcs." We present experimental results of both manual and automatic evaluations.
arXiv Detail & Related papers (2022-02-15T10:53:08Z)
FairyTailor: A Multimodal Generative Framework for Storytelling [33.39639788612019]
We introduce a system and a demo, FairyTailor, for human-in-the-loop visual story co-creation. Users can create a cohesive children's fairytale by weaving generated texts and retrieved images with their input. To our knowledge, this is the first dynamic tool for multimodal story generation that allows interactive co-formation of both texts and images.
arXiv Detail & Related papers (2021-07-13T02:45:08Z)
Improving Generation and Evaluation of Visual Stories via Semantic Consistency [72.00815192668193]
Given a series of natural language captions, an agent must generate a sequence of images that correspond to the captions. Prior work has introduced recurrent generative models which outperform text-to-image synthesis models on this task. We present a number of improvements to prior modeling approaches, including the addition of a dual learning framework.
arXiv Detail & Related papers (2021-05-20T20:42:42Z)
Text2Video: Text-driven Talking-head Video Synthesis with Phonetic Dictionary [10.590649169151055]
We present a novel approach to synthesize video from text. The method builds a phoneme-pose dictionary and trains a generative adversarial network (GAN) to generate video. Compared to audio-driven video generation algorithms, our approach has a number of advantages.
arXiv Detail & Related papers (2021-04-29T19:54:41Z)
