VScript: Controllable Script Generation with Audio-Visual Presentation
- URL: http://arxiv.org/abs/2203.00314v1
- Date: Tue, 1 Mar 2022 09:43:02 GMT
- Title: VScript: Controllable Script Generation with Audio-Visual Presentation
- Authors: Ziwei Ji, Yan Xu, I-Tsun Cheng, Samuel Cahyawijaya, Rita Frieske,
Etsuko Ishii, Min Zeng, Andrea Madotto, Pascale Fung
- Abstract summary: VScript is a controllable pipeline that generates complete scripts including dialogues and scene descriptions.
We adopt a hierarchical structure, which generates the plot, then the script and its audio-visual presentation.
Experimental results show that our approach outperforms the baselines on both automatic and human evaluations.
- Score: 56.17400243061659
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automatic script generation could save a considerable amount of resources and
offer inspiration to professional scriptwriters. We present VScript, a
controllable pipeline that generates complete scripts including dialogues and
scene descriptions, and presents them visually using video retrieval and aurally
using text-to-speech for spoken dialogue. With an interactive interface, our
system allows users to select genres and input starting words that control the
theme and development of the generated script. We adopt a hierarchical
structure, which generates the plot, then the script and its audio-visual
presentation. We also introduce a novel approach to plot-guided dialogue
generation by treating it as the inverse of dialogue summarization. Experimental
results show that our approach outperforms the baselines on both automatic and
human evaluations, especially in terms of genre control.
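The hierarchical flow described in the abstract (user controls, then plot, then script, then audio-visual presentation) can be sketched as follows. This is a minimal illustration only, not the authors' implementation: every name here (`ScriptLine`, `Presentation`, `run_pipeline`, and the two toy generators) is hypothetical, and the generators are placeholders standing in for trained models. The sketch stops at collecting the inputs that a video retriever and a text-to-speech engine would receive.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ScriptLine:
    kind: str      # "scene" for a scene description, "dialogue" for a spoken line
    speaker: str   # empty for scene descriptions
    text: str


@dataclass
class Presentation:
    video_queries: List[str]   # scene descriptions to hand to a video retriever
    speech_lines: List[str]    # dialogue lines to hand to a text-to-speech engine


def run_pipeline(
    genre: str,
    starting_words: str,
    plot_generator: Callable[[str, str], str],
    script_generator: Callable[[str], List[ScriptLine]],
) -> Presentation:
    """Hierarchical flow: user controls -> plot -> script -> presentation inputs."""
    plot = plot_generator(genre, starting_words)
    script = script_generator(plot)
    return Presentation(
        video_queries=[line.text for line in script if line.kind == "scene"],
        speech_lines=[line.text for line in script if line.kind == "dialogue"],
    )


# Hypothetical placeholder stages; a real system would call trained models here.
def toy_plot_generator(genre: str, starting_words: str) -> str:
    return f"[{genre}] {starting_words} and nothing is the same afterwards."


def toy_script_generator(plot: str) -> List[ScriptLine]:
    return [
        ScriptLine("scene", "", f"Opening scene. {plot}"),
        ScriptLine("dialogue", "A", "Did you see that?"),
        ScriptLine("dialogue", "B", "I did. We should go."),
    ]


result = run_pipeline(
    "sci-fi", "A signal arrives", toy_plot_generator, toy_script_generator
)
```

Passing the stages in as callables keeps the ordering of the hierarchy explicit while leaving each stage replaceable, which is one plausible way to expose the genre and starting-word controls to a user interface.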
Related papers
- ScreenWriter: Automatic Screenplay Generation and Movie Summarisation [55.20132267309382]
Video content has driven demand for textual descriptions or summaries that allow users to recall key plot points or get an overview without watching.
We propose the task of automatic screenplay generation, and a method, ScreenWriter, that operates only on video and produces output which includes dialogue, speaker names, scene breaks, and visual descriptions.
ScreenWriter introduces a novel algorithm to segment the video into scenes based on the sequence of visual vectors, and a novel method for the challenging problem of determining character names, based on a database of actors' faces.
Date: 2024-10-17T07:59:54Z
- Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation [55.043492250775294]
We introduce a novel Face-to-Face spoken dialogue model.
It processes audio-visual speech from user input and generates audio-visual speech as the response.
We also introduce MultiDialog, the first large-scale multimodal spoken dialogue corpus.
Date: 2024-06-12T04:48:36Z
- Contextual Dynamic Prompting for Response Generation in Task-oriented Dialog Systems [8.419582942080927]
Response generation is one of the critical components in task-oriented dialog systems.
We propose an approach that performs dynamic prompting, where the prompts are learnt from dialog contexts.
We show that contextual dynamic prompts improve response generation in terms of combined score (Mehri et al., 2019) by 3 absolute points.
Date: 2023-01-30T20:26:02Z
- A Benchmark for Understanding and Generating Dialogue between Characters in Stories [75.29466820496913]
We present the first study to explore whether machines can understand and generate dialogue in stories.
We propose two new tasks including Masked Dialogue Generation and Dialogue Speaker Recognition.
We show the difficulty of the proposed tasks by testing existing models with automatic and manual evaluation on DialStory.
Date: 2022-09-18T10:19:04Z
- DialogueScript: Using Dialogue Agents to Produce a Script [2.897111293806727]
We present a novel approach to generating scripts by using agents with different personality types.
We employ simulated dramatic networks to manage character interaction in the script.
Date: 2022-06-16T19:57:01Z
- Controlled Cue Generation for Play Scripts [0.02578242050187029]
We use a large-scale play scripts dataset to propose the novel task of theatrical cue generation from dialogues.
We show how cues can be used to enhance the impact of dialogue using a language model conditioned on a dialogue/cue discriminator.
Date: 2021-12-13T19:00:17Z
- Conversation Learner -- A Machine Teaching Tool for Building Dialog Managers for Task-Oriented Dialog Systems [57.082447660944965]
Conversation Learner is a machine teaching tool for building dialog managers.
It enables dialog authors to create a dialog flow using familiar tools, converting the dialog flow into a parametric model.
It allows dialog authors to improve the dialog manager over time by leveraging user-system dialog logs as training data.
Date: 2020-04-09T00:10:54Z
- Multimodal Transformer with Pointer Network for the DSTC8 AVSD Challenge [48.905496060794114]
We describe our submission to the AVSD track of the 8th Dialogue System Technology Challenge.
We adopt dot-product attention to combine text and non-text features of input video.
Our systems achieve high performance in automatic metrics and obtain 5th and 6th place in human evaluation.
Date: 2020-02-25T06:41:07Z
This list is automatically generated from the titles and abstracts of the papers on this site.