Related papers: VScript: Controllable Script Generation with Audio-Visual Presentation

URL: http://arxiv.org/abs/2203.00314v1
Date: Tue, 1 Mar 2022 09:43:02 GMT
Title: VScript: Controllable Script Generation with Audio-Visual Presentation
Authors: Ziwei Ji, Yan Xu, I-Tsun Cheng, Samuel Cahyawijaya, Rita Frieske, Etsuko Ishii, Min Zeng, Andrea Madotto, Pascale Fung
Abstract summary: VScript is a controllable pipeline that generates complete scripts including dialogues and scene descriptions. We adopt a hierarchical structure, which generates the plot, then the script and its audio-visual presentation. Experiment results show that our approach outperforms the baselines on both automatic and human evaluations.
Score: 56.17400243061659
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Automatic script generation could save a considerable amount of resources and offer inspiration to professional scriptwriters. We present VScript, a controllable pipeline that generates complete scripts including dialogues and scene descriptions, and presents visually using video retrieval and aurally using text-to-speech for spoken dialogue. With an interactive interface, our system allows users to select genres and input starting words that control the theme and development of the generated script. We adopt a hierarchical structure, which generates the plot, then the script and its audio-visual presentation. We also introduce a novel approach to plot-guided dialogue generation by treating it as an inverse dialogue summarization. Experiment results show that our approach outperforms the baselines on both automatic and human evaluations, especially in terms of genre control.

Related papers

The Script is All You Need: An Agentic Framework for Long-Horizon Dialogue-to-Cinematic Video Generation [95.18045807704284]
We introduce an end-to-end agentic framework for dialogue-to-cinematic-video generation. ScripterAgent is trained to translate coarse dialogue into a fine-grained, executable cinematic script. Our framework significantly improves script faithfulness and temporal fidelity across all tested video models.
arXiv Detail & Related papers (2026-01-25T08:10:28Z)
Linear Script Representations in Speech Foundation Models Enable Zero-Shot Transliteration [70.84108518476744]
We show that script is linearly encoded in the activation space of multilingual speech models, and that modifying activations at inference time enables direct control over output script. We apply this approach to inducing post-hoc control over the script of speech recognition output, where we observe competitive performance across all model sizes of Whisper.
arXiv Detail & Related papers (2026-01-06T10:45:04Z)
ScreenWriter: Automatic Screenplay Generation and Movie Summarisation [55.20132267309382]
Video content has driven demand for textual descriptions or summaries that allow users to recall key plot points or get an overview without watching. We propose the task of automatic screenplay generation, and a method, ScreenWriter, that operates only on video and produces output which includes dialogue, speaker names, scene breaks, and visual descriptions. ScreenWriter introduces a novel algorithm to segment the video into scenes based on the sequence of visual vectors, and a novel method for the challenging problem of determining character names, based on a database of actors' faces.
arXiv Detail & Related papers (2024-10-17T07:59:54Z)
Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation [55.043492250775294]
We introduce a novel Face-to-Face spoken dialogue model. It processes audio-visual speech from user input and generates audio-visual speech as the response. We also introduce MultiDialog, the first large-scale multimodal spoken dialogue corpus.
arXiv Detail & Related papers (2024-06-12T04:48:36Z)
Contextual Dynamic Prompting for Response Generation in Task-oriented Dialog Systems [8.419582942080927]
Response generation is one of the critical components in task-oriented dialog systems. We propose an approach that performs dynamic prompting where the prompts are learnt from dialog contexts. We show that contextual dynamic prompts improve response generation in terms of combined score (Mehri et al., 2019) by 3 absolute points.
arXiv Detail & Related papers (2023-01-30T20:26:02Z)
A Benchmark for Understanding and Generating Dialogue between Characters in Stories [75.29466820496913]
We present the first study to explore whether machines can understand and generate dialogue in stories. We propose two new tasks including Masked Dialogue Generation and Dialogue Speaker Recognition. We show the difficulty of the proposed tasks by testing existing models with automatic and manual evaluation on DialStory.
arXiv Detail & Related papers (2022-09-18T10:19:04Z)
DialogueScript: Using Dialogue Agents to Produce a Script [2.897111293806727]
We present a novel approach to generating scripts by using agents with different personality types. We employ simulated dramatic networks to manage character interaction in the script.
arXiv Detail & Related papers (2022-06-16T19:57:01Z)
Controlled Cue Generation for Play Scripts [0.02578242050187029]
We use a large-scale play scripts dataset to propose the novel task of theatrical cue generation from dialogues. We show how cues can be used to enhance the impact of dialogue using a language model conditioned on a dialogue/cue discriminator.
arXiv Detail & Related papers (2021-12-13T19:00:17Z)
Conversation Learner -- A Machine Teaching Tool for Building Dialog Managers for Task-Oriented Dialog Systems [57.082447660944965]
Conversation Learner is a machine teaching tool for building dialog managers. It enables dialog authors to create a dialog flow using familiar tools, converting the dialog flow into a parametric model. It allows dialog authors to improve the dialog manager over time by leveraging user-system dialog logs as training data.
arXiv Detail & Related papers (2020-04-09T00:10:54Z)
Multimodal Transformer with Pointer Network for the DSTC8 AVSD Challenge [48.905496060794114]
We describe our submission to the AVSD track of the 8th Dialogue System Technology Challenge. We adopt dot-product attention to combine text and non-text features of input video. Our systems achieve high performance in automatic metrics and obtain 5th and 6th place in human evaluation.
arXiv Detail & Related papers (2020-02-25T06:41:07Z)

This list is automatically generated from the titles and abstracts of the papers in this site.