Related papers: PASS: Presentation Automation for Slide Generation and Speech

URL: http://arxiv.org/abs/2501.06497v2
Date: Wed, 15 Jan 2025 20:43:44 GMT
Title: PASS: Presentation Automation for Slide Generation and Speech
Authors: Tushar Aggarwal, Aarohi Bhand,
Abstract summary: PASS is a pipeline used to generate slides from general Word documents. It also automates the oral delivery of the generated slides. PASS analyzes user documents to create a dynamic, engaging presentation with an AI-generated voice.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In today's fast-paced world, effective presentations have become an essential tool for communication in both online and offline meetings. The crafting of a compelling presentation requires significant time and effort, from gathering key insights to designing slides that convey information clearly and concisely. However, despite the wealth of resources available, people often find themselves manually extracting crucial points, analyzing data, and organizing content in a way that ensures clarity and impact. Furthermore, a successful presentation goes beyond just the slides; it demands rehearsal and the ability to weave a captivating narrative to fully engage the audience. Although there has been some exploration of automating document-to-slide generation, existing research is largely centered on converting research papers. In addition, automation of the delivery of these presentations has yet to be addressed. We introduce PASS, a pipeline used to generate slides from general Word documents, going beyond just research papers, which also automates the oral delivery of the generated slides. PASS analyzes user documents to create a dynamic, engaging presentation with an AI-generated voice. Additionally, we developed an LLM-based evaluation metric to assess our pipeline across three critical dimensions of presentations: relevance, coherence, and redundancy. The data and code are available at https://github.com/AggarwalTushar/PASS.

Related papers

Paper2Video: Automatic Video Generation from Scientific Papers [62.634562246594555]
Paper2Video is the first benchmark of 101 research papers paired with author-created presentation videos, slides, and speaker metadata. We propose PaperTalker, the first multi-agent framework for academic presentation video generation.
arXiv Detail & Related papers (2025-10-06T17:58:02Z)
Generating Narrated Lecture Videos from Slides with Synchronized Highlights [55.2480439325792]
We introduce an end-to-end system designed to automate the process of turning static slides into video lectures. This system synthesizes a video lecture featuring AI-generated narration precisely synchronized with dynamic visual highlights. We demonstrate the system's effectiveness through a technical evaluation using a manually annotated slide dataset with 1000 samples.
arXiv Detail & Related papers (2025-05-05T18:51:53Z)
Generative Compositor for Few-Shot Visual Information Extraction [60.663887314625164]
We propose a novel generative model, named Generative Compositor, to address the challenge of few-shot VIE. Generative Compositor is a hybrid pointer-generator network that emulates the operations of a compositor by retrieving words from the source text. The proposed method achieves highly competitive results in full-sample training, while notably outperforming the baseline in the 1-shot, 5-shot, and 10-shot settings.
arXiv Detail & Related papers (2025-03-21T04:56:24Z)
PPTAgent: Generating and Evaluating Presentations Beyond Text-to-Slides [53.17641835701013]
We propose a two-stage, edit-based approach to automatically generating presentations. PPTAgent first analyzes presentations to understand their structural patterns and content schemas, then drafts outlines and generates slides. Experiments show that PPTAgent significantly outperforms traditional automatic presentation generation methods across all three dimensions.
arXiv Detail & Related papers (2025-01-07T16:53:01Z)
Enhancing Presentation Slide Generation by LLMs with a Multi-Staged End-to-End Approach [21.8104104944488]
Existing approaches for generating a rich presentation from a document are often semi-automatic or only put a flat summary into the slides ignoring the importance of a good narrative. We propose a multi-staged end-to-end model which uses a combination of LLM and VLM. We have experimentally shown that compared to applying LLMs directly with state-of-the-art prompting, our proposed multi-staged solution is better in terms of automated metrics and human evaluation.
arXiv Detail & Related papers (2024-06-01T07:49:31Z)
Presentations are not always linear! GNN meets LLM for Document-to-Presentation Transformation with Attribution [21.473482276335194]
It is difficult to incorporate such non-linear mapping of content to slides and ensure that the content is faithful to the document. We propose a novel graph based solution where we learn a graph from the input document and use a combination of graph neural network and LLM to generate a presentation.
arXiv Detail & Related papers (2024-05-21T13:52:33Z)
Using Large Language Models to Generate Engaging Captions for Data Visualizations [51.98253121636079]
Large language models (LLMs) use sophisticated deep learning technology to produce human-like prose. The key challenge lies in designing the most effective prompt for the LLM, a task called prompt engineering. We report on first experiments using the popular LLM GPT-3 and deliver some promising results.
arXiv Detail & Related papers (2022-12-27T23:56:57Z)
NECE: Narrative Event Chain Extraction Toolkit [64.89332212585404]
We introduce NECE, an open-access, document-level toolkit that automatically extracts and aligns narrative events in the temporal order of their occurrence. We show the high quality of the NECE toolkit and demonstrate its downstream application in analyzing narrative bias regarding gender. We also openly discuss the shortcomings of the current approach, and potential of leveraging generative models in future works.
arXiv Detail & Related papers (2022-08-17T04:30:58Z)
Unified Pretraining Framework for Document Understanding [52.224359498792836]
We present UDoc, a new unified pretraining framework for document understanding. UDoc is designed to support most document understanding tasks, extending the Transformer to take multimodal embeddings as input. An important feature of UDoc is that it learns a generic representation by making use of three self-supervised losses.
arXiv Detail & Related papers (2022-04-22T21:47:04Z)
AI based Presentation Creator With Customized Audio Content Delivery [0.0]
This paper aims to use Machine Learning (ML) algorithms and Natural Language Processing (NLP) modules to automate the process of creating a slides-based presentation from a document. We then use state-of-the-art voice cloning models to deliver the content in the desired author's voice.
arXiv Detail & Related papers (2021-06-27T12:17:11Z)
D2S: Document-to-Slide Generation Via Query-Based Text Summarization [27.576875048631265]
We contribute a new dataset, SciDuet, consisting of pairs of papers and their corresponding slide decks from recent years' NLP and ML conferences. Secondly, we present D2S, a novel system that tackles the document-to-slides task with a two-step approach. Our evaluation suggests that long-form QA outperforms state-of-the-art summarization baselines on both automated ROUGE metrics and qualitative human evaluation.
arXiv Detail & Related papers (2021-05-08T10:29:41Z)
DOC2PPT: Automatic Presentation Slides Generation from Scientific Documents [76.19748112897177]
We present a novel task and approach for document-to-slide generation. We propose a hierarchical sequence-to-sequence approach to tackle our task in an end-to-end manner. Our approach exploits the inherent structures within documents and slides and incorporates paraphrasing and layout prediction modules to generate slides.
arXiv Detail & Related papers (2021-01-28T03:21:17Z)
Learning to Emphasize: Dataset and Shared Task Models for Selecting Emphasis in Presentation Slides [31.540208729354354]
Emphasizing strong leading words in presentation slides can allow the audience to direct the eye to certain focal points instead of reading the entire slide. Motivated by this demand, we study the problem of Emphasis Selection (ES) in presentation slides. We introduce a new dataset containing presentation slides on a wide variety of topics, each annotated with emphasis words in a crowdsourced setting.
arXiv Detail & Related papers (2021-01-02T06:54:55Z)
From Standard Summarization to New Tasks and Beyond: Summarization with Manifold Information [77.89755281215079]
Text summarization is the research area that aims to create a short and condensed version of the original document. In real-world applications, most of the data is not in a plain text format. This paper surveys these new summarization tasks and approaches in real-world applications.
arXiv Detail & Related papers (2020-05-10T14:59:36Z)

This list is automatically generated from the titles and abstracts of the papers on this site.