Talk to Your Slides: Language-Driven Agents for Efficient Slide Editing
- URL: http://arxiv.org/abs/2505.11604v3
- Date: Sun, 25 May 2025 15:05:53 GMT
- Title: Talk to Your Slides: Language-Driven Agents for Efficient Slide Editing
- Authors: Kyudan Jung, Hojun Cho, Jooyeol Yun, Soyoung Yang, Jaehyeok Jang, Jaegul Choo,
- Abstract summary: We propose Talk-to-Your-Slides, an agent that edits slides in active PowerPoint sessions. Our system enables 34.02% faster processing, 34.76% better instruction fidelity, and 87.42% cheaper operation than baselines.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Editing presentation slides remains one of the most common and time-consuming tasks faced by millions of users daily, despite significant advances in automated slide generation. Existing approaches have successfully demonstrated slide editing via graphical user interface (GUI)-based agents, offering intuitive visual control. However, such methods often suffer from high computational cost and latency. In this paper, we propose Talk-to-Your-Slides, an LLM-powered agent designed to edit slides in active PowerPoint sessions by leveraging structured information about slide objects rather than relying on the image modality. The key insight of our work is designing the editing process with distinct high-level and low-level layers to facilitate interaction between user commands and slide objects. By providing direct access to application objects rather than screen pixels, our system enables 34.02% faster processing, 34.76% better instruction fidelity, and 87.42% cheaper operation than baselines. To evaluate slide editing capabilities, we introduce TSBench, a human-annotated dataset comprising 379 diverse editing instructions paired with corresponding slide variations in four categories. Our code, benchmark and demos are available at https://anonymous.4open.science/r/Talk-to-Your-Slides-0F4C.
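The two-layer design described in the abstract can be sketched in miniature: a high-level planner turns a user command into structured edit operations, and a low-level executor applies them directly to slide objects rather than screen pixels. All names, the hard-coded planner, and the `SlideObject` class below are illustrative stand-ins, not the paper's implementation (which uses an LLM and live PowerPoint application objects).

```python
from dataclasses import dataclass

# Hypothetical structured representation of a slide object, standing in
# for a live shape exposed by the presentation application's object model.
@dataclass
class SlideObject:
    object_id: int
    kind: str        # e.g. "title", "text_box"
    text: str
    font_size: int

def plan_edits(command, objects):
    """High-level layer: map a parsed user command to structured edit
    operations (in the paper an LLM performs this step)."""
    ops = []
    if command["action"] == "resize_text":
        for obj in objects:
            if obj.kind == command["target_kind"]:
                ops.append({"object_id": obj.object_id,
                            "set": {"font_size": command["font_size"]}})
    return ops

def apply_edits(objects, ops):
    """Low-level layer: apply each operation directly to the slide
    objects, never touching screen pixels."""
    index = {o.object_id: o for o in objects}
    for op in ops:
        for attr, value in op["set"].items():
            setattr(index[op["object_id"]], attr, value)
    return objects

slide = [SlideObject(1, "title", "Quarterly Report", 40),
         SlideObject(2, "text_box", "Revenue grew 12%", 18)]
ops = plan_edits({"action": "resize_text", "target_kind": "text_box",
                  "font_size": 24}, slide)
apply_edits(slide, ops)
```

Splitting planning from execution this way is what lets the user command and the slide's object structure interact without any image-modality round trip.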
Related papers
- AI-Generated Lecture Slides for Improving Slide Element Detection and Retrieval [25.517836483457803]
We propose SynLecSlideGen, a large language model (LLM)-guided synthetic lecture slide generation pipeline. We also create an evaluation benchmark, RealSlide, by manually annotating 1,050 real lecture slides. Experimental results show that few-shot transfer learning with pretraining on synthetic slides significantly improves performance compared to training only on real data.
arXiv Detail & Related papers (2025-06-30T08:11:31Z) - SlideCoder: Layout-aware RAG-enhanced Hierarchical Slide Generation from Design [33.47715901943206]
We introduce SlideCoder, a layout-aware, retrieval-augmented framework for generating editable slides from reference images. Experiments show that SlideCoder outperforms state-of-the-art baselines by up to 40.5 points, demonstrating strong performance across layout fidelity, execution accuracy, and visual consistency.
arXiv Detail & Related papers (2025-06-09T17:39:48Z) - From Shots to Stories: LLM-Assisted Video Editing with Unified Language Representations [0.9217021281095907]
Large Language Models (LLMs) and Vision-Language Models (VLMs) have demonstrated remarkable reasoning and generalization capabilities in video understanding. This paper presents the first systematic study of LLMs in the context of video editing.
arXiv Detail & Related papers (2025-05-18T05:25:11Z) - Generating Narrated Lecture Videos from Slides with Synchronized Highlights [55.2480439325792]
We introduce an end-to-end system designed to automate the process of turning static slides into video lectures. The system synthesizes a video lecture featuring AI-generated narration precisely synchronized with dynamic visual highlights. We demonstrate the system's effectiveness through a technical evaluation using a manually annotated slide dataset with 1,000 samples.
arXiv Detail & Related papers (2025-05-05T18:51:53Z) - Textual-to-Visual Iterative Self-Verification for Slide Generation [46.99825956909532]
We decompose the task of generating missing presentation slides into two key components: content generation and layout generation. Our approach significantly outperforms baseline methods in terms of alignment, logical flow, visual appeal, and readability.
arXiv Detail & Related papers (2025-02-21T12:21:09Z) - AutoPresent: Designing Structured Visuals from Scratch [99.766901203884]
We benchmark end-to-end image generation and program generation methods with a variety of models. We also create AutoPresent, an 8B Llama-based model trained on 7k instructions paired with corresponding code for slide generation.
arXiv Detail & Related papers (2025-01-01T18:09:32Z) - Awaking the Slides: A Tuning-free and Knowledge-regulated AI Tutoring System via Language Model Coordination [52.20542825755132]
We develop Slide2Lecture, a tuning-free and knowledge-regulated intelligent tutoring system.
It can effectively convert an input lecture slide into a structured teaching agenda consisting of a set of heterogeneous teaching actions.
For teachers and developers, Slide2Lecture enables customization to cater to personalized demands.
arXiv Detail & Related papers (2024-09-11T16:03:09Z) - Real-time 3D-aware Portrait Editing from a Single Image [111.27169315556444]
3DPE can edit a face image following given prompts, like reference images or text descriptions.
A lightweight module is distilled from a 3D portrait generator and a text-to-image model.
arXiv Detail & Related papers (2024-02-21T18:36:26Z) - Learning to Edit: Aligning LLMs with Knowledge Editing [101.96620267293731]
We propose a Learning to Edit (LTE) framework, focusing on teaching large language models to apply updated knowledge to input questions.
LTE features a two-phase process: (i) the Alignment Phase, which fine-tunes LLMs on a meticulously curated parallel dataset to make reliable, in-scope edits.
We demonstrate LTE's superiority in knowledge editing performance, robustness in both batch and sequential editing, minimal interference on general tasks, and rapid editing speeds.
arXiv Detail & Related papers (2024-02-19T07:45:17Z) - SWEA: Updating Factual Knowledge in Large Language Models via Subject Word Embedding Altering [17.20346072074533]
Recent model editing is a promising technique for efficiently updating a small amount of knowledge in large language models. We propose a detachable and expandable Subject Word Embedding Altering (SWEA) framework, which finds the editing embeddings through token-level matching. We demonstrate the overall state-of-the-art (SOTA) performance of SWEA⊕OS on the CounterFact and zsRE datasets.
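The token-level matching idea behind SWEA can be illustrated with a toy sketch: wherever the tokens of an edited subject appear in the input, their embeddings are swapped for stored altered embeddings. The function name and the scalar "embeddings" below are toy stand-ins for illustration, not SWEA's actual implementation.

```python
def alter_embeddings(tokens, embeddings, subject_tokens, altered):
    """Replace the embeddings of every occurrence of subject_tokens
    (matched at the token level) with the altered embeddings."""
    out = list(embeddings)
    n = len(subject_tokens)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == subject_tokens:   # token-level match
            out[i:i + n] = altered
    return out

# Toy example: scalars stand in for embedding vectors.
tokens = ["the", "eiffel", "tower", "is", "in"]
result = alter_embeddings(tokens, [0, 1, 2, 3, 4],
                          ["eiffel", "tower"], [10, 20])
```

Because the alteration is keyed on the subject tokens themselves rather than on model weights, such a module can be attached or detached without retraining, which is what makes the framework detachable and expandable.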
arXiv Detail & Related papers (2024-01-31T13:08:45Z) - Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models [52.894213114914805]
We present a method to create interpretable concept sliders that enable precise control over attributes in image generations from diffusion models.
A slider is created using a small set of prompts or sample images.
Our method can help address persistent quality issues in Stable Diffusion XL, including repairing object deformations and fixing distorted hands.
arXiv Detail & Related papers (2023-11-20T18:59:01Z) - PPTC Benchmark: Evaluating Large Language Models for PowerPoint Task Completion [96.47420221442397]
We introduce the PowerPoint Task Completion benchmark to assess the ability of Large Language Models to finish multi-turn, multi-modal instructions.
We also propose the PPTX-Match Evaluation System, which evaluates whether LLMs complete the instruction based on the prediction file rather than the label API sequence.
The results show that GPT-4 outperforms other LLMs with 75.1% accuracy in single-turn dialogue testing but faces challenges in completing entire sessions, achieving just 6% session accuracy.
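The file-based evaluation idea can be sketched as follows: rather than checking which API calls a model issued, extract a comparable state from the resulting file and compare it with the reference. All names below are illustrative stand-ins, not the PPTX-Match implementation, and plain dicts stand in for parsed PowerPoint files.

```python
def extract_state(slide_objects):
    """Reduce a slide to a comparable, order-insensitive set of
    (kind, text, font_size) tuples."""
    return {(o["kind"], o["text"], o["font_size"]) for o in slide_objects}

def matches(prediction, label):
    """Judge success by the final file state, not the edit sequence
    that produced it."""
    return extract_state(prediction) == extract_state(label)

label = [{"kind": "title", "text": "Results", "font_size": 40}]
good  = [{"kind": "title", "text": "Results", "font_size": 40}]
bad   = [{"kind": "title", "text": "Results", "font_size": 32}]
```

Comparing final states rather than API sequences means any sequence of edits that produces the correct file counts as success, which is the property the benchmark is after.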
arXiv Detail & Related papers (2023-11-03T08:06:35Z) - Beyond the Chat: Executable and Verifiable Text-Editing with LLMs [87.84199761550634]
Conversational interfaces powered by Large Language Models (LLMs) have recently become a popular way to obtain feedback during document editing.
We present InkSync, an editing interface that suggests executable edits directly within the document being edited.
arXiv Detail & Related papers (2023-09-27T00:56:17Z) - Telling Stories from Computational Notebooks: AI-Assisted Presentation Slides Creation for Presenting Data Science Work [47.558611855454195]
This paper presents NB2Slides, an AI system that helps users compose presentations of their data science work.
NB2Slides uses deep learning methods as well as example-based prompts to generate slides from computational notebooks.
It also provides an interactive visualization that links the slides with the notebook to help users further edit the slides.
arXiv Detail & Related papers (2022-03-21T16:06:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed above and is not responsible for any consequences of its use.