Talk to Your Slides: Language-Driven Agents for Efficient Slide Editing
- URL: http://arxiv.org/abs/2505.11604v3
- Date: Sun, 25 May 2025 15:05:53 GMT
- Title: Talk to Your Slides: Language-Driven Agents for Efficient Slide Editing
- Authors: Kyudan Jung, Hojun Cho, Jooyeol Yun, Soyoung Yang, Jaehyeok Jang, Jaegul Choo,
- Abstract summary: We propose Talk-to-Your-Slides, an agent that edits slides in active PowerPoint sessions. Our system enables 34.02% faster processing, 34.76% better instruction fidelity, and 87.42% cheaper operation than baselines.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Editing presentation slides remains one of the most common and time-consuming tasks faced by millions of users daily, despite significant advances in automated slide generation. Existing approaches have successfully demonstrated slide editing via graphical user interface (GUI)-based agents, offering intuitive visual control. However, such methods often suffer from high computational cost and latency. In this paper, we propose Talk-to-Your-Slides, an LLM-powered agent designed to edit slides in active PowerPoint sessions by leveraging structured information about slide objects rather than relying on the image modality. The key insight of our work is designing the editing process with distinct high-level and low-level layers to facilitate interaction between user commands and slide objects. By providing direct access to application objects rather than screen pixels, our system enables 34.02% faster processing, 34.76% better instruction fidelity, and 87.42% cheaper operation than baselines. To evaluate slide editing capabilities, we introduce TSBench, a human-annotated dataset comprising 379 diverse editing instructions paired with corresponding slide variations in four categories. Our code, benchmark and demos are available at https://anonymous.4open.science/r/Talk-to-Your-Slides-0F4C.
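The two-layer design described in the abstract can be sketched in miniature: a high-level planner turns a user command into structured edit operations, and a low-level executor applies them directly to slide objects rather than screen pixels. All names, the hard-coded planner, and the `SlideObject` class below are illustrative stand-ins, not the paper's implementation (which uses an LLM and live PowerPoint application objects).

```python
from dataclasses import dataclass

# Hypothetical structured representation of a slide object, standing in
# for a live shape exposed by the presentation application's object model.
@dataclass
class SlideObject:
    object_id: int
    kind: str        # e.g. "title", "text_box"
    text: str
    font_size: int

def plan_edits(command, objects):
    """High-level layer: map a parsed user command to structured edit
    operations (in the paper an LLM performs this step)."""
    ops = []
    if command["action"] == "resize_text":
        for obj in objects:
            if obj.kind == command["target_kind"]:
                ops.append({"object_id": obj.object_id,
                            "set": {"font_size": command["font_size"]}})
    return ops

def apply_edits(objects, ops):
    """Low-level layer: apply each operation directly to the slide
    objects, never touching screen pixels."""
    index = {o.object_id: o for o in objects}
    for op in ops:
        for attr, value in op["set"].items():
            setattr(index[op["object_id"]], attr, value)
    return objects

slide = [SlideObject(1, "title", "Quarterly Report", 40),
         SlideObject(2, "text_box", "Revenue grew 12%", 18)]
ops = plan_edits({"action": "resize_text", "target_kind": "text_box",
                  "font_size": 24}, slide)
apply_edits(slide, ops)
```

Splitting planning from execution this way is what lets the user command and the slide's object structure interact without any image-modality round trip.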
Related papers
- AI-Generated Lecture Slides for Improving Slide Element Detection and Retrieval [25.517836483457803]
We propose SynLecSlideGen, a large language model (LLM)-guided synthetic lecture slide generation pipeline. We also create an evaluation benchmark, RealSlide, by manually annotating 1,050 real lecture slides. Experimental results show that few-shot transfer learning with pretraining on synthetic slides significantly improves performance compared to training only on real data.
arXiv Detail & Related papers (2025-06-30T08:11:31Z) - SlideCoder: Layout-aware RAG-enhanced Hierarchical Slide Generation from Design [33.47715901943206]
We introduce SlideCoder, a layout-aware, retrieval-augmented framework for generating editable slides from reference images. Experiments show that SlideCoder outperforms state-of-the-art baselines by up to 40.5 points, demonstrating strong performance across layout fidelity, execution accuracy, and visual consistency.
arXiv Detail & Related papers (2025-06-09T17:39:48Z) - From Shots to Stories: LLM-Assisted Video Editing with Unified Language Representations [0.9217021281095907]
Large Language Models (LLMs) and Vision-Language Models (VLMs) have demonstrated remarkable reasoning and generalization capabilities in video understanding. This paper presents the first systematic study of LLMs in the context of video editing.
arXiv Detail & Related papers (2025-05-18T05:25:11Z) - Generating Narrated Lecture Videos from Slides with Synchronized Highlights [55.2480439325792]
We introduce an end-to-end system designed to automate the process of turning static slides into video lectures. The system synthesizes a video lecture featuring AI-generated narration precisely synchronized with dynamic visual highlights. We demonstrate the system's effectiveness through a technical evaluation using a manually annotated slide dataset with 1,000 samples.
arXiv Detail & Related papers (2025-05-05T18:51:53Z) - Textual-to-Visual Iterative Self-Verification for Slide Generation [46.99825956909532]
We decompose the task of generating missing presentation slides into two key components: content generation and layout generation. Our approach significantly outperforms baseline methods in terms of alignment, logical flow, visual appeal, and readability.
arXiv Detail & Related papers (2025-02-21T12:21:09Z) - AutoPresent: Designing Structured Visuals from Scratch [99.766901203884]
We benchmark end-to-end image generation and program generation methods with a variety of models. We also create AutoPresent, an 8B Llama-based model trained on 7k instructions paired with corresponding code for slide generation.
arXiv Detail & Related papers (2025-01-01T18:09:32Z) - Awaking the Slides: A Tuning-free and Knowledge-regulated AI Tutoring System via Language Model Coordination [52.20542825755132]
We develop Slide2Lecture, a tuning-free and knowledge-regulated intelligent tutoring system.
It can effectively convert an input lecture slide into a structured teaching agenda consisting of a set of heterogeneous teaching actions.
For teachers and developers, Slide2Lecture enables customization to cater to personalized demands.
arXiv Detail & Related papers (2024-09-11T16:03:09Z) - Real-time 3D-aware Portrait Editing from a Single Image [111.27169315556444]
3DPE can edit a face image following given prompts, like reference images or text descriptions.
A lightweight module is distilled from a 3D portrait generator and a text-to-image model.
arXiv Detail & Related papers (2024-02-21T18:36:26Z) - Learning to Edit: Aligning LLMs with Knowledge Editing [101.96620267293731]
We propose a Learning to Edit (LTE) framework, focusing on teaching large language models to apply updated knowledge to input questions.
LTE features a two-phase process: (i) the Alignment Phase, which fine-tunes LLMs on a meticulously curated parallel dataset to make reliable, in-scope edits.
We demonstrate LTE's superiority in knowledge editing performance, robustness in both batch and sequential editing, minimal interference on general tasks, and rapid editing speeds.
arXiv Detail & Related papers (2024-02-19T07:45:17Z) - SWEA: Updating Factual Knowledge in Large Language Models via Subject Word Embedding Altering [17.20346072074533]
Recent model editing is a promising technique for efficiently updating a small amount of knowledge in large language models. We propose a detachable and expandable Subject Word Embedding Altering (SWEA) framework, which finds the editing embeddings through token-level matching. We demonstrate the overall state-of-the-art (SOTA) performance of SWEA⊕OS on the CounterFact and zsRE datasets.
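The token-level matching idea behind SWEA can be illustrated with a toy sketch: wherever the tokens of an edited subject appear in the input, their embeddings are swapped for stored altered embeddings. The function name and the scalar "embeddings" below are toy stand-ins for illustration, not SWEA's actual implementation.

```python
def alter_embeddings(tokens, embeddings, subject_tokens, altered):
    """Replace the embeddings of every occurrence of subject_tokens
    (matched at the token level) with the altered embeddings."""
    out = list(embeddings)
    n = len(subject_tokens)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == subject_tokens:   # token-level match
            out[i:i + n] = altered
    return out

# Toy example: scalars stand in for embedding vectors.
tokens = ["the", "eiffel", "tower", "is", "in"]
result = alter_embeddings(tokens, [0, 1, 2, 3, 4],
                          ["eiffel", "tower"], [10, 20])
```

Because the alteration is keyed on the subject tokens themselves rather than on model weights, such a module can be attached or detached without retraining, which is what makes the framework detachable and expandable.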
arXiv Detail & Related papers (2024-01-31T13:08:45Z) - Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models [52.894213114914805]
We present a method to create interpretable concept sliders that enable precise control over attributes in image generations from diffusion models.
A slider is created using a small set of prompts or sample images.
Our method can help address persistent quality issues in Stable Diffusion XL, including repairing object deformations and fixing distorted hands.
arXiv Detail & Related papers (2023-11-20T18:59:01Z) - PPTC Benchmark: Evaluating Large Language Models for PowerPoint Task Completion [96.47420221442397]
We introduce the PowerPoint Task Completion benchmark to assess the ability of Large Language Models to finish multi-turn, multi-modal instructions.
We also propose the PPTX-Match Evaluation System, which evaluates whether LLMs complete the instruction based on the prediction file rather than the label API sequence.
The results show that GPT-4 outperforms other LLMs with 75.1% accuracy in single-turn dialogue testing but faces challenges in completing entire sessions, achieving just 6% session accuracy.
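The file-based evaluation idea can be sketched as follows: rather than checking which API calls a model issued, extract a comparable state from the resulting file and compare it with the reference. All names below are illustrative stand-ins, not the PPTX-Match implementation, and plain dicts stand in for parsed PowerPoint files.

```python
def extract_state(slide_objects):
    """Reduce a slide to a comparable, order-insensitive set of
    (kind, text, font_size) tuples."""
    return {(o["kind"], o["text"], o["font_size"]) for o in slide_objects}

def matches(prediction, label):
    """Judge success by the final file state, not the edit sequence
    that produced it."""
    return extract_state(prediction) == extract_state(label)

label = [{"kind": "title", "text": "Results", "font_size": 40}]
good  = [{"kind": "title", "text": "Results", "font_size": 40}]
bad   = [{"kind": "title", "text": "Results", "font_size": 32}]
```

Comparing final states rather than API sequences means any sequence of edits that produces the correct file counts as success, which is the property the benchmark is after.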
arXiv Detail & Related papers (2023-11-03T08:06:35Z) - Beyond the Chat: Executable and Verifiable Text-Editing with LLMs [87.84199761550634]
Conversational interfaces powered by Large Language Models (LLMs) have recently become a popular way to obtain feedback during document editing.
We present InkSync, an editing interface that suggests executable edits directly within the document being edited.
arXiv Detail & Related papers (2023-09-27T00:56:17Z) - Telling Stories from Computational Notebooks: AI-Assisted Presentation Slides Creation for Presenting Data Science Work [47.558611855454195]
This paper presents NB2Slides, an AI system that helps users compose presentations of their data science work.
NB2Slides uses deep learning methods as well as example-based prompts to generate slides from computational notebooks.
It also provides an interactive visualization that links the slides with the notebook to help users further edit the slides.
arXiv Detail & Related papers (2022-03-21T16:06:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed above and is not responsible for any consequences of its use.