SlideCoder: Layout-aware RAG-enhanced Hierarchical Slide Generation from Design
- URL: http://arxiv.org/abs/2506.07964v1
- Date: Mon, 09 Jun 2025 17:39:48 GMT
- Title: SlideCoder: Layout-aware RAG-enhanced Hierarchical Slide Generation from Design
- Authors: Wenxin Tang, Jingyu Xiao, Wenxuan Jiang, Xi Xiao, Yuhang Wang, Xuxin Tang, Qing Li, Yuehe Ma, Junliang Liu, Shisong Tang, Michael R. Lyu
- Abstract summary: We introduce SlideCoder, a layout-aware, retrieval-augmented framework for generating editable slides from reference images. Experiments show that SlideCoder outperforms state-of-the-art baselines by up to 40.5 points, demonstrating strong performance across layout fidelity, execution accuracy, and visual consistency.
- Score: 33.47715901943206
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Manual slide creation is labor-intensive and requires expert prior knowledge. Existing natural language-based LLM generation methods struggle to capture the visual and structural nuances of slide designs. To address this, we formalize the Reference Image to Slide Generation task and propose Slide2Code, the first benchmark with difficulty-tiered samples based on a novel Slide Complexity Metric. We introduce SlideCoder, a layout-aware, retrieval-augmented framework for generating editable slides from reference images. SlideCoder integrates a Color Gradient-based Segmentation algorithm and a Hierarchical Retrieval-Augmented Generation method to decompose complex tasks and enhance code generation. We also release SlideMaster, a 7B open-source model fine-tuned with improved reverse-engineered data. Experiments show that SlideCoder outperforms state-of-the-art baselines by up to 40.5 points, demonstrating strong performance across layout fidelity, execution accuracy, and visual consistency. Our code is available at https://github.com/vinsontang1/SlideCoder.
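The abstract describes decomposing a complex slide into elements and using hierarchical retrieval to guide code generation. As an illustration only — the bag-of-words retriever, element descriptions, and snippet library below are hypothetical stand-ins, not SlideCoder's actual components — a retrieve-then-compose step might look like:

```python
# Sketch of hierarchical retrieval-augmented slide code generation.
# Hypothetical: the snippet library and element descriptions are invented
# for illustration; SlideCoder's real retriever and code targets differ.
from collections import Counter
import math

SNIPPET_LIBRARY = {
    "title text box": "slide.add_textbox(x=0.5, y=0.3, text=title)",
    "bulleted body text": "slide.add_bullets(items)",
    "full-width image": "slide.add_picture(path, width=SLIDE_W)",
}

def bow(text):
    """Bag-of-words vector for a crude lexical retriever."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(description):
    """Return the library key that best matches an element description."""
    return max(SNIPPET_LIBRARY, key=lambda k: cosine(bow(k), bow(description)))

def generate_slide_code(segmented_elements):
    """Compose per-element snippets into one slide-generation program."""
    lines = ["slide = deck.new_slide()"]
    for desc in segmented_elements:
        lines.append(SNIPPET_LIBRARY[retrieve(desc)])
    return "\n".join(lines)

print(generate_slide_code(["large title text", "bulleted body text"]))
```

The point of the decomposition is that each segmented element becomes a small, independently retrievable generation task instead of one monolithic image-to-code prompt.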
Related papers
- AI-Generated Lecture Slides for Improving Slide Element Detection and Retrieval [25.517836483457803]
We propose a large language model (LLM)-guided synthetic lecture slide generation pipeline, SynLecSlideGen. We also create an evaluation benchmark, RealSlide, by manually annotating 1,050 real lecture slides. Experimental results show that few-shot transfer learning with pretraining on synthetic slides significantly improves performance compared to training only on real data.
arXiv Detail & Related papers (2025-06-30T08:11:31Z)
- PreGenie: An Agentic Framework for High-quality Visual Presentation Generation [25.673526096069548]
PreGenie is an agentic and modular framework powered by multimodal large language models (MLLMs) for generating high-quality visual presentations. It operates in two stages: (1) Analysis and Initial Generation, which summarizes multimodal input and generates initial code, and (2) Review and Re-generation, which iteratively reviews intermediate code and rendered slides to produce final, high-quality presentations.
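The two-stage pipeline described above is, in control-flow terms, a generate-review-regenerate loop. A minimal sketch, assuming hypothetical `generate` and `review` callables (the MLLM calls and slide rendering are abstracted away entirely):

```python
def refine(spec, generate, review, max_rounds=3):
    """Stage 1: initial generation; Stage 2: iterative review and re-generation."""
    code = generate(spec, feedback=None)   # Analysis and Initial Generation
    for _ in range(max_rounds):            # Review and Re-generation
        ok, feedback = review(code)
        if ok:
            return code
        code = generate(spec, feedback=feedback)
    return code

# Toy stand-ins: "generation" adopts the reviewer's suggested revision,
# and the "reviewer" insists that a title appears in the output.
toy_generate = lambda spec, feedback: feedback or spec
toy_review = lambda code: (("title" in code), code + " + title")
print(refine("deck body", toy_generate, toy_review))
```

The fixed round budget matters in practice: review loops driven by model judges are not guaranteed to converge, so a cap keeps generation cost bounded.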
arXiv Detail & Related papers (2025-05-27T18:36:19Z)
- Talk to Your Slides: Language-Driven Agents for Efficient Slide Editing [28.792459459465515]
We propose Talk-to-Your-Slides, an agent that edits slides in active PowerPoint sessions. Our system enables 34.02% faster processing, 34.76% better instruction fidelity, and 87.42% cheaper operation than baselines.
arXiv Detail & Related papers (2025-05-16T18:12:26Z)
- Textual-to-Visual Iterative Self-Verification for Slide Generation [46.99825956909532]
We decompose the task of generating missing presentation slides into two key components: content generation and layout generation. Our approach significantly outperforms baseline methods in terms of alignment, logical flow, visual appeal, and readability.
arXiv Detail & Related papers (2025-02-21T12:21:09Z)
- AutoPresent: Designing Structured Visuals from Scratch [99.766901203884]
We benchmark end-to-end image generation and program generation methods with a variety of models. We create AutoPresent, an 8B Llama-based model trained on 7k instruction-code pairs for slide generation.
arXiv Detail & Related papers (2025-01-01T18:09:32Z)
- WarriorCoder: Learning from Expert Battles to Augment Code Large Language Models [67.15146980023621]
We propose WarriorCoder, a novel paradigm that learns from expert battles to address the limitations of current approaches. We create an arena where leading expert code LLMs challenge each other, with evaluations conducted by impartial judges. This competitive framework generates novel training data from scratch, leveraging the strengths of all participants.
arXiv Detail & Related papers (2024-12-23T08:47:42Z)
- LLMGA: Multimodal Large Language Model based Generation Assistant [53.150283805515926]
We introduce a Multimodal Large Language Model-based Generation Assistant (LLMGA) to assist users in image generation and editing.
We train the MLLM to grasp the properties of image generation and editing, enabling it to generate detailed prompts.
Extensive results show that LLMGA has promising generation and editing capabilities and can enable more flexible and expansive applications.
arXiv Detail & Related papers (2023-11-27T13:37:26Z)
- DOC2PPT: Automatic Presentation Slides Generation from Scientific Documents [76.19748112897177]
We present a novel task and approach for document-to-slide generation.
We propose a hierarchical sequence-to-sequence approach to tackle our task in an end-to-end manner.
Our approach exploits the inherent structures within documents and slides and incorporates paraphrasing and layout prediction modules to generate slides.
arXiv Detail & Related papers (2021-01-28T03:21:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.