Related papers: DeepPresenter: Environment-Grounded Reflection for Agentic Presentation Generation

DeepPresenter: Environment-Grounded Reflection for Agentic Presentation Generation

URL: http://arxiv.org/abs/2602.22839v1
Date: Thu, 26 Feb 2026 10:26:48 GMT
Title: DeepPresenter: Environment-Grounded Reflection for Agentic Presentation Generation
Authors: Hao Zheng, Guozhao Mo, Xinru Yan, Qianhao Yuan, Wenkai Zhang, Xuanang Chen, Yaojie Lu, Hongyu Lin, Xianpei Han, Le Sun,
Abstract summary: DeepPresenter is an agentic framework that adapts to diverse user intents.<n>DeepPresenter autonomously plans, renders, and revises intermediate slide artifacts.
Score: 75.7505732466149
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Presentation generation requires deep content research, coherent visual design, and iterative refinement based on observation. However, existing presentation agents often rely on predefined workflows and fixed templates. To address this, we present DeepPresenter, an agentic framework that adapts to diverse user intents, enables effective feedback-driven refinement, and generalizes beyond a scripted pipeline. Specifically, DeepPresenter autonomously plans, renders, and revises intermediate slide artifacts to support long-horizon refinement with environmental observations. Furthermore, rather than relying on self-reflection over internal signals (e.g., reasoning traces), our environment-grounded reflection conditions the generation process on perceptual artifact states (e.g., rendered slides), enabling the system to identify and correct presentation-specific issues during execution. Results on the evaluation set covering diverse presentation-generation scenarios show that DeepPresenter achieves state-of-the-art performance, and the fine-tuned 9B model remains highly competitive at substantially lower cost. Our project is available at: https://github.com/icip-cas/PPTAgent

Related papers

EPRBench: A High-Quality Benchmark Dataset for Event Stream Based Visual Place Recognition [54.55914886780534]
Event stream-based Visual Place Recognition (VPR) is an emerging research direction that offers a compelling solution to the instability of conventional visible-light cameras under challenging conditions such as low illumination, overexposure, and high-speed motion.<n>We introduce EPRBench, a high-quality benchmark specifically designed for event stream-based VPR.<n>EPRBench comprises 10K event sequences and 65K event frames, collected using both handheld and vehicle-mounted setups to comprehensively capture real-world challenges across diverse viewpoints, weather conditions, and lighting scenarios.
arXiv Detail & Related papers (2026-02-13T13:25:05Z)
GEBench: Benchmarking Image Generation Models as GUI Environments [49.513441724802135]
We introduce GEBench, a benchmark for evaluating dynamic interaction and temporal coherence in GUI generation.<n>GE-Score is a novel five-dimensional metric that assesses Goal Achievement, Interaction Logic, Content Consistency, UI Plausibility, and Visual Quality.<n>Our findings identify icon interpretation, text rendering, and localization precision as critical bottlenecks.
arXiv Detail & Related papers (2026-02-09T18:52:02Z)
From Prediction to Explanation: Multimodal, Explainable, and Interactive Deepfake Detection Framework for Non-Expert Users [21.627851460651968]
We present DF-P2E (Deepfake: Prediction to Explanation), a novel framework that integrates visual, semantic, and narrative layers of explanation to make deepfake detection interpretable and accessible.<n>We instantiate and evaluate the framework on the DF40 benchmark, the most diverse deepfake dataset to date.<n> Experiments demonstrate that our system achieves competitive detection performance while providing high-quality explanations aligned with Grad-CAM activations.
arXiv Detail & Related papers (2025-08-11T03:55:47Z)
PPTAgent: Generating and Evaluating Presentations Beyond Text-to-Slides [51.88536367177796]
We propose a two-stage, edit-based approach inspired by human drafts for automatically generating presentations.<n>PWTAgent first analyzes references to extract slide-level functional types and content schemas, then generates editing actions based on selected reference slides.<n>PWTAgent significantly outperforms existing automatic presentation generation methods across all three dimensions.
arXiv Detail & Related papers (2025-01-07T16:53:01Z)
DiffPrompter: Differentiable Implicit Visual Prompts for Semantic-Segmentation in Adverse Conditions [14.52296033767276]
We introduce DiffPrompter, a novel differentiable visual and latent prompting mechanism. Our proposed $nabla$HFC image processing block excels particularly in adverse weather conditions.
arXiv Detail & Related papers (2023-10-06T11:53:04Z)
A Threefold Review on Deep Semantic Segmentation: Efficiency-oriented, Temporal and Depth-aware design [77.34726150561087]
We conduct a survey on the most relevant and recent advances in Deep Semantic in the context of vision for autonomous vehicles. Our main objective is to provide a comprehensive discussion on the main methods, advantages, limitations, results and challenges faced from each perspective.
arXiv Detail & Related papers (2023-03-08T01:29:55Z)
Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis [20.316056261749946]
We propose an end-to-end vision and language model incorporating explicit knowledge graphs. We also introduce an interactive out-of-distribution layer using implicit network operator. In practice, we apply our model on several vision and language downstream tasks including visual question answering, visual reasoning, and image-text retrieval.
arXiv Detail & Related papers (2023-02-11T05:46:21Z)
Generative Counterfactuals for Neural Networks via Attribute-Informed Perturbation [51.29486247405601]
We design a framework to generate counterfactuals for raw data instances with the proposed Attribute-Informed Perturbation (AIP) By utilizing generative models conditioned with different attributes, counterfactuals with desired labels can be obtained effectively and efficiently. Experimental results on real-world texts and images demonstrate the effectiveness, sample quality as well as efficiency of our designed framework.
arXiv Detail & Related papers (2021-01-18T08:37:13Z)

This list is automatically generated from the titles and abstracts of the papers in this site.