Composable Prompting Workspaces for Creative Writing: Exploration and Iteration Using Dynamic Widgets
- URL: http://arxiv.org/abs/2503.21394v1
- Date: Thu, 27 Mar 2025 11:36:47 GMT
- Title: Composable Prompting Workspaces for Creative Writing: Exploration and Iteration Using Dynamic Widgets
- Authors: Rifat Mehreen Amin, Oliver Hans Kühle, Daniel Buschek, Andreas Butz
- Abstract summary: We propose a composable prompting canvas for text exploration using dynamic widgets. Users generate widgets through system suggestions, prompting, or manually to capture task-relevant facets. Our design significantly outperformed the baseline on the Creativity Support Index.
- Score: 25.41215417987532
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative AI models offer many possibilities for text creation and transformation. Current graphical user interfaces (GUIs) for prompting them lack support for iterative exploration, as they do not represent prompts as actionable interface objects. We propose the concept of a composable prompting canvas for text exploration and iteration using dynamic widgets. Users generate widgets through system suggestions, prompting, or manually to capture task-relevant facets that affect the generated text. In a comparative study with a baseline (conversational UI), 18 participants worked on two writing tasks, creating diverse prompting environments with custom widgets and spatial layouts. They reported having more control over the generated text and preferred our system over the baseline. Our design significantly outperformed the baseline on the Creativity Support Index, and participants felt the results were worth the effort. This work highlights the need for GUIs that support user-driven customization and (re-)structuring to increase both the flexibility and efficiency of prompting.
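The widget-to-prompt composition described in the abstract can be sketched as a tiny data model. The names and structure below are illustrative assumptions, not the paper's actual implementation:

```python
from dataclasses import dataclass


@dataclass
class Widget:
    """One task-relevant facet the user wants the generated text to respect."""
    facet: str  # e.g. "tone", "audience"
    value: str  # e.g. "playful", "developers"


def compose_prompt(task: str, widgets: list[Widget]) -> str:
    """Assemble a prompt from the base task plus the currently active widgets."""
    if not widgets:
        return task
    constraints = "; ".join(f"{w.facet}: {w.value}" for w in widgets)
    return f"{task}\nConstraints: {constraints}"


# Toggling or editing a widget recomposes the prompt without retyping it.
prompt = compose_prompt(
    "Write a product announcement.",
    [Widget("tone", "playful"), Widget("audience", "developers")],
)
print(prompt)
```

The point of representing prompts as interface objects is visible even in this sketch: editing one `Widget` and recomposing supports the iterative exploration that a flat chat transcript does not.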
Related papers
- PromptCanvas: Composable Prompting Workspaces Using Dynamic Widgets for Exploration and Iteration in Creative Writing [25.41215417987532]
We introduce PromptCanvas, a concept that transforms prompting into a composable, widget-based experience on an infinite canvas. Users can generate, customize, and arrange interactive widgets representing various facets of their text, offering greater control over AI-generated content.
arXiv Detail & Related papers (2025-06-04T09:13:51Z) - ViMo: A Generative Visual GUI World Model for App Agent [60.27668506731929]
ViMo is a visual world model designed to generate future App observations as images.
We propose a novel data representation, the Symbolic Text Representation, to overlay text content with symbolic placeholders.
With this design, ViMo employs a STR Predictor to predict future GUIs' graphics and a GUI-text Predictor for generating the corresponding text.
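The division of labor between the two predictors can be illustrated with a rough sketch of the placeholder substitution. The function names and token format here are hypothetical, not taken from the ViMo paper:

```python
def to_symbolic(gui_texts: list[str]) -> tuple[list[str], dict[str, str]]:
    """Replace concrete GUI strings with symbolic placeholders.

    The graphics predictor then only has to place placeholder tokens in the
    rendered GUI; a separate text predictor fills the placeholders back in.
    """
    mapping: dict[str, str] = {}
    symbols: list[str] = []
    for i, text in enumerate(gui_texts):
        sym = f"<T{i}>"
        mapping[sym] = text
        symbols.append(sym)
    return symbols, mapping


def restore(symbols: list[str], mapping: dict[str, str]) -> list[str]:
    """Inverse step: substitute predicted text back for each placeholder."""
    return [mapping.get(s, s) for s in symbols]
```

Separating layout from content this way means the image model never has to render legible glyphs, which is the motivation the abstract gives for the symbolic representation.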
arXiv Detail & Related papers (2025-04-15T14:03:10Z) - Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction [69.57190742976091]
We introduce Aguvis, a unified vision-based framework for autonomous GUI agents.
Our approach leverages image-based observations and grounds natural-language instructions to visual elements.
To address the limitations of previous work, we integrate explicit planning and reasoning within the model.
arXiv Detail & Related papers (2024-12-05T18:58:26Z) - ShowUI: One Vision-Language-Action Model for GUI Visual Agent [80.50062396585004]
Building Graphical User Interface (GUI) assistants holds significant promise for enhancing human workflow productivity.
We develop a vision-language-action model for the digital world, namely ShowUI, which features the following innovations.
ShowUI, a lightweight 2B model using 256K data, achieves a strong 75.1% accuracy in zero-shot screenshot grounding.
arXiv Detail & Related papers (2024-11-26T14:29:47Z) - GLDesigner: Leveraging Multi-Modal LLMs as Designer for Enhanced Aesthetic Text Glyph Layouts [53.568057283934714]
We propose a Vision-Language Model (VLM)-based framework that generates content-aware text logo layouts. We introduce two model techniques that reduce the computational cost for processing multiple glyph images simultaneously. To support instruction tuning of our model, we construct two extensive text logo datasets that are five times larger than existing public datasets.
arXiv Detail & Related papers (2024-11-18T10:04:10Z) - VideoGUI: A Benchmark for GUI Automation from Instructional Videos [78.97292966276706]
VideoGUI is a novel multi-modal benchmark designed to evaluate GUI assistants on visual-centric GUI tasks.
Sourced from high-quality web instructional videos, our benchmark focuses on tasks involving professional and novel software.
Our evaluation reveals that even the SoTA large multimodal model GPT4o performs poorly on visual-centric GUI tasks.
arXiv Detail & Related papers (2024-06-14T17:59:08Z) - Empowering Visual Creativity: A Vision-Language Assistant to Image Editing Recommendations [109.65267337037842]
We introduce the task of Image Editing Recommendation (IER).
IER aims to automatically generate diverse creative editing instructions from an input image and a simple prompt representing the users' under-specified editing purpose.
We introduce Creativity-Vision Language Assistant(Creativity-VLA), a multimodal framework designed specifically for edit-instruction generation.
arXiv Detail & Related papers (2024-05-31T18:22:29Z) - Towards Full Authorship with AI: Supporting Revision with AI-Generated Views [3.109675063162349]
Large language models (LLMs) are shaping a new user interface (UI) paradigm in writing tools by enabling users to generate text through prompts.
This paradigm shifts some creative control from the user to the system, thereby diminishing the user's authorship and autonomy in the writing process.
We introduce Textfocals, a prototype designed to investigate a human-centered approach that emphasizes the user's role in writing.
arXiv Detail & Related papers (2024-03-02T01:11:35Z) - Ranni: Taming Text-to-Image Diffusion for Accurate Instruction Following [59.997857926808116]
We introduce a semantic panel as an intermediate representation for decoding texts into images.
The panel is obtained through arranging the visual concepts parsed from the input text.
We develop a practical system and showcase its potential in continuous generation and chatting-based editing.
arXiv Detail & Related papers (2023-11-28T17:57:44Z) - PromptCrafter: Crafting Text-to-Image Prompt through Mixed-Initiative Dialogue with LLM [2.2894985490441377]
We present PromptCrafter, a novel mixed-initiative system that allows step-by-step crafting of text-to-image prompts.
Through the iterative process, users can efficiently explore the model's capability, and clarify their intent.
PromptCrafter also supports users in refining prompts by answering clarifying questions generated by a Large Language Model.
arXiv Detail & Related papers (2023-07-18T05:51:00Z) - Promptify: Text-to-Image Generation through Interactive Prompt Exploration with Large Language Models [29.057923932305123]
We present Promptify, an interactive system that supports prompt exploration and refinement for text-to-image generative models.
Our user study shows that Promptify effectively facilitates the text-to-image workflow and outperforms an existing baseline tool widely used for text-to-image generation.
arXiv Detail & Related papers (2023-01-30T20:26:02Z) - Contextual Dynamic Prompting for Response Generation in Task-oriented Dialog Systems [8.419582942080927]
Response generation is one of the critical components in task-oriented dialog systems.
We propose an approach that performs dynamic prompting, where the prompts are learnt from dialog contexts.
We show that contextual dynamic prompts improve response generation in terms of the combined score (Mehri et al., 2019) by 3 absolute points.
arXiv Detail & Related papers (2022-12-07T03:50:20Z) - UI Layers Group Detector: Grouping UI Layers via Text Fusion and Box Attention [7.614630088064978]
We propose a vision-based method that automatically detects images (i.e., basic shapes and visual elements) and text layers that present the same semantic meanings.
We construct a large-scale UI dataset for training and testing, and present a data augmentation approach to boost the detection performance.
arXiv Detail & Related papers (2022-12-07T03:50:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.