ThematicPlane: Bridging Tacit User Intent and Latent Spaces for Image Generation
- URL: http://arxiv.org/abs/2508.06065v1
- Date: Fri, 08 Aug 2025 06:57:14 GMT
- Title: ThematicPlane: Bridging Tacit User Intent and Latent Spaces for Image Generation
- Authors: Daniel Lee, Nikhil Sharma, Donghoon Shin, DaEun Choi, Harsh Sharma, Jeonghwan Kim, Heng Ji
- Abstract summary: We introduce ThematicPlane, a system that enables users to navigate and manipulate high-level semantic concepts. This interface bridges the gap between tacit creative intent and system control.
- Score: 49.805992099208595
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Generative AI has made image creation more accessible, yet aligning outputs with nuanced creative intent remains challenging, particularly for non-experts. Existing tools often require users to externalize ideas through prompts or references, limiting fluid exploration. We introduce ThematicPlane, a system that enables users to navigate and manipulate high-level semantic concepts (e.g., mood, style, or narrative tone) within an interactive thematic design plane. This interface bridges the gap between tacit creative intent and system control. In our exploratory study (N=6), participants engaged in divergent and convergent creative modes, often embracing unexpected results as inspiration or iteration cues. While they grounded their exploration in familiar themes, differing expectations of how themes mapped to outputs revealed a need for more explainable controls. Overall, ThematicPlane fosters expressive, iterative workflows and highlights new directions for intuitive, semantics-driven interaction in generative design tools.
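The abstract describes ThematicPlane at the interaction level and the paper does not ship code. Purely as an illustration of how a 2-D thematic plane might be mapped onto a generator's conditioning space, here is a minimal numpy sketch; the anchor layout, the inverse-distance blending, and the random placeholder embeddings (standing in for a real text/image encoder such as CLIP) are all assumptions, not the authors' method.

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder theme embeddings; a real system would obtain these from an encoder.
themes = {
    "serene": rng.normal(size=512),
    "dramatic": rng.normal(size=512),
    "whimsical": rng.normal(size=512),
}
# Each theme sits at a fixed anchor point on the 2-D design plane.
anchors = {"serene": (0.0, 0.0), "dramatic": (1.0, 0.0), "whimsical": (0.5, 1.0)}

def plane_to_condition(x: float, y: float) -> np.ndarray:
    """Map a plane position to a blended conditioning vector via
    inverse-distance weighting of the theme embeddings."""
    weights = {n: 1.0 / (np.hypot(x - ax, y - ay) + 1e-6)
               for n, (ax, ay) in anchors.items()}
    total = sum(weights.values())
    return sum((w / total) * themes[n] for n, w in weights.items())

cond = plane_to_condition(0.3, 0.4)  # near "serene", pulled toward the others
```

In a setup like this, dragging across the plane would re-weight the themes continuously, and the blended vector would condition the image generator.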
Related papers
- Design Generative AI for Practitioners: Exploring Interaction Approaches Aligned with Creative Practice [20.726284013271584]
We present three interaction approaches that distribute control across intent, input, and process. We argue that alignment is a dynamic negotiation, with AI adopting proactive or reactive roles according to designers' instrumental and inspirational needs and the creative stage.
arXiv Detail & Related papers (2026-03-03T15:19:23Z)
- Inspiration Seeds: Learning Non-Literal Visual Combinations for Generative Exploration [13.00602873238112]
We propose Inspiration Seeds, a generative framework that shifts image generation from final execution to exploratory ideation. We use CLIP Sparse Autoencoders to extract editing directions in CLIP latent space and isolate concept pairs.
arXiv Detail & Related papers (2026-02-09T13:00:16Z)
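The summary above mentions using CLIP Sparse Autoencoders to obtain editing directions. As a rough sketch of that general idea (not the paper's trained model), the decoder rows of an SAE can be treated as concept directions and added to an embedding; the weights and the 512-d "CLIP-like" embedding below are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)
d_embed, d_sparse = 512, 4096
# Random placeholder SAE weights; a trained SAE would make rows of W_dec
# approximately monosemantic concept directions in the embedding space.
W_enc = rng.normal(scale=0.02, size=(d_sparse, d_embed))
W_dec = rng.normal(scale=0.02, size=(d_sparse, d_embed))

def sae_edit(z: np.ndarray, feature_idx: int, strength: float) -> np.ndarray:
    """Nudge an embedding along one unit-normalized SAE decoder direction."""
    direction = W_dec[feature_idx] / np.linalg.norm(W_dec[feature_idx])
    return z + strength * direction

z = rng.normal(size=d_embed)             # stand-in for a CLIP image embedding
acts = np.maximum(W_enc @ z, 0.0)        # ReLU sparse code: which concepts fire
strongest = int(np.argmax(acts))         # most active concept in this embedding
z_edited = sae_edit(z, strongest, 2.0)   # amplify it to steer a generator
```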
- Beyond Pixels: Visual Metaphor Transfer via Schema-Driven Agentic Reasoning [56.24016465596292]
A visual metaphor constitutes a high-order form of human creativity, employing cross-domain semantic fusion to transform abstract concepts into impactful visual rhetoric. We introduce the task of Visual Metaphor Transfer (VMT), which challenges models to autonomously decouple the "creative essence" from a reference image and re-materialize that abstract logic onto a user-specified subject. Our method significantly outperforms SOTA baselines in metaphor consistency, analogy appropriateness, and visual creativity, paving the way for automated high-impact creative applications in advertising and media.
arXiv Detail & Related papers (2026-02-01T17:01:36Z)
- Latent Implicit Visual Reasoning [59.39913238320798]
We propose a task-agnostic mechanism that trains LMMs to discover and use visual reasoning tokens without explicit supervision. Our approach outperforms direct fine-tuning and achieves state-of-the-art results on a diverse range of vision-centric tasks.
arXiv Detail & Related papers (2025-12-24T14:59:49Z)
- VLM-Guided Adaptive Negative Prompting for Creative Generation [21.534474554320823]
Creative generation is the synthesis of new, surprising, and valuable samples that reflect user intent yet cannot be envisioned in advance. We propose VLM-Guided Adaptive Negative-Prompting, a training-free, inference-time method that promotes creative image generation. We show consistent gains in creative novelty with negligible computational overhead.
arXiv Detail & Related papers (2025-10-12T17:34:59Z)
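The method above is described only as training-free, inference-time guidance from a VLM. The loop below is one plausible reading of that description, with every function a dummy stand-in so the control flow runs end to end; the paper's actual procedure may differ.

```python
import random
from typing import List

# All functions here are placeholders; a real system would plug in a
# diffusion sampler and a vision-language model.

def init_latent() -> list:
    return [random.gauss(0.0, 1.0) for _ in range(4)]

def generate_step(latent, prompt: str, negative: str, t: int):
    return latent  # placeholder for one denoising step under both prompts

def decode_preview(latent) -> str:
    return "preview-image"  # placeholder for a cheap intermediate decode

def vlm_list_concepts(image, prompt: str) -> List[str]:
    # A VLM would name the most conventional concepts visible so far,
    # e.g. ["red apple", "wooden bowl"] for a still-life prompt.
    return [f"typical-concept-{random.randint(0, 9)}"]

def adaptive_negative_sampling(prompt: str, steps: int = 50, every: int = 10):
    negative: List[str] = []
    latent = init_latent()
    for t in range(steps):
        latent = generate_step(latent, prompt, ", ".join(negative), t)
        if t and t % every == 0:
            # Push away from whatever currently looks most expected.
            negative.extend(vlm_list_concepts(decode_preview(latent), prompt))
    return decode_preview(latent), negative

image, negatives = adaptive_negative_sampling("a musical instrument no one has seen")
```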
- Ming-UniVision: Joint Image Understanding and Generation with a Unified Continuous Tokenizer [50.69959748410398]
We introduce MingTok, a new family of visual tokenizers with a continuous latent space for unified autoregressive generation and understanding. MingTok adopts a three-stage sequential architecture involving low-level encoding, semantic expansion, and visual reconstruction. Built on top of it, Ming-UniVision eliminates the need for task-specific visual representations, and unifies diverse vision-language tasks under a single autoregressive prediction paradigm.
arXiv Detail & Related papers (2025-10-08T02:50:14Z)
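The summary names three sequential stages (low-level encoding, semantic expansion, visual reconstruction); the skeleton below only mirrors that staging. Layer types and sizes are illustrative guesses, not MingTok's architecture.

```python
import torch
import torch.nn as nn

class ThreeStageTokenizer(nn.Module):
    """Skeleton mirroring the described staging: low-level encoding ->
    semantic expansion -> visual reconstruction. Sizes are illustrative."""
    def __init__(self, patch_dim: int = 768, latent_dim: int = 32, semantic_dim: int = 1024):
        super().__init__()
        self.low_level_encoder = nn.Linear(patch_dim, latent_dim)  # compact continuous latents
        self.semantic_expander = nn.Sequential(                    # enrich latents for understanding
            nn.Linear(latent_dim, semantic_dim), nn.GELU(),
            nn.Linear(semantic_dim, semantic_dim),
        )
        self.pixel_decoder = nn.Linear(semantic_dim, patch_dim)    # reconstruct patches for generation

    def forward(self, patches: torch.Tensor):
        z = self.low_level_encoder(patches)   # stage 1: continuous latent tokens
        s = self.semantic_expander(z)         # stage 2: semantic features
        return z, s, self.pixel_decoder(s)    # stage 3: visual reconstruction

z, s, recon = ThreeStageTokenizer()(torch.randn(1, 196, 768))  # e.g. 14x14 ViT patches
```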
- Dynamic Scoring with Enhanced Semantics for Training-Free Human-Object Interaction Detection [51.52749744031413]
Human-Object Interaction (HOI) detection aims to identify humans and objects within images and interpret their interactions. Existing HOI methods rely heavily on large datasets with manual annotations to learn interactions from visual cues. We propose a novel training-free HOI detection framework for Dynamic Scoring with enhanced semantics.
arXiv Detail & Related papers (2025-07-23T12:30:19Z)
- Expanding the Generative AI Design Space through Structured Prompting and Multimodal Interfaces [1.051328497890725]
ACAI (AI Co-Creation for Advertising and Inspiration) is a multimodal generative AI tool designed to support novice designers by moving beyond traditional prompt interfaces. This work contributes to HCI research on generative systems by showing how structured interfaces can foreground user-defined context, improve alignment, and enhance co-creative control for novice designers.
arXiv Detail & Related papers (2025-04-19T14:57:32Z)
- POET: Supporting Prompting Creativity and Personalization with Automated Expansion of Text-to-Image Generation [31.886910258606875]
State-of-the-art visual generative AI tools hold immense potential to assist users in the early ideation stages of creative tasks. Many large-scale text-to-image systems are designed for broad applicability, yielding conventional output that may limit creative exploration. We introduce POET, a real-time interactive tool that automatically discovers dimensions of homogeneity in text-to-image generative models.
arXiv Detail & Related papers (2025-04-18T00:54:36Z)
- Survey of User Interface Design and Interaction Techniques in Generative AI Applications [79.55963742878684]
We aim to create a compendium of different user-interaction patterns that can be used as a reference for designers and developers alike.
We also strive to lower the entry barrier for those attempting to learn more about the design of generative AI applications.
arXiv Detail & Related papers (2024-10-28T23:10:06Z)
- Flex: End-to-End Text-Instructed Visual Navigation from Foundation Model Features [59.892436892964376]
We investigate the minimal data requirements and architectural adaptations necessary to achieve robust closed-loop performance with vision-based control policies. Our findings are synthesized in Flex (Fly lexically), a framework that uses pre-trained Vision Language Models (VLMs) as frozen patch-wise feature extractors. We demonstrate the effectiveness of this approach on a quadrotor fly-to-target task, where agents trained via behavior cloning successfully generalize to real-world scenes.
arXiv Detail & Related papers (2024-10-16T19:59:31Z)
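The Flex summary describes frozen VLM patch-wise features feeding a policy trained by behavior cloning. The sketch below shows that general recipe with a random frozen encoder standing in for a pre-trained vision tower; it omits the paper's text-instruction conditioning, and all sizes are assumptions.

```python
import torch
import torch.nn as nn

feat_dim, n_patches, act_dim = 768, 196, 4     # act_dim: e.g. a velocity command

# Random frozen stand-in for a pre-trained VLM vision tower's patch embedder.
encoder = nn.Linear(3 * 16 * 16, feat_dim)
for p in encoder.parameters():
    p.requires_grad_(False)                    # features stay frozen

policy_head = nn.Sequential(                   # the only trainable part
    nn.Linear(feat_dim, 256), nn.ReLU(), nn.Linear(256, act_dim),
)

patches = torch.randn(1, n_patches, 3 * 16 * 16)  # flattened 16x16 RGB patches
feats = encoder(patches)                          # (1, n_patches, feat_dim)
action = policy_head(feats.mean(dim=1))           # pool patches, predict an action
# Behavior cloning would minimize MSE(action, expert_action) over demonstrations.
```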
- The role of interface design on prompt-mediated creativity in Generative AI [0.0]
We analyze more than 145,000 prompts from two Generative AI platforms.
We find that users exhibit a tendency towards exploration of new topics over exploitation of concepts visited previously.
arXiv Detail & Related papers (2023-11-30T22:33:34Z)
- Enhancing HOI Detection with Contextual Cues from Large Vision-Language Models [56.257840490146]
ConCue is a novel approach for improving visual feature extraction in HOI detection.
We develop a transformer-based feature extraction module with a multi-tower architecture that integrates contextual cues into both instance and interaction detectors.
arXiv Detail & Related papers (2023-11-26T09:11:32Z)
- InstructDiffusion: A Generalist Modeling Interface for Vision Tasks [52.981128371910266]
We present InstructDiffusion, a framework for aligning computer vision tasks with human instructions.
InstructDiffusion could handle a variety of vision tasks, including understanding tasks and generative tasks.
It even exhibits the ability to handle unseen tasks and outperforms prior methods on novel datasets.
arXiv Detail & Related papers (2023-09-07T17:56:57Z)
- Knowledge-enriched Attention Network with Group-wise Semantic for Visual Storytelling [39.59158974352266]
Visual storytelling aims at generating an imaginative, coherent, multi-sentence narrative from a group of relevant images.
Existing methods often generate direct, rigid descriptions of the apparent image content because they cannot explore implicit information beyond the images.
To address these problems, a novel knowledge-enriched attention network with a group-wise semantic model is proposed.
arXiv Detail & Related papers (2022-03-10T12:55:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.