Frontend Diffusion: Exploring Intent-Based User Interfaces through Abstract-to-Detailed Task Transitions
- URL: http://arxiv.org/abs/2408.00778v1
- Date: Tue, 16 Jul 2024 20:24:35 GMT
- Title: Frontend Diffusion: Exploring Intent-Based User Interfaces through Abstract-to-Detailed Task Transitions
- Authors: Qinshi Zhang, Latisha Besariani Hendra, Mohan Chi, Zijian Ding
- Abstract summary: We introduce Frontend Diffusion, an end-to-end tool that generates high-quality websites from user sketches.
We demonstrate the potential of task transitions to reduce human intervention and communication costs in complex tasks.
- Score: 1.845645938093348
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The emergence of Generative AI is catalyzing a paradigm shift in user interfaces from command-based to intent-based outcome specification. In this paper, we explore abstract-to-detailed task transitions in the context of frontend code generation as a step towards intent-based user interfaces, aiming to bridge the gap between abstract user intentions and concrete implementations. We introduce Frontend Diffusion, an end-to-end LLM-powered tool that generates high-quality websites from user sketches. The system employs a three-stage task transition process: sketching, writing, and coding. We demonstrate the potential of task transitions to reduce human intervention and communication costs in complex tasks. Our work also opens avenues for exploring similar approaches in other domains, potentially extending to more complex, interdependent tasks such as video production.
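The three-stage task transition (sketching, writing, coding) can be pictured as a pipeline in which each stage's output becomes the next stage's input. Below is a minimal Python sketch of that flow, not the authors' implementation; `call_llm` and the two stage prompts are hypothetical placeholders for whatever vision-language model endpoint is actually used.

```python
# Minimal sketch of an abstract-to-detailed task transition pipeline:
# sketching -> writing -> coding. `call_llm` is a hypothetical stand-in
# for a real model endpoint, not the Frontend Diffusion implementation.

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM call (e.g., a hosted chat-completion API)."""
    return f"[model output for: {prompt[:40]}...]"

def describe_sketch(sketch_path: str) -> str:
    # Stage 1 -> 2: turn the user's sketch into a written page description.
    # A real system would pass the image itself to a vision-language model.
    return call_llm(f"Describe the layout and intent of the sketch at {sketch_path}")

def write_frontend_code(description: str) -> str:
    # Stage 2 -> 3: turn the written description into concrete HTML/CSS/JS.
    return call_llm(f"Generate a complete HTML page implementing: {description}")

def frontend_diffusion_pipeline(sketch_path: str) -> str:
    description = describe_sketch(sketch_path)   # abstract intent, made explicit
    return write_frontend_code(description)      # detailed implementation

if __name__ == "__main__":
    print(frontend_diffusion_pipeline("homepage_sketch.png"))
```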
Related papers
- AppAgent v2: Advanced Agent for Flexible Mobile Interactions [46.789563920416626]
This work introduces a novel LLM-based multimodal agent framework for mobile devices.
Our agent constructs a flexible action space that enhances adaptability across various applications.
Our results demonstrate the framework's superior performance, confirming its effectiveness in real-world scenarios.
arXiv Detail & Related papers (2024-08-05T06:31:39Z)
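The "flexible action space" summarized above can be read as a set of parameterized UI actions that transfer across applications. The sketch below illustrates that reading only; the action types and the `execute` dispatcher are hypothetical, not AppAgent v2's actual interface.

```python
# Speculative sketch of a parameterized mobile-UI action space; the class
# names are illustrative and not taken from AppAgent v2.
from dataclasses import dataclass
from typing import Union

@dataclass
class Tap:
    x: int
    y: int

@dataclass
class Swipe:
    x0: int; y0: int; x1: int; y1: int

@dataclass
class TypeText:
    text: str

Action = Union[Tap, Swipe, TypeText]

def execute(action: Action) -> None:
    # A real agent would dispatch these to a device driver (e.g., via ADB).
    print(f"executing {action}")

if __name__ == "__main__":
    plan: list[Action] = [Tap(120, 640), TypeText("hello"), Swipe(200, 900, 200, 300)]
    for a in plan:
        execute(a)
```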
- Think, Act, and Ask: Open-World Interactive Personalized Robot Navigation [17.279875204729553]
Zero-Shot Object Navigation (ZSON) enables agents to navigate towards open-vocabulary objects in unknown environments.
We introduce ZIPON, where robots need to navigate to personalized goal objects while engaging in conversations with users.
We propose Open-woRld Interactive persOnalized Navigation (ORION) to make sequential decisions to manipulate different modules for perception, navigation and communication.
arXiv Detail & Related papers (2023-10-12T01:17:56Z)
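ORION is summarized as making sequential decisions over perception, navigation, and communication modules. A minimal controller loop in that spirit might look like the sketch below; the module behavior and the dispatch rule are assumptions for illustration, not the paper's learned policy.

```python
# Minimal sketch of a controller that sequentially picks among perception,
# navigation, and communication modules. Module behavior is mocked; ORION's
# actual decision-making is learned, not a fixed if-chain.

def ask_user(state):
    state["clarified"] = True          # mock communication module
    return state

def perceive(state):
    state["perceived"] = True          # mock perception module
    state["seen_goal"] = state["steps"] >= 2
    return state

def navigate(state):
    state["steps"] += 1                # mock navigation module
    state["perceived"] = False         # a new viewpoint needs fresh perception
    return state

def controller(state):
    # Assumed dispatch rule for illustration only.
    if not state.get("clarified"):
        return ask_user                # resolve ambiguity with the user first
    if not state.get("perceived"):
        return perceive                # look around before moving
    if not state.get("seen_goal"):
        return navigate                # move, then perceive again
    return None                        # goal reached

if __name__ == "__main__":
    state = {"steps": 0}
    while (module := controller(state)) is not None:
        state = module(state)
    print("done:", state)
```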
- You Only Look at Screens: Multimodal Chain-of-Action Agents [37.118034745972956]
Auto-GUI is a multimodal solution that directly interacts with the interface.
We propose a chain-of-action technique to help the agent decide what action to execute.
We evaluate our approach on a new device-control benchmark AITW with 30K unique instructions.
arXiv Detail & Related papers (2023-09-20T16:12:32Z)
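The chain-of-action idea, as summarized, conditions each action prediction on the actions executed so far plus a projected chain of future actions, executing only the next one. A rough sketch under that reading (not Auto-GUI's actual model code; `predict_chain` is a mock stand-in for the multimodal model):

```python
# Rough sketch of chain-of-action decision-making: the model sees the goal
# and the action history, proposes the remaining chain, and only the first
# proposed action is executed each step. `predict_chain` is a mock model.

def predict_chain(goal: str, history: list[str]) -> list[str]:
    # Stand-in for a multimodal model that would also see a screenshot.
    full_plan = ["open_app", "tap_search", "type_query", "tap_result"]
    return full_plan[len(history):]           # pretend the model replans

def run_episode(goal: str, max_steps: int = 10) -> list[str]:
    history: list[str] = []
    for _ in range(max_steps):
        chain = predict_chain(goal, history)  # past actions + anticipated chain
        if not chain:
            break
        action = chain[0]                     # execute only the next action
        history.append(action)
    return history

if __name__ == "__main__":
    print(run_episode("search for a coffee shop"))
```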
- InstructDiffusion: A Generalist Modeling Interface for Vision Tasks [52.981128371910266]
We present InstructDiffusion, a framework for aligning computer vision tasks with human instructions.
InstructDiffusion can handle a variety of vision tasks, including understanding tasks and generative tasks.
It even exhibits the ability to handle unseen tasks and outperforms prior methods on novel datasets.
arXiv Detail & Related papers (2023-09-07T17:56:57Z)
- First Contact: Unsupervised Human-Machine Co-Adaptation via Mutual Information Maximization [112.40598205054994]
We formalize this idea as a completely unsupervised objective for optimizing interfaces.
We conduct an observational study on 540K examples of users operating various keyboard and eye gaze interfaces for typing, controlling simulated robots, and playing video games.
The results show that our mutual information scores are predictive of the ground-truth task completion metrics in a variety of domains.
arXiv Detail & Related papers (2022-05-24T21:57:18Z)
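The unsupervised objective above is mutual information between the user's input signals and the interface's outputs: an interface scores well when its output carries information about what the user did and, by proxy, intended. A toy worked example with discrete counts follows; the paper's estimator handles high-dimensional signals, so this only shows the quantity being maximized, I(X;Y) = Σ p(x,y) log[p(x,y) / (p(x)p(y))].

```python
# Toy mutual-information computation I(X;Y) from empirical counts, where X is
# the user's input and Y the interface's response. Illustrates the objective
# only; the paper estimates MI for high-dimensional continuous signals.
import math
from collections import Counter

def mutual_information(pairs: list[tuple[str, str]]) -> float:
    n = len(pairs)
    pxy = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

if __name__ == "__main__":
    # A responsive interface: outputs track inputs -> 1 bit of MI.
    good = [("left", "move_left"), ("right", "move_right")] * 50
    # A broken interface: output ignores the input -> MI of zero.
    bad = [("left", "move_left"), ("right", "move_left")] * 50
    print(mutual_information(good), mutual_information(bad))
```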
- iFacetSum: Coreference-based Interactive Faceted Summarization for Multi-Document Exploration [63.272359227081836]
iFacetSum integrates interactive summarization with faceted search.
Fine-grained facets are automatically produced based on cross-document coreference pipelines.
arXiv Detail & Related papers (2021-09-23T20:01:11Z)
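Producing facets from cross-document coreference, as iFacetSum is summarized, can be pictured as grouping mentions into clusters and exposing each cluster as a selectable facet whose sentences feed the summary. A toy sketch, with the coreference pipeline replaced by an exact string match and an assumed mention lexicon:

```python
# Toy facet construction: cluster mentions across documents and let a facet
# selection retrieve the supporting sentences. Real iFacetSum uses a
# cross-document coreference pipeline; string matching stands in here.
from collections import defaultdict

def build_facets(documents: list[list[str]]) -> dict[str, list[str]]:
    facets: dict[str, list[str]] = defaultdict(list)
    entities = ["AI", "climate"]               # assumed mention lexicon
    for doc in documents:
        for sentence in doc:
            for entity in entities:
                if entity.lower() in sentence.lower():
                    facets[entity].append(sentence)
    return facets

if __name__ == "__main__":
    docs = [
        ["AI systems are improving fast.", "Climate policy lags behind."],
        ["New AI benchmarks appeared.", "Climate talks resumed this week."],
    ]
    facets = build_facets(docs)
    # Selecting a facet yields the sentences a summarizer would condense.
    print(facets["AI"])
```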
- A Persistent Spatial Semantic Representation for High-level Natural Language Instruction Execution [54.385344986265714]
We propose a persistent spatial semantic representation method to bridge the gap between language and robot actions.
We evaluate our approach on the ALFRED benchmark and achieve state-of-the-art results, despite completely avoiding the commonly used step-by-step instructions.
arXiv Detail & Related papers (2021-07-12T17:47:19Z)
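A persistent spatial semantic representation can be read as a map that accumulates object observations over time so the agent can later ground language without re-searching the scene. A minimal grid-map sketch under that reading (the paper's representation is learned and far richer; this only shows the persistence idea):

```python
# Minimal persistent semantic map: accumulate (label -> cell) observations
# across steps and answer location queries later. Illustrative only.
from collections import defaultdict

class SemanticMap:
    def __init__(self):
        self._cells: dict[str, set[tuple[int, int]]] = defaultdict(set)

    def observe(self, label: str, cell: tuple[int, int]) -> None:
        self._cells[label].add(cell)          # observations persist across steps

    def locate(self, label: str) -> set[tuple[int, int]]:
        return self._cells.get(label, set())

if __name__ == "__main__":
    m = SemanticMap()
    m.observe("mug", (3, 4))                  # seen early in the episode
    m.observe("sink", (7, 1))
    # Much later, "put the mug in the sink" can be grounded from the map.
    print(m.locate("mug"), m.locate("sink"))
```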
- Where2Act: From Pixels to Actions for Articulated 3D Objects [54.19638599501286]
We extract highly localized actionable information for elementary actions, such as pushing or pulling, on articulated objects with movable parts.
We propose a learning-from-interaction framework with an online data sampling strategy that allows us to train the network in simulation.
Our learned models even transfer to real-world data.
arXiv Detail & Related papers (2021-01-07T18:56:38Z)
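The learning-from-interaction loop with online data sampling, as summarized for Where2Act, can be pictured as: propose an interaction point, execute the primitive in simulation, record success or failure, and let the current estimates bias which points get tried next. A mock sketch with a random oracle in place of the physics simulator and no real network training:

```python
# Mock learning-from-interaction loop with online data sampling: smoothed
# success estimates bias which pixels are tried next. A random oracle stands
# in for the simulator; illustrative only, not Where2Act's training code.
import random

def simulate_pull(point: tuple[int, int]) -> bool:
    # Stand-in for executing the primitive in a simulator: points near a
    # hypothetical handle at (5, 5) succeed more often.
    dist = abs(point[0] - 5) + abs(point[1] - 5)
    return random.random() < max(0.05, 1.0 - 0.2 * dist)

def collect_online(n_rounds: int = 200) -> dict[tuple[int, int], list[bool]]:
    random.seed(0)
    results: dict[tuple[int, int], list[bool]] = {}
    candidates = [(x, y) for x in range(10) for y in range(10)]

    def score(p: tuple[int, int]) -> float:
        r = results.get(p, [])
        return (sum(r) + 1) / (len(r) + 2)    # smoothed success rate

    for _ in range(n_rounds):
        # Online sampling: among a random batch, try the most promising point.
        point = max(random.sample(candidates, 8), key=score)
        results.setdefault(point, []).append(simulate_pull(point))
    return results

if __name__ == "__main__":
    data = collect_online()
    best = max(data, key=lambda p: sum(data[p]) / len(data[p]))
    print("most actionable point:", best)
```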
- Modeling Long-horizon Tasks as Sequential Interaction Landscapes [75.5824586200507]
We present a deep learning network that learns dependencies and transitions across subtasks solely from a set of demonstration videos.
We show that these symbols can be learned and predicted directly from image observations.
We evaluate our framework on two long-horizon tasks: (1) block stacking of puzzle pieces executed by humans, and (2) a robot manipulation task involving picking and placing objects and sliding a cabinet door with a 7-DoF robot arm.
arXiv Detail & Related papers (2020-06-08T18:07:18Z)
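Learning transitions across subtasks from demonstrations, as in the entry above, can be boiled down to estimating which subtask symbols follow which, then predicting the likely successor during execution. A counting-based sketch (the paper learns the symbols themselves from raw images and uses a deep network; here the symbols are given):

```python
# Counting-based sketch of subtask-transition learning: estimate P(next | cur)
# from demonstration symbol sequences, then predict the most likely successor.
# Illustrative only; the paper learns symbols from image observations.
from collections import Counter, defaultdict

def learn_transitions(demos: list[list[str]]) -> dict[str, Counter]:
    counts: dict[str, Counter] = defaultdict(Counter)
    for demo in demos:
        for cur, nxt in zip(demo, demo[1:]):
            counts[cur][nxt] += 1
    return counts

def predict_next(counts: dict[str, Counter], current: str) -> str:
    return counts[current].most_common(1)[0][0]

if __name__ == "__main__":
    demos = [
        ["pick", "place", "slide_door", "place"],
        ["pick", "place", "pick", "place", "slide_door", "place"],
    ]
    model = learn_transitions(demos)
    print(predict_next(model, "pick"))        # -> "place"
```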
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.