Frontend Diffusion: Exploring Intent-Based User Interfaces through Abstract-to-Detailed Task Transitions
- URL: http://arxiv.org/abs/2408.00778v1
- Date: Tue, 16 Jul 2024 20:24:35 GMT
- Title: Frontend Diffusion: Exploring Intent-Based User Interfaces through Abstract-to-Detailed Task Transitions
- Authors: Qinshi Zhang, Latisha Besariani Hendra, Mohan Chi, Zijian Ding
- Abstract summary: We introduce Frontend Diffusion, an end-to-end tool that generates high-quality websites from user sketches.
We demonstrate the potential of task transitions to reduce human intervention and communication costs in complex tasks.
- Score: 1.845645938093348
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The emergence of Generative AI is catalyzing a paradigm shift in user interfaces from command-based to intent-based outcome specification. In this paper, we explore abstract-to-detailed task transitions in the context of frontend code generation as a step towards intent-based user interfaces, aiming to bridge the gap between abstract user intentions and concrete implementations. We introduce Frontend Diffusion, an end-to-end LLM-powered tool that generates high-quality websites from user sketches. The system employs a three-stage task transition process: sketching, writing, and coding. We demonstrate the potential of task transitions to reduce human intervention and communication costs in complex tasks. Our work also opens avenues for exploring similar approaches in other domains, potentially extending to more complex, interdependent tasks such as video production.
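The three-stage task transition (sketching, writing, coding) can be pictured as a pipeline in which each stage's output becomes the next stage's input. Below is a minimal Python sketch of that flow, not the authors' implementation; `call_llm` and the two stage prompts are hypothetical placeholders for whatever vision-language model endpoint is actually used.

```python
# Minimal sketch of an abstract-to-detailed task transition pipeline:
# sketching -> writing -> coding. `call_llm` is a hypothetical stand-in
# for a real model endpoint, not the Frontend Diffusion implementation.

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM call (e.g., a hosted chat-completion API)."""
    return f"[model output for: {prompt[:40]}...]"

def describe_sketch(sketch_path: str) -> str:
    # Stage 1 -> 2: turn the user's sketch into a written page description.
    # A real system would pass the image itself to a vision-language model.
    return call_llm(f"Describe the layout and intent of the sketch at {sketch_path}")

def write_frontend_code(description: str) -> str:
    # Stage 2 -> 3: turn the written description into concrete HTML/CSS/JS.
    return call_llm(f"Generate a complete HTML page implementing: {description}")

def frontend_diffusion_pipeline(sketch_path: str) -> str:
    description = describe_sketch(sketch_path)   # abstract intent, made explicit
    return write_frontend_code(description)      # detailed implementation

if __name__ == "__main__":
    print(frontend_diffusion_pipeline("homepage_sketch.png"))
```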
Related papers
- AppAgent v2: Advanced Agent for Flexible Mobile Interactions [46.789563920416626]
This work introduces a novel LLM-based multimodal agent framework for mobile devices.
Our agent constructs a flexible action space that enhances adaptability across various applications.
Our results demonstrate the framework's superior performance, confirming its effectiveness in real-world scenarios.
arXiv Detail & Related papers (2024-08-05T06:31:39Z)
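The "flexible action space" summarized above can be read as a set of parameterized UI actions that transfer across applications. The sketch below illustrates that reading only; the action types and the `execute` dispatcher are hypothetical, not AppAgent v2's actual interface.

```python
# Speculative sketch of a parameterized mobile-UI action space; the class
# names are illustrative and not taken from AppAgent v2.
from dataclasses import dataclass
from typing import Union

@dataclass
class Tap:
    x: int
    y: int

@dataclass
class Swipe:
    x0: int; y0: int; x1: int; y1: int

@dataclass
class TypeText:
    text: str

Action = Union[Tap, Swipe, TypeText]

def execute(action: Action) -> None:
    # A real agent would dispatch these to a device driver (e.g., via ADB).
    print(f"executing {action}")

if __name__ == "__main__":
    plan: list[Action] = [Tap(120, 640), TypeText("hello"), Swipe(200, 900, 200, 300)]
    for a in plan:
        execute(a)
```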
- Think, Act, and Ask: Open-World Interactive Personalized Robot Navigation [17.279875204729553]
Zero-Shot Object Navigation (ZSON) enables agents to navigate towards open-vocabulary objects in unknown environments.
We introduce ZIPON, where robots need to navigate to personalized goal objects while engaging in conversations with users.
We propose Open-woRld Interactive persOnalized Navigation (ORION) to make sequential decisions to manipulate different modules for perception, navigation and communication.
arXiv Detail & Related papers (2023-10-12T01:17:56Z)
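ORION is summarized as making sequential decisions over perception, navigation, and communication modules. A minimal controller loop in that spirit might look like the sketch below; the module behavior and the dispatch rule are assumptions for illustration, not the paper's learned policy.

```python
# Minimal sketch of a controller that sequentially picks among perception,
# navigation, and communication modules. Module behavior is mocked; ORION's
# actual decision-making is learned, not a fixed if-chain.

def ask_user(state):
    state["clarified"] = True          # mock communication module
    return state

def perceive(state):
    state["perceived"] = True          # mock perception module
    state["seen_goal"] = state["steps"] >= 2
    return state

def navigate(state):
    state["steps"] += 1                # mock navigation module
    state["perceived"] = False         # a new viewpoint needs fresh perception
    return state

def controller(state):
    # Assumed dispatch rule for illustration only.
    if not state.get("clarified"):
        return ask_user                # resolve ambiguity with the user first
    if not state.get("perceived"):
        return perceive                # look around before moving
    if not state.get("seen_goal"):
        return navigate                # move, then perceive again
    return None                        # goal reached

if __name__ == "__main__":
    state = {"steps": 0}
    while (module := controller(state)) is not None:
        state = module(state)
    print("done:", state)
```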
- You Only Look at Screens: Multimodal Chain-of-Action Agents [37.118034745972956]
Auto-GUI is a multimodal solution that directly interacts with the interface.
We propose a chain-of-action technique to help the agent decide what action to execute.
We evaluate our approach on a new device-control benchmark AITW with 30K unique instructions.
arXiv Detail & Related papers (2023-09-20T16:12:32Z)
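The chain-of-action idea, as summarized, conditions each action prediction on the actions executed so far plus a projected chain of future actions, executing only the next one. A rough sketch under that reading (not Auto-GUI's actual model code; `predict_chain` is a mock stand-in for the multimodal model):

```python
# Rough sketch of chain-of-action decision-making: the model sees the goal
# and the action history, proposes the remaining chain, and only the first
# proposed action is executed each step. `predict_chain` is a mock model.

def predict_chain(goal: str, history: list[str]) -> list[str]:
    # Stand-in for a multimodal model that would also see a screenshot.
    full_plan = ["open_app", "tap_search", "type_query", "tap_result"]
    return full_plan[len(history):]           # pretend the model replans

def run_episode(goal: str, max_steps: int = 10) -> list[str]:
    history: list[str] = []
    for _ in range(max_steps):
        chain = predict_chain(goal, history)  # past actions + anticipated chain
        if not chain:
            break
        action = chain[0]                     # execute only the next action
        history.append(action)
    return history

if __name__ == "__main__":
    print(run_episode("search for a coffee shop"))
```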
- InstructDiffusion: A Generalist Modeling Interface for Vision Tasks [52.981128371910266]
We present InstructDiffusion, a framework for aligning computer vision tasks with human instructions.
InstructDiffusion can handle a variety of vision tasks, including understanding tasks and generative tasks.
It even exhibits the ability to handle unseen tasks and outperforms prior methods on novel datasets.
arXiv Detail & Related papers (2023-09-07T17:56:57Z)
- First Contact: Unsupervised Human-Machine Co-Adaptation via Mutual Information Maximization [112.40598205054994]
We formalize this idea as a completely unsupervised objective for optimizing interfaces.
We conduct an observational study on 540K examples of users operating various keyboard and eye gaze interfaces for typing, controlling simulated robots, and playing video games.
The results show that our mutual information scores are predictive of the ground-truth task completion metrics in a variety of domains.
arXiv Detail & Related papers (2022-05-24T21:57:18Z)
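The unsupervised objective above is mutual information between the user's input signals and the interface's outputs: an interface scores well when its output carries information about what the user did and, by proxy, intended. A toy worked example with discrete counts follows; the paper's estimator handles high-dimensional signals, so this only shows the quantity being maximized, I(X;Y) = Σ p(x,y) log[p(x,y) / (p(x)p(y))].

```python
# Toy mutual-information computation I(X;Y) from empirical counts, where X is
# the user's input and Y the interface's response. Illustrates the objective
# only; the paper estimates MI for high-dimensional continuous signals.
import math
from collections import Counter

def mutual_information(pairs: list[tuple[str, str]]) -> float:
    n = len(pairs)
    pxy = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

if __name__ == "__main__":
    # A responsive interface: outputs track inputs -> 1 bit of MI.
    good = [("left", "move_left"), ("right", "move_right")] * 50
    # A broken interface: output ignores the input -> MI of zero.
    bad = [("left", "move_left"), ("right", "move_left")] * 50
    print(mutual_information(good), mutual_information(bad))
```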
- iFacetSum: Coreference-based Interactive Faceted Summarization for Multi-Document Exploration [63.272359227081836]
iFacetSum integrates interactive summarization with faceted search.
Fine-grained facets are automatically produced based on cross-document coreference pipelines.
arXiv Detail & Related papers (2021-09-23T20:01:11Z)
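Producing facets from cross-document coreference, as iFacetSum is summarized, can be pictured as grouping mentions into clusters and exposing each cluster as a selectable facet whose sentences feed the summary. A toy sketch, with the coreference pipeline replaced by an exact string match and an assumed mention lexicon:

```python
# Toy facet construction: cluster mentions across documents and let a facet
# selection retrieve the supporting sentences. Real iFacetSum uses a
# cross-document coreference pipeline; string matching stands in here.
from collections import defaultdict

def build_facets(documents: list[list[str]]) -> dict[str, list[str]]:
    facets: dict[str, list[str]] = defaultdict(list)
    entities = ["AI", "climate"]               # assumed mention lexicon
    for doc in documents:
        for sentence in doc:
            for entity in entities:
                if entity.lower() in sentence.lower():
                    facets[entity].append(sentence)
    return facets

if __name__ == "__main__":
    docs = [
        ["AI systems are improving fast.", "Climate policy lags behind."],
        ["New AI benchmarks appeared.", "Climate talks resumed this week."],
    ]
    facets = build_facets(docs)
    # Selecting a facet yields the sentences a summarizer would condense.
    print(facets["AI"])
```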
- A Persistent Spatial Semantic Representation for High-level Natural Language Instruction Execution [54.385344986265714]
We propose a persistent spatial semantic representation method to bridge the gap between language and robot actions.
We evaluate our approach on the ALFRED benchmark and achieve state-of-the-art results, despite completely avoiding the commonly used step-by-step instructions.
arXiv Detail & Related papers (2021-07-12T17:47:19Z)
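A persistent spatial semantic representation can be read as a map that accumulates object observations over time so the agent can later ground language without re-searching the scene. A minimal grid-map sketch under that reading (the paper's representation is learned and far richer; this only shows the persistence idea):

```python
# Minimal persistent semantic map: accumulate (label -> cell) observations
# across steps and answer location queries later. Illustrative only.
from collections import defaultdict

class SemanticMap:
    def __init__(self):
        self._cells: dict[str, set[tuple[int, int]]] = defaultdict(set)

    def observe(self, label: str, cell: tuple[int, int]) -> None:
        self._cells[label].add(cell)          # observations persist across steps

    def locate(self, label: str) -> set[tuple[int, int]]:
        return self._cells.get(label, set())

if __name__ == "__main__":
    m = SemanticMap()
    m.observe("mug", (3, 4))                  # seen early in the episode
    m.observe("sink", (7, 1))
    # Much later, "put the mug in the sink" can be grounded from the map.
    print(m.locate("mug"), m.locate("sink"))
```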
- Where2Act: From Pixels to Actions for Articulated 3D Objects [54.19638599501286]
We extract highly localized actionable information for elementary actions, such as pushing or pulling, on articulated objects with movable parts.
We propose a learning-from-interaction framework with an online data sampling strategy that allows us to train the network in simulation.
Our learned models even transfer to real-world data.
arXiv Detail & Related papers (2021-01-07T18:56:38Z)
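The learning-from-interaction loop with online data sampling, as summarized for Where2Act, can be pictured as: propose an interaction point, execute the primitive in simulation, record success or failure, and let the current estimates bias which points get tried next. A mock sketch with a random oracle in place of the physics simulator and no real network training:

```python
# Mock learning-from-interaction loop with online data sampling: smoothed
# success estimates bias which pixels are tried next. A random oracle stands
# in for the simulator; illustrative only, not Where2Act's training code.
import random

def simulate_pull(point: tuple[int, int]) -> bool:
    # Stand-in for executing the primitive in a simulator: points near a
    # hypothetical handle at (5, 5) succeed more often.
    dist = abs(point[0] - 5) + abs(point[1] - 5)
    return random.random() < max(0.05, 1.0 - 0.2 * dist)

def collect_online(n_rounds: int = 200) -> dict[tuple[int, int], list[bool]]:
    random.seed(0)
    results: dict[tuple[int, int], list[bool]] = {}
    candidates = [(x, y) for x in range(10) for y in range(10)]

    def score(p: tuple[int, int]) -> float:
        r = results.get(p, [])
        return (sum(r) + 1) / (len(r) + 2)    # smoothed success rate

    for _ in range(n_rounds):
        # Online sampling: among a random batch, try the most promising point.
        point = max(random.sample(candidates, 8), key=score)
        results.setdefault(point, []).append(simulate_pull(point))
    return results

if __name__ == "__main__":
    data = collect_online()
    best = max(data, key=lambda p: sum(data[p]) / len(data[p]))
    print("most actionable point:", best)
```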
- Modeling Long-horizon Tasks as Sequential Interaction Landscapes [75.5824586200507]
We present a deep learning network that learns dependencies and transitions across subtasks solely from a set of demonstration videos.
We show that these symbols can be learned and predicted directly from image observations.
We evaluate our framework on two long-horizon tasks: (1) block stacking of puzzle pieces executed by humans, and (2) a robot manipulation task involving picking and placing objects and sliding a cabinet door with a 7-DoF robot arm.
arXiv Detail & Related papers (2020-06-08T18:07:18Z)
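Learning transitions across subtasks from demonstrations, as in the entry above, can be boiled down to estimating which subtask symbols follow which, then predicting the likely successor during execution. A counting-based sketch (the paper learns the symbols themselves from raw images and uses a deep network; here the symbols are given):

```python
# Counting-based sketch of subtask-transition learning: estimate P(next | cur)
# from demonstration symbol sequences, then predict the most likely successor.
# Illustrative only; the paper learns symbols from image observations.
from collections import Counter, defaultdict

def learn_transitions(demos: list[list[str]]) -> dict[str, Counter]:
    counts: dict[str, Counter] = defaultdict(Counter)
    for demo in demos:
        for cur, nxt in zip(demo, demo[1:]):
            counts[cur][nxt] += 1
    return counts

def predict_next(counts: dict[str, Counter], current: str) -> str:
    return counts[current].most_common(1)[0][0]

if __name__ == "__main__":
    demos = [
        ["pick", "place", "slide_door", "place"],
        ["pick", "place", "pick", "place", "slide_door", "place"],
    ]
    model = learn_transitions(demos)
    print(predict_next(model, "pick"))        # -> "place"
```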
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.