AutoGameUI: Constructing High-Fidelity Game UIs via Multimodal Learning and Interactive Web-Based Tool
- URL: http://arxiv.org/abs/2411.03709v1
- Date: Wed, 06 Nov 2024 07:16:54 GMT
- Authors: Zhongliang Tang, Mengchen Tan, Fei Xia, Qingrong Cheng, Hao Jiang, Yongxiang Zhang
- Abstract summary: We introduce an innovative system, AutoGameUI, for efficiently constructing cohesive user interfaces in game development.
We propose a two-stage multimodal learning pipeline to obtain comprehensive representations of both UI and UX designs.
Through the correspondences, a cohesive user interface is automatically constructed from pairwise designs.
- Abstract: We introduce an innovative system, AutoGameUI, for efficiently constructing cohesive user interfaces in game development. Our system is the first to address the coherence issue arising from integrating inconsistent UI and UX designs, typically leading to mismatches and inefficiencies. We propose a two-stage multimodal learning pipeline to obtain comprehensive representations of both UI and UX designs, and to establish their correspondences. Through the correspondences, a cohesive user interface is automatically constructed from pairwise designs. To achieve high-fidelity effects, we introduce a universal data protocol for precise design descriptions and cross-platform applications. We also develop an interactive web-based tool for game developers to facilitate the use of our system. We create a game UI dataset from actual game projects and combine it with a public dataset for training and evaluation. Our experimental results demonstrate the effectiveness of our system in maintaining coherence between the constructed interfaces and the original designs.
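The abstract describes establishing correspondences between paired UI and UX designs but does not publish the matching procedure. As a minimal sketch of one way such correspondences could be established, assuming per-element embeddings are available from the paper's multimodal pipeline, elements can be paired by maximizing total cosine similarity with a one-to-one assignment (the function name and toy data below are illustrative, not from the paper):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_elements(ui_emb: np.ndarray, ux_emb: np.ndarray):
    """Pair each UI element with a UX element by maximizing cosine similarity."""
    ui = ui_emb / np.linalg.norm(ui_emb, axis=1, keepdims=True)
    ux = ux_emb / np.linalg.norm(ux_emb, axis=1, keepdims=True)
    sim = ui @ ux.T                            # pairwise cosine similarities
    rows, cols = linear_sum_assignment(-sim)   # negate: maximize total similarity
    return list(zip(rows.tolist(), cols.tolist()))

# Toy embeddings: UI element 0 resembles UX element 1, and vice versa.
ui = np.array([[1.0, 0.0], [0.0, 1.0]])
ux = np.array([[0.0, 1.0], [1.0, 0.0]])
print(match_elements(ui, ux))  # [(0, 1), (1, 0)]
```

The Hungarian assignment guarantees a one-to-one mapping, which matches the coherence goal stated in the abstract better than independent nearest-neighbor lookups would.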
Related papers
- UI-TARS: Pioneering Automated GUI Interaction with Native Agents [58.18100825673032]
This paper introduces UI-TARS, a native GUI agent model that solely perceives the screenshots as input and performs human-like interactions.
In the OSWorld benchmark, UI-TARS achieves scores of 24.6 with 50 steps and 22.7 with 15 steps, outperforming Claude (22.0 and 14.9 respectively)
arXiv Detail & Related papers (2025-01-21T17:48:10Z)
- Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction [69.57190742976091]
We introduce Aguvis, a unified vision-based framework for autonomous GUI agents.
Our approach leverages image-based observations and natural-language instructions grounded to visual elements.
To address the limitations of previous work, we integrate explicit planning and reasoning within the model.
arXiv Detail & Related papers (2024-12-05T18:58:26Z)
- On AI-Inspired UI-Design [5.969881132928718]
We discuss three complementary Artificial Intelligence (AI) approaches for triggering the creativity of app designers.
First, designers can prompt a Large Language Model (LLM) to directly generate and adjust UIs.
Second, a Vision-Language Model (VLM) enables designers to effectively search a large screenshot dataset.
Third, a Diffusion Model (DM) can be trained to specifically generate UIs as inspirational images.
arXiv Detail & Related papers (2024-06-19T15:28:21Z)
- Tell Me What's Next: Textual Foresight for Generic UI Representations [65.10591722192609]
We propose Textual Foresight, a novel pretraining objective for learning UI screen representations.
Textual Foresight generates global text descriptions of future UI states given a current UI and local action taken.
We train with our newly constructed mobile app dataset, OpenApp, which results in the first public dataset for app UI representation learning.
arXiv Detail & Related papers (2024-06-12T02:43:19Z)
- PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM [58.67882997399021]
Our research introduces a unified framework for automated graphic layout generation.
Our data-driven method employs structured text (JSON format) and visual instruction tuning to generate layouts.
We develop an automated text-to-poster system that generates editable posters based on users' design intentions.
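The PosterLLaVa summary mentions layouts expressed as structured text in JSON format. The paper's actual schema is not given here; the record below is a hypothetical illustration of what such a structured layout description might look like, with illustrative field names:

```python
import json

# Hypothetical structured-text layout record in the spirit of a JSON layout
# protocol; field names and values are illustrative, not the paper's schema.
layout = {
    "canvas": {"width": 720, "height": 1280},
    "elements": [
        {"type": "title",  "bbox": [40, 60, 680, 140],   "text": "Summer Sale"},
        {"type": "image",  "bbox": [40, 180, 680, 900]},
        {"type": "button", "bbox": [240, 960, 480, 1040], "text": "Shop Now"},
    ],
}
print(json.dumps(layout, indent=2))
```

Serializing layouts this way makes them both machine-checkable (bounding boxes, element types) and directly usable as instruction-tuning targets for a language model.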
arXiv Detail & Related papers (2024-06-05T03:05:52Z)
- UIClip: A Data-driven Model for Assessing User Interface Design [20.66914084220734]
We develop a machine-learned model, UIClip, for assessing the design quality and visual relevance of a user interface.
We show how UIClip can facilitate downstream applications that rely on instantaneous assessment of UI design quality.
arXiv Detail & Related papers (2024-04-18T20:43:08Z)
- X2T: Training an X-to-Text Typing Interface with Online Learning from User Feedback [83.95599156217945]
We focus on assistive typing applications in which a user cannot operate a keyboard, but can supply other inputs.
Standard methods train a model on a fixed dataset of user inputs, then deploy a static interface that does not learn from its mistakes.
We investigate a simple idea that would enable such interfaces to improve over time, with minimal additional effort from the user.
arXiv Detail & Related papers (2022-03-04T00:07:20Z)
- UIBert: Learning Generic Multimodal Representations for UI Understanding [12.931540149350633]
We introduce a transformer-based joint image-text model trained through novel pre-training tasks on large-scale unlabeled UI data.
Our key intuition is that the heterogeneous features in a UI are self-aligned, i.e., the image and text features of UI components are predictive of each other.
We propose five pretraining tasks utilizing this self-alignment among different features of a UI component and across various components in the same UI.
We evaluate our method on nine real-world downstream UI tasks where UIBert outperforms strong multimodal baselines by up to 9.26% accuracy.
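UIBert's actual pretraining uses five masking/prediction-style tasks; as an illustration of the self-alignment intuition only (not the paper's objectives), a symmetric contrastive loss that pulls together the image and text features of the same component can be sketched as follows:

```python
import numpy as np

def self_alignment_loss(img_feats, txt_feats, temperature=0.07):
    """Symmetric InfoNCE: image/text features of the same component attract,
    features of different components repel. Purely illustrative of the
    self-alignment idea; not UIBert's published pretraining objective."""
    img = img_feats / np.linalg.norm(img_feats, axis=1, keepdims=True)
    txt = txt_feats / np.linalg.norm(txt_feats, axis=1, keepdims=True)
    logits = img @ txt.T / temperature
    n = len(img)

    def xent(l):
        # Cross-entropy with the matching index as the target, row-wise.
        l = l - l.max(axis=1, keepdims=True)
        p = np.exp(l) / np.exp(l).sum(axis=1, keepdims=True)
        return -np.log(p[np.arange(n), np.arange(n)]).mean()

    return 0.5 * (xent(logits) + xent(logits.T))
```

With correctly paired features the diagonal of the similarity matrix dominates and the loss is near zero; shuffling one modality's rows destroys the pairing and raises the loss, which is the "predictive of each other" property the summary describes.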
arXiv Detail & Related papers (2021-07-29T03:51:36Z)
- Magic Layouts: Structural Prior for Component Detection in User Interface Designs [28.394160581239174]
We present Magic Layouts; a method for parsing screenshots or hand-drawn sketches of user interface (UI) layouts.
Our core contribution is to extend existing detectors to exploit a learned structural prior for UI designs.
We demonstrate our method within the context of an interactive application for rapidly acquiring digital prototypes of user experience (UX) designs.
arXiv Detail & Related papers (2021-06-14T17:20:36Z)
- VINS: Visual Search for Mobile User Interface Design [66.28088601689069]
This paper introduces VINS, a visual search framework that takes a UI image as input and retrieves visually similar design examples.
The framework achieves a mean Average Precision of 76.39% for UI detection and high performance in querying similar UI designs.
arXiv Detail & Related papers (2021-02-10T01:46:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.