Automated Mapping of Adaptive App GUIs from Phones to TVs
- URL: http://arxiv.org/abs/2307.12522v2
- Date: Sun, 5 Nov 2023 04:31:34 GMT
- Title: Automated Mapping of Adaptive App GUIs from Phones to TVs
- Authors: Han Hu, Ruiqi Dong, John Grundy, Thai Minh Nguyen, Huaxiao Liu,
Chunyang Chen
- Abstract summary: Existing techniques to map a mobile app GUI to a TV either adopt a responsive design or use mirror apps for improved video display.
We propose a semi-automated approach to generate corresponding adaptive TV GUIs, given the phone GUIs as the input.
Our tool is not only beneficial to developers but also to GUI designers, who can further customize the generated GUIs for their TV app development.
- Score: 31.207923538204795
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the increasing interconnection of smart devices, users often desire to
adopt the same app on quite different devices for identical tasks, such as
watching the same movies on both their smartphones and TVs. However, the
significant differences in screen size, aspect ratio, and interaction styles
make it challenging to adapt Graphical User Interfaces (GUIs) across these
devices. Although there are millions of apps available on Google Play, only a
few thousand are designed to support smart TV displays. Existing techniques to
map a mobile app GUI to a TV either adopt a responsive design, which struggles
to bridge the substantial gap between phone and TV, or use mirror apps for
improved video display, which require hardware support and extra engineering
effort. Instead of developing another app to support TVs, we propose a
semi-automated approach to generate corresponding adaptive TV GUIs, given the
phone GUIs as input. Based on our empirical study of GUI pairs for TVs and
phones in existing apps, we synthesize a list of rules for grouping and
classifying phone GUIs, converting them to TV GUIs, and generating dynamic TV
layouts and source code for the TV display. Our tool is not only beneficial to
developers but also to GUI designers, who can further customize the generated
GUIs for their TV app development. An evaluation and user study demonstrate the
accuracy of our generated GUIs and the usefulness of our tool.
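Since the abstract only names the pipeline stages, a minimal Python sketch may help make them concrete: group phone widgets by spatial proximity, classify each group, and emit a landscape TV layout skeleton. Every rule, threshold, class, and emitted tag below is an illustrative assumption, not the paper's actual rule set or generated code.

```python
# A minimal, hypothetical sketch of rule-based phone-to-TV GUI mapping:
# group widgets, classify groups, emit a TV layout skeleton. All names and
# thresholds are assumptions; the paper derives its rules empirically.
from dataclasses import dataclass, field

@dataclass
class Widget:
    kind: str   # e.g. "ImageView", "TextView", "Button"
    x: int
    y: int
    w: int
    h: int

@dataclass
class Group:
    role: str                      # e.g. "navigation", "content_card"
    widgets: list = field(default_factory=list)

def group_widgets(widgets, y_gap=24):
    """Group widgets whose vertical gap is below a threshold (an assumed
    proximity rule)."""
    groups, current = [], []
    for w in sorted(widgets, key=lambda w: w.y):
        if current and w.y - (current[-1].y + current[-1].h) > y_gap:
            groups.append(current)
            current = []
        current.append(w)
    if current:
        groups.append(current)
    return groups

def classify(group):
    """Assign a role from simple composition rules (illustrative only)."""
    kinds = {w.kind for w in group}
    if kinds == {"Button"}:
        return Group("navigation", group)
    if "ImageView" in kinds and "TextView" in kinds:
        return Group("content_card", group)
    return Group("other", group)

def to_tv_layout(groups):
    """Emit a landscape TV skeleton: navigation in a left sidebar, the
    remaining groups in a right content column."""
    nav = [g for g in groups if g.role == "navigation"]
    rest = [g for g in groups if g.role != "navigation"]
    side = "".join(f"    <nav: {len(g.widgets)} widgets>\n" for g in nav)
    body = "".join(f"    <{g.role}: {len(g.widgets)} widgets>\n" for g in rest)
    return ("<row>\n  <column id='sidebar'>\n" + side +
            "  </column>\n  <column id='content'>\n" + body +
            "  </column>\n</row>")

# Toy phone screen: a two-button bar above an image-plus-caption card.
phone = [Widget("Button", 0, 0, 100, 40), Widget("Button", 110, 0, 100, 40),
         Widget("ImageView", 0, 120, 320, 180), Widget("TextView", 0, 310, 320, 40)]
print(to_tv_layout([classify(g) for g in group_widgets(phone)]))
```

The sidebar-plus-content split mirrors the common landscape TV pattern the abstract's empirical study alludes to; the real tool additionally generates dynamic layouts and source code.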
Related papers
- MobileFlow: A Multimodal LLM For Mobile GUI Agent [4.7619361168442005]
This paper introduces MobileFlow, a multimodal large language model meticulously crafted for mobile GUI agents.
MobileFlow contains approximately 21 billion parameters and is equipped with novel hybrid visual encoders.
It has the capacity to fully interpret image data and comprehend user instructions for GUI interaction tasks.
arXiv Detail & Related papers (2024-07-05T08:37:10Z)
- AMEX: Android Multi-annotation Expo Dataset for Mobile GUI Agents [50.39555842254652]
We introduce the Android Multi-annotation EXpo (AMEX) to advance research on AI agents in mobile scenarios.
AMEX comprises over 104K high-resolution screenshots from 110 popular mobile applications, which are annotated at multiple levels.
AMEX includes three levels of annotations: GUI interactive element grounding, GUI screen and element functionality descriptions, and complex natural language instructions.
arXiv Detail & Related papers (2024-07-03T17:59:58Z)
- GUICourse: From General Vision Language Models to Versatile GUI Agents [75.5150601913659]
We contribute GUICourse, a suite of datasets to train visual-based GUI agents.
First, we introduce the GUIEnv dataset to strengthen the OCR and grounding capabilities of VLMs.
Then, we introduce the GUIAct and GUIChat datasets to enrich their knowledge of GUI components and interactions.
arXiv Detail & Related papers (2024-06-17T08:30:55Z)
- GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents [73.9254861755974]
This paper introduces a new dataset, called GUI-World, which features meticulously crafted Human-MLLM annotations.
We evaluate the capabilities of current state-of-the-art MLLMs, including ImageLLMs and VideoLLMs, in understanding various types of GUI content.
arXiv Detail & Related papers (2024-06-16T06:56:53Z)
- GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices [61.48043339441149]
GUI Odyssey consists of 7,735 episodes from 6 mobile devices, spanning 6 types of cross-app tasks, 201 apps, and 1.4K app combos.
We developed OdysseyAgent, a multimodal cross-app navigation agent, by fine-tuning the Qwen-VL model with a history resampling module.
arXiv Detail & Related papers (2024-06-12T17:44:26Z)
- GUing: A Mobile GUI Search Engine using a Vision-Language Model [6.024602799136753]
This paper proposes GUing, a GUI search engine based on a vision-language model called GUIClip.
We first collected app introduction images from Google Play, which display the most representative screenshots.
Then, we developed an automated pipeline to classify, crop, and extract the captions from these images.
We used this dataset to train a novel vision-language model, which is, to the best of our knowledge, the first of its kind for GUI retrieval.
arXiv Detail & Related papers (2024-04-30T18:42:18Z)
- GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation [167.6232690168905]
MM-Navigator is a GPT-4V-based agent for the smartphone graphical user interface (GUI) navigation task.
MM-Navigator can interact with a smartphone screen as human users do, and determine subsequent actions to fulfill given instructions.
arXiv Detail & Related papers (2023-11-13T18:53:37Z)
- Pairwise GUI Dataset Construction Between Android Phones and Tablets [24.208087862974033]
The Papt dataset is a pairwise GUI dataset tailored for Android phones and tablets.
We propose novel pairwise GUI collection approaches for constructing this dataset.
arXiv Detail & Related papers (2023-10-07T09:30:42Z)
- A Pairwise Dataset for GUI Conversion and Retrieval between Android Phones and Tablets [24.208087862974033]
The Papt dataset is a pairwise dataset for GUI conversion and retrieval between Android phones and tablets.
The dataset contains 10,035 phone-tablet GUI page pairs from 5,593 phone-tablet app pairs.
arXiv Detail & Related papers (2023-07-25T03:25:56Z)
- NiCro: Purely Vision-based, Non-intrusive Cross-Device and Cross-Platform GUI Testing [19.462053492572142]
We propose NiCro, a non-intrusive cross-device and cross-platform system.
NiCro uses a state-of-the-art GUI widget detector to detect widgets in GUI images and then analyses a comprehensive set of information to match the widgets across diverse devices (see the sketch after this list).
At the system level, NiCro can interact with a virtual device farm and a robotic arm system to perform cross-device, cross-platform testing non-intrusively.
arXiv Detail & Related papers (2023-05-24T01:19:05Z)
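As a rough illustration of the cross-device widget matching that the NiCro summary above mentions, here is a hedged Python sketch that scores candidate widget pairs across two devices by combining text and position similarity. The cues, weights, and dictionary fields are assumptions for illustration, not NiCro's actual method.

```python
# Hypothetical cross-device widget matching in the spirit of NiCro's summary:
# combine several similarity cues and keep the best-scoring counterpart for
# each detected widget. Cues, weights, and field names are assumptions.
from difflib import SequenceMatcher

def text_sim(a: str, b: str) -> float:
    """Fuzzy similarity of the widgets' text labels."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def position_sim(a: dict, b: dict) -> float:
    """Closeness of normalised (0..1) centre coordinates, so screens of
    different sizes and resolutions stay comparable."""
    return 1.0 - (abs(a["cx"] - b["cx"]) + abs(a["cy"] - b["cy"])) / 2.0

def match_widgets(src: list, dst: list, w_text: float = 0.6, w_pos: float = 0.4):
    """Greedily pair each source widget with its best-scoring target."""
    return [
        (s, max(dst, key=lambda d: w_text * text_sim(s["text"], d["text"])
                                   + w_pos * position_sim(s, d)))
        for s in src
    ]

# Example: match a phone "Play" button against two tablet widgets.
phone_w = [{"text": "Play", "cx": 0.5, "cy": 0.9}]
tablet_w = [{"text": "Play", "cx": 0.48, "cy": 0.85},
            {"text": "Settings", "cx": 0.9, "cy": 0.1}]
print(match_widgets(phone_w, tablet_w))
```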
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.