Pairwise GUI Dataset Construction Between Android Phones and Tablets
- URL: http://arxiv.org/abs/2310.04755v3
- Date: Sun, 5 Nov 2023 04:39:19 GMT
- Title: Pairwise GUI Dataset Construction Between Android Phones and Tablets
- Authors: Han Hu, Haolan Zhan, Yujin Huang, Di Liu
- Abstract summary: Papt dataset is a pairwise GUI dataset tailored for Android phones and tablets.
We propose novel pairwise GUI collection approaches for constructing this dataset.
- Score: 24.208087862974033
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the current landscape of pervasive smartphones and tablets, apps
frequently exist across both platforms. Although apps share most graphical user
interfaces (GUIs) and functionalities across phones and tablets, developers
often rebuild tablet versions from scratch, escalating costs and squandering
existing design resources. Researchers are attempting to collect data and
employ deep learning for automated GUI development to enhance developers'
productivity. Several GUI page datasets for phones are publicly accessible, but
none provide pairwise GUIs between phones and tablets. This poses a significant
barrier to applying deep learning to automated GUI development. In this paper,
we introduce the Papt dataset, a pioneering pairwise GUI dataset tailored for
Android phones and tablets, encompassing 10,035 phone-tablet GUI page pairs
sourced from 5,593 unique app pairs. We propose novel pairwise GUI collection
approaches for constructing this dataset and delineate its advantages over
currently prevailing datasets in the field. Through preliminary experiments on
this dataset, we analyze the present challenges of utilizing deep learning in
automated GUI development.
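The abstract does not specify the dataset's on-disk layout, so the following is only a minimal sketch of how a phone-tablet GUI pair might be represented and iterated in Python; the PaptPair fields, the pairs.json manifest name, and the directory layout are illustrative assumptions, not the paper's actual format.

```python
# Hypothetical sketch of a pairwise phone-tablet GUI record. Assumes a simple
# JSON manifest (pairs.json) listing screenshot paths per pair; the field names
# and file layout are assumptions, not the Papt dataset's real schema.
import json
from collections import Counter
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator


@dataclass
class PaptPair:
    app_id: str              # identifier shared by the phone and tablet versions of an app
    phone_screenshot: Path   # GUI page as rendered on the phone
    tablet_screenshot: Path  # corresponding GUI page on the tablet


def load_pairs(root: Path) -> Iterator[PaptPair]:
    """Yield phone-tablet GUI page pairs from a hypothetical manifest."""
    manifest = json.loads((root / "pairs.json").read_text())
    for entry in manifest:
        yield PaptPair(
            app_id=entry["app_id"],
            phone_screenshot=root / entry["phone"],
            tablet_screenshot=root / entry["tablet"],
        )


if __name__ == "__main__":
    # Example usage: count GUI pairs per app pair
    # (the paper reports 10,035 GUI pairs across 5,593 app pairs).
    pairs = list(load_pairs(Path("papt")))
    per_app = Counter(p.app_id for p in pairs)
    print(f"{len(pairs)} GUI pairs across {len(per_app)} app pairs")
```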
Related papers
- OS-ATLAS: A Foundation Action Model for Generalist GUI Agents [55.37173845836839]
OS-Atlas is a foundational GUI action model that excels at GUI grounding and OOD agentic tasks.
We are releasing the largest open-source cross-platform GUI grounding corpus to date, which contains over 13 million GUI elements.
arXiv Detail & Related papers (2024-10-30T17:10:19Z)
- AMEX: Android Multi-annotation Expo Dataset for Mobile GUI Agents [50.39555842254652]
We introduce the Android Multi-annotation EXpo (AMEX) to advance research on AI agents in mobile scenarios.
AMEX comprises over 104K high-resolution screenshots from 110 popular mobile applications, which are annotated at multiple levels.
AMEX includes three levels of annotations: GUI interactive element grounding, GUI screen and element functionality descriptions, and complex natural language instructions.
arXiv Detail & Related papers (2024-07-03T17:59:58Z)
- GUICourse: From General Vision Language Models to Versatile GUI Agents [75.5150601913659]
We contribute GUICourse, a suite of datasets to train visual-based GUI agents.
First, we introduce the GUIEnv dataset to strengthen the OCR and grounding capabilities of VLMs.
Then, we introduce the GUIAct and GUIChat datasets to enrich their knowledge of GUI components and interactions.
arXiv Detail & Related papers (2024-06-17T08:30:55Z)
- GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents [73.9254861755974]
This paper introduces a new dataset, called GUI-World, which features meticulously crafted Human-MLLM annotations.
We evaluate the capabilities of current state-of-the-art MLLMs, including ImageLLMs and VideoLLMs, in understanding various types of GUI content.
arXiv Detail & Related papers (2024-06-16T06:56:53Z)
- GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices [61.48043339441149]
GUI Odyssey consists of 7,735 episodes from 6 mobile devices, spanning 6 types of cross-app tasks, 201 apps, and 1.4K app combos.
We developed OdysseyAgent, a multimodal cross-app navigation agent by fine-tuning the Qwen-VL model with a history resampling module.
arXiv Detail & Related papers (2024-06-12T17:44:26Z)
- Tell Me What's Next: Textual Foresight for Generic UI Representations [65.10591722192609]
We propose Textual Foresight, a novel pretraining objective for learning UI screen representations.
Textual Foresight generates global text descriptions of future UI states given a current UI and local action taken.
We train with our newly constructed mobile app dataset, OpenApp, which results in the first public dataset for app UI representation learning.
arXiv Detail & Related papers (2024-06-12T02:43:19Z)
- SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents [17.43878828389188]
We propose a novel visual Graphical User Interface (GUI) agent, SeeClick, which only relies on screenshots for task automation.
To tackle the key challenge of GUI grounding, we propose to enhance SeeClick with GUI grounding pre-training and devise a method to automate the curation of GUI grounding data.
We have also created ScreenSpot, the first realistic GUI grounding benchmark that encompasses mobile, desktop, and web environments.
arXiv Detail & Related papers (2024-01-17T08:10:35Z)
- A Pairwise Dataset for GUI Conversion and Retrieval between Android Phones and Tablets [24.208087862974033]
The Papt dataset is a pairwise dataset for GUI conversion and retrieval between Android phones and tablets.
The dataset contains 10,035 phone-tablet GUI page pairs from 5,593 phone-tablet app pairs.
arXiv Detail & Related papers (2023-07-25T03:25:56Z)
- Automated Mapping of Adaptive App GUIs from Phones to TVs [31.207923538204795]
Existing techniques to map a mobile app GUI to a TV either adopt a responsive design or use mirror apps for improved video display.
We propose a semi-automated approach to generate corresponding adaptive TV GUIs, given the phone GUIs as the input.
Our tool is not only beneficial to developers but also to GUI designers, who can further customize the generated GUIs for their TV app development.
arXiv Detail & Related papers (2023-07-24T04:35:51Z)
- META-GUI: Towards Multi-modal Conversational Agents on Mobile GUI [28.484013258445067]
We propose a new TOD architecture: the GUI-based task-oriented dialogue system (GUI-TOD).
A GUI-TOD system can directly perform GUI operations on real apps and execute tasks without invoking backend APIs.
We release META-GUI, a dataset for training a Multi-modal conversational agent on mobile GUI.
arXiv Detail & Related papers (2022-05-23T04:05:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.