A Pairwise Dataset for GUI Conversion and Retrieval between Android
Phones and Tablets
- URL: http://arxiv.org/abs/2307.13225v3
- Date: Sun, 5 Nov 2023 04:37:28 GMT
- Title: A Pairwise Dataset for GUI Conversion and Retrieval between Android
Phones and Tablets
- Authors: Han Hu, Haolan Zhan, Yujin Huang, Di Liu
- Abstract summary: The Papt dataset is a pairwise dataset for GUI conversion and retrieval between Android phones and tablets.
It contains 10,035 phone-tablet GUI page pairs from 5,593 phone-tablet app pairs.
- Score: 24.208087862974033
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the popularity of smartphones and tablets, users have become accustomed
to using different devices for different tasks, such as using their phones to
play games and tablets to watch movies. To capture the market, an app is often
released on both smartphones and tablets. However, although an app typically has
similar graphical user interfaces (GUIs) and functionality on phone and tablet,
developers usually start from scratch when building a tablet-compatible version,
which drives up development costs and wastes existing design resources.
Researchers are attempting to employ deep learning in automated GUI development
to enhance developers' productivity.
Deep learning models rely heavily on high-quality datasets. There are currently
several publicly accessible GUI page datasets for phones, but none for pairwise
GUIs between phones and tablets. This poses a significant barrier to the
employment of deep learning in automated GUI development. In this paper, we
collect and make public the Papt dataset, which is a pairwise dataset for GUI
conversion and retrieval between Android phones and tablets. The dataset
contains 10,035 phone-tablet GUI page pairs from 5,593 phone-tablet app pairs.
We describe our approach to collecting pairwise data and present a statistical
analysis of the dataset. We also discuss its advantages over other current
datasets. Through preliminary experiments on this dataset, we analyse the
current challenges of applying deep learning to automated GUI development and
find that our dataset can support the application of several deep learning
models to automatic GUI development tasks.
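The abstract describes the dataset as phone-tablet GUI page pairs grouped by app pair. The paper excerpt does not prescribe a file layout or loading API; purely as an illustration, here is a minimal sketch of how such pairwise data might be organised and loaded, assuming a hypothetical `<app_id>/{phone,tablet}/<page>.png` directory convention (the layout, class, and function names are all assumptions, not part of the Papt release):

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass
class GUIPair:
    """One phone-tablet GUI page pair (hypothetical structure, not from the paper)."""
    app_id: str
    phone_screenshot: Path
    tablet_screenshot: Path

def load_pairs(root: Path) -> list[GUIPair]:
    """Collect page pairs assuming an assumed <root>/<app_id>/{phone,tablet}/<page>.png layout."""
    pairs = []
    for app_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        phone_dir, tablet_dir = app_dir / "phone", app_dir / "tablet"
        if not (phone_dir.is_dir() and tablet_dir.is_dir()):
            continue
        # Pair phone and tablet screenshots that share the same page filename.
        for phone_png in sorted(phone_dir.glob("*.png")):
            tablet_png = tablet_dir / phone_png.name
            if tablet_png.exists():
                pairs.append(GUIPair(app_dir.name, phone_png, tablet_png))
    return pairs
```

Matching by shared filename is only one plausible convention; a real loader would follow whatever pairing metadata the published dataset actually ships with.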
Related papers
- GUICourse: From General Vision Language Models to Versatile GUI Agents [75.5150601913659]
We contribute GUICourse, a suite of datasets to train visual-based GUI agents.
First, we introduce the GUIEnv dataset to strengthen the OCR and grounding capabilities of VLMs.
Then, we introduce the GUIAct and GUIChat datasets to enrich their knowledge of GUI components and interactions.
arXiv Detail & Related papers (2024-06-17T08:30:55Z)
- GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents [73.9254861755974]
This paper introduces a new dataset, called GUI-World, which features meticulously crafted Human-MLLM annotations.
We evaluate the capabilities of current state-of-the-art MLLMs, including ImageLLMs and VideoLLMs, in understanding various types of GUI content.
arXiv Detail & Related papers (2024-06-16T06:56:53Z)
- GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices [61.48043339441149]
GUI Odyssey consists of 7,735 episodes from 6 mobile devices, spanning 6 types of cross-app tasks, 201 apps, and 1.4K app combos.
We developed OdysseyAgent, a multimodal cross-app navigation agent by fine-tuning the Qwen-VL model with a history resampling module.
arXiv Detail & Related papers (2024-06-12T17:44:26Z)
- SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents [17.43878828389188]
We propose SeeClick, a novel visual Graphical User Interface (GUI) agent that relies only on screenshots for task automation.
To tackle the challenge of GUI grounding, we enhance SeeClick with grounding pre-training and devise a method to automate the curation of GUI grounding data.
We have also created ScreenSpot, the first realistic GUI grounding benchmark that encompasses mobile, desktop, and web environments.
arXiv Detail & Related papers (2024-01-17T08:10:35Z)
- Pairwise GUI Dataset Construction Between Android Phones and Tablets [24.208087862974033]
The Papt dataset is a pairwise GUI dataset tailored for Android phones and tablets.
We propose novel pairwise GUI collection approaches for constructing this dataset.
arXiv Detail & Related papers (2023-10-07T09:30:42Z)
- BridgeData V2: A Dataset for Robot Learning at Scale [73.86688388408021]
BridgeData V2 is a large and diverse dataset of robotic manipulation behaviors.
It contains 60,096 trajectories collected across 24 environments on a publicly available low-cost robot.
arXiv Detail & Related papers (2023-08-24T17:41:20Z)
- Automated Mapping of Adaptive App GUIs from Phones to TVs [31.207923538204795]
Existing techniques to map a mobile app GUI to a TV either adopt a responsive design or use mirror apps for improved video display.
We propose a semi-automated approach to generate corresponding adaptive TV GUIs, given the phone GUIs as the input.
Our tool is not only beneficial to developers but also to GUI designers, who can further customize the generated GUIs for their TV app development.
arXiv Detail & Related papers (2023-07-24T04:35:51Z)
- NiCro: Purely Vision-based, Non-intrusive Cross-Device and Cross-Platform GUI Testing [19.462053492572142]
We propose a non-intrusive cross-device and cross-platform system NiCro.
NiCro uses the state-of-the-art GUI widget detector to detect widgets from GUI images and then analyses a set of comprehensive information to match the widgets across diverse devices.
At the system level, NiCro can interact with a virtual device farm and a robotic arm system to perform cross-device, cross-platform testing non-intrusively.
arXiv Detail & Related papers (2023-05-24T01:19:05Z)
- META-GUI: Towards Multi-modal Conversational Agents on Mobile GUI [28.484013258445067]
We propose a new TOD architecture: the GUI-based task-oriented dialogue system (GUI-TOD).
A GUI-TOD system can directly perform GUI operations on real APPs and execute tasks without invoking backend APIs.
We release META-GUI, a dataset for training a Multi-modal conversational agent on mobile GUI.
arXiv Detail & Related papers (2022-05-23T04:05:37Z)
- REGRAD: A Large-Scale Relational Grasp Dataset for Safe and Object-Specific Robotic Grasping in Clutter [52.117388513480435]
We present a new dataset named REGRAD to support the modeling of relationships among objects and grasps.
Our dataset is collected in both forms of 2D images and 3D point clouds.
Users are free to import their own object models to generate as much data as they want.
arXiv Detail & Related papers (2021-04-29T05:31:21Z)
- TapNet: The Design, Training, Implementation, and Applications of a Multi-Task Learning CNN for Off-Screen Mobile Input [75.05709030478073]
We present the design, training, implementation and applications of TapNet, a multi-task network that detects tapping on the smartphone.
TapNet can jointly learn from data across devices and simultaneously recognize multiple tap properties, including tap direction and tap location.
arXiv Detail & Related papers (2021-02-18T00:45:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.