Toward the Automated Localization of Buggy Mobile App UIs from Bug Descriptions
- URL: http://arxiv.org/abs/2408.04075v1
- Date: Wed, 7 Aug 2024 20:26:20 GMT
- Title: Toward the Automated Localization of Buggy Mobile App UIs from Bug Descriptions
- Authors: Antu Saha, Yang Song, Junayed Mahmud, Ying Zhou, Kevin Moran, Oscar Chaparro
- Abstract summary: The identification of buggy UI screens and UI components is important to localizing the buggy behavior and fixing it.
This paper is the first to investigate the feasibility of automating the task of Buggy UI localization.
We find that incorporating localized buggy UIs leads to improvements of 9%-12% in Hits@10.
- Score: 19.304569170230316
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Bug report management is a costly software maintenance process comprised of several challenging tasks. Given the UI-driven nature of mobile apps, bugs typically manifest through the UI, hence the identification of buggy UI screens and UI components (Buggy UI Localization) is important to localizing the buggy behavior and eventually fixing it. However, this task is challenging as developers must reason about bug descriptions (which are often low-quality), and the visual or code-based representations of UI screens. This paper is the first to investigate the feasibility of automating the task of Buggy UI Localization through a comprehensive study that evaluates the capabilities of one textual and two multi-modal deep learning (DL) techniques and one textual unsupervised technique. We evaluate such techniques at two levels of granularity, Buggy UI Screen and UI Component localization. Our results illustrate the individual strengths of models that make use of different representations, wherein models that incorporate visual information perform better for UI screen localization, and models that operate on textual screen information perform better for UI component localization -- highlighting the need for a localization approach that blends the benefits of both types of techniques. Furthermore, we study whether Buggy UI Localization can improve traditional buggy code localization, and find that incorporating localized buggy UIs leads to improvements of 9%-12% in Hits@10.

Related papers
- DiMo-GUI: Advancing Test-time Scaling in GUI Grounding via Modality-Aware Visual Reasoning [52.37530640460363]
 We introduce DiMo-GUI, a training-free framework for GUI grounding.
Instead of treating the GUI as a monolithic image, our method splits the input into textual elements and iconic elements.
When predictions are ambiguous or incorrect, DiMo-GUI dynamically focuses attention by generating candidate focal regions.
 arXiv  Detail & Related papers  (2025-06-12T03:13:21Z)
- GUI-explorer: Autonomous Exploration and Mining of Transition-aware Knowledge for GUI Agent [66.34801160469067]
 MLLMs suffer from two key issues: misinterpreting UI components and outdated knowledge.
We propose GUI-explorer, a training-free GUI agent that incorporates two fundamental mechanisms.
With a task success rate of 53.7% on SPA-Bench and 47.4% on AndroidWorld, GUI-explorer shows significant improvements over SOTA agents.
 arXiv  Detail & Related papers  (2025-05-22T16:01:06Z)
- Visual Test-time Scaling for GUI Agent Grounding [61.609126885427386]
 We introduce RegionFocus, a visual test-time scaling approach for Vision Language Model Agents.
Our approach dynamically zooms in on relevant regions, reducing background clutter and improving grounding accuracy.
We observe significant performance gains of 28+% on Screenspot-pro and 24+% on WebVoyager benchmarks.
 arXiv  Detail & Related papers  (2025-05-01T17:45:59Z)
- UI-TARS: Pioneering Automated GUI Interaction with Native Agents [58.18100825673032]
 This paper introduces UI-TARS, a native GUI agent model that solely perceives the screenshots as input and performs human-like interactions.
In the OSWorld benchmark, UI-TARS achieves scores of 24.6 with 50 steps and 22.7 with 15 steps, outperforming Claude (22.0 and 14.9, respectively).
 arXiv  Detail & Related papers  (2025-01-21T17:48:10Z)
- Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction [69.57190742976091]
 We introduce Aguvis, a unified vision-based framework for autonomous GUI agents.
Our approach leverages image-based observations and grounds natural-language instructions to visual elements.
To address the limitations of previous work, we integrate explicit planning and reasoning within the model.
 arXiv  Detail & Related papers  (2024-12-05T18:58:26Z)
- ShowUI: One Vision-Language-Action Model for GUI Visual Agent [80.50062396585004]
 Building Graphical User Interface (GUI) assistants holds significant promise for enhancing human workflow productivity.
We develop a vision-language-action model for the digital world, named ShowUI.
ShowUI, a lightweight 2B model using 256K data, achieves a strong 75.1% accuracy in zero-shot screenshot grounding.
 arXiv  Detail & Related papers  (2024-11-26T14:29:47Z)
- A Rule-Based Approach for UI Migration from Android to iOS [11.229343760409044]
 We propose a novel approach called GUIMIGRATOR, which enables the cross-platform migration of existing Android app UIs to iOS.
GUIMIGRATOR extracts and parses Android UI layouts, views, and resources to construct a UI skeleton tree.
GUIMIGRATOR generates the final UI code files using target code templates, which are then compiled and validated in the iOS development platform.
 arXiv  Detail & Related papers  (2024-09-25T06:19:54Z)
- Tell Me What's Next: Textual Foresight for Generic UI Representations [65.10591722192609]
 We propose Textual Foresight, a novel pretraining objective for learning UI screen representations.
Textual Foresight generates global text descriptions of future UI states given a current UI and local action taken.
We train with our newly constructed mobile app dataset, OpenApp, which results in the first public dataset for app UI representation learning.
 arXiv  Detail & Related papers  (2024-06-12T02:43:19Z)
- Multi-Granularity Language-Guided Multi-Object Tracking [95.91263758294154]
 We propose a new multi-object tracking framework, named LG-MOT, that explicitly leverages language information at different levels of granularity.
At inference, our LG-MOT uses the standard visual features without relying on annotated language descriptions.
Our LG-MOT achieves an absolute gain of 2.2% in terms of target object association (IDF1 score) compared to the baseline using only visual features.
 arXiv  Detail & Related papers  (2024-06-07T11:18:40Z)
- UI Layers Group Detector: Grouping UI Layers via Text Fusion and Box Attention [7.614630088064978]
 We propose a vision-based method that automatically detects images (i.e., basic shapes and visual elements) and text layers that present the same semantic meanings.
We construct a large-scale UI dataset for training and testing, and present a data augmentation approach to boost the detection performance.
 arXiv  Detail & Related papers  (2022-12-07T03:50:20Z)
- BigIssue: A Realistic Bug Localization Benchmark [89.8240118116093]
 BigIssue is a benchmark for realistic bug localization.
We provide a general benchmark with a diversity of real and synthetic Java bugs.
We hope to advance the state of the art in bug localization, in turn improving APR performance and increasing its applicability to the modern development cycle.
 arXiv  Detail & Related papers  (2022-07-21T20:17:53Z)
- Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization [81.26077816854449]
 We first explore the use of constituency parse trees for encoding structured input.
Second, we augment the structured input with commonsense information and study the impact of this external knowledge on the generation of visual stories.
Third, we incorporate visual structure via bounding boxes and dense captioning to provide feedback about the characters/objects in generated images.
 arXiv  Detail & Related papers  (2021-10-21T00:16:02Z)
- UIBert: Learning Generic Multimodal Representations for UI Understanding [12.931540149350633]
 We introduce a transformer-based joint image-text model trained through novel pre-training tasks on large-scale unlabeled UI data.
Our key intuition is that the heterogeneous features in a UI are self-aligned, i.e., the image and text features of UI components are predictive of each other.
We propose five pretraining tasks utilizing this self-alignment among different features of a UI component and across various components in the same UI.
We evaluate our method on nine real-world downstream UI tasks where UIBert outperforms strong multimodal baselines by up to 9.26% accuracy.
 arXiv  Detail & Related papers  (2021-07-29T03:51:36Z)
- VINS: Visual Search for Mobile User Interface Design [66.28088601689069]
 This paper introduces VINS, a visual search framework that takes a UI image as input and retrieves visually similar design examples.
The framework achieves a mean Average Precision of 76.39% for UI detection and high performance in querying similar UI designs.
 arXiv  Detail & Related papers  (2021-02-10T01:46:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     