Testing the Limits: Unusual Text Inputs Generation for Mobile App Crash
Detection with Large Language Model
- URL: http://arxiv.org/abs/2310.15657v1
- Date: Tue, 24 Oct 2023 09:10:51 GMT
- Title: Testing the Limits: Unusual Text Inputs Generation for Mobile App Crash
Detection with Large Language Model
- Authors: Zhe Liu, Chunyang Chen, Junjie Wang, Mengzhuo Chen, Boyu Wu, Xing Che,
Dandan Wang, Qing Wang
- Abstract summary: This paper proposes InputBlaster to automatically generate unusual text inputs for mobile app crash detection.
It formulates the unusual inputs generation problem as a task of producing a set of test generators, each of which can yield a batch of unusual text inputs.
It is evaluated on 36 text input widgets with crash bugs involving 31 popular Android apps, and achieves a 78% bug detection rate, 136% higher than the best baseline.
- Score: 23.460051600514806
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Mobile applications have become a ubiquitous part of our daily life,
providing users with access to various services and utilities. Text input, as
an important interaction channel between users and applications, plays an
important role in core functionality such as search queries, authentication,
messaging, etc. However, certain special text (e.g., -18 for Font Size) can
cause the app to crash, so generating diversified unusual inputs to fully
test the app is in high demand. Nevertheless, this is also challenging due
to the combinatorial explosion dilemma, high context sensitivity, and complex
constraint relations. This paper proposes InputBlaster, which leverages an LLM
to automatically generate unusual text inputs for mobile app crash detection.
It formulates the unusual inputs generation problem as a task of producing a
set of test generators, each of which can yield a batch of unusual text inputs
under the same mutation rule. Specifically, InputBlaster leverages the LLM to
produce the test generators together with the mutation rules serving as the
reasoning chain, and uses an in-context learning schema that supplies the LLM
with examples to boost performance. InputBlaster is evaluated on 36 text
input widgets with crash bugs involving 31 popular Android apps, and results
show that it achieves a 78% bug detection rate, 136% higher than the best
baseline. In addition, we integrate it with an automated GUI testing tool and
detect 37 unseen crashes in real-world apps from Google Play.
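To make the test-generator formulation concrete, below is a minimal Python sketch of the idea as the abstract describes it: each generator pairs a natural-language mutation rule (the reasoning chain the LLM is asked to produce) with a function that yields a batch of unusual inputs under that rule, and a few-shot prompt supplies examples in the in-context learning style. All class, function, and prompt names here (TestGenerator, build_few_shot_prompt, the example rules) are hypothetical illustrations, not InputBlaster's actual implementation.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class TestGenerator:
    # Natural-language mutation rule (the LLM's "reasoning chain")
    mutation_rule: str
    # Function applying that rule to a valid example input
    mutate: Callable[[str], str]

    def batch(self, valid_input: str, n: int = 5) -> List[str]:
        """Yield a batch of unusual inputs under the same mutation rule."""
        return [self.mutate(valid_input) for _ in range(n)]


# Hypothetical rules in the spirit of the paper's "-18 for Font Size" example.
negate_number = TestGenerator(
    mutation_rule="Replace a numeric value with its negation",
    mutate=lambda s: str(-int(s)) if s.lstrip("-").isdigit() else "-" + s,
)

overflow_length = TestGenerator(
    mutation_rule="Inflate the input far beyond typical length limits",
    mutate=lambda s: s * random.randint(100, 1000),
)


def build_few_shot_prompt(widget_context: str,
                          examples: List[Tuple[str, str]]) -> str:
    """In-context learning sketch: show the LLM (rule, input) example pairs
    so it proposes new test generators for the target widget."""
    shots = "\n".join(f"Rule: {rule}\nUnusual input: {inp}"
                      for rule, inp in examples)
    return (f"Widget context: {widget_context}\n"
            f"Examples of crash-triggering inputs:\n{shots}\n"
            "Propose a new mutation rule and a function implementing it.")


if __name__ == "__main__":
    print(negate_number.batch("18", n=3))  # ['-18', '-18', '-18']
    print(build_few_shot_prompt(
        "EditText hint='Font Size', numeric",
        [("Negate the number", "-18"), ("Use a huge value", "999999999")],
    ))
```

Batching inputs per rule, rather than asking the LLM for one input at a time, amortizes each generated reasoning chain over many concrete test cases; this sketch reflects that design choice under the assumptions stated above.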
Related papers
- Detecting Machine-Generated Long-Form Content with Latent-Space Variables [54.07946647012579]
Existing zero-shot detectors primarily focus on token-level distributions, which are vulnerable to real-world domain shifts.
We propose a more robust method that incorporates abstract elements, such as event transitions, as key deciding factors to detect machine versus human texts.
arXiv Detail & Related papers (2024-10-04T18:42:09Z)
- Large Language Models for Mobile GUI Text Input Generation: An Empirical Study [24.256184336154544]
Large Language Models (LLMs) have shown excellent text-generation capabilities.
This paper extensively investigates the effectiveness of nine state-of-the-art LLMs in Android text-input generation for UI pages.
arXiv Detail & Related papers (2024-04-13T09:56:50Z)
- Make LLM a Testing Expert: Bringing Human-like Interaction to Mobile GUI Testing via Functionality-aware Decisions [23.460051600514806]
GPTDroid is a Q&A-based GUI testing framework for mobile apps.
We introduce a functionality-aware memory prompting mechanism.
It outperforms the best baseline by 32% in activity coverage, and detects 31% more bugs at a faster rate.
arXiv Detail & Related papers (2023-10-24T12:30:26Z)
- Towards Automated Accessibility Report Generation for Mobile Apps [14.908672785900832]
We propose a system to generate whole app accessibility reports.
It combines varied data collection methods (e.g., app crawling, manual recording) with an existing accessibility scanner.
arXiv Detail & Related papers (2023-09-29T19:05:11Z)
- FacTool: Factuality Detection in Generative AI -- A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios [87.12753459582116]
A wider range of tasks now faces an increasing risk of containing factual errors when handled by generative models.
We propose FacTool, a task and domain agnostic framework for detecting factual errors of texts generated by large language models.
arXiv Detail & Related papers (2023-07-25T14:20:51Z)
- M4: Multi-generator, Multi-domain, and Multi-lingual Black-Box Machine-Generated Text Detection [69.29017069438228]
Large language models (LLMs) have demonstrated remarkable capability to generate fluent responses to a wide variety of user queries.
This has also raised concerns about the potential misuse of such texts in journalism, education, and academia.
In this study, we strive to create automated systems that can detect machine-generated texts and pinpoint potential misuse.
arXiv Detail & Related papers (2023-05-24T08:55:11Z)
- DPIC: Decoupling Prompt and Intrinsic Characteristics for LLM Generated Text Detection [56.513637720967566]
Large language models (LLMs) can generate texts that pose risks of misuse, such as plagiarism, planting fake reviews on e-commerce platforms, or creating inflammatory false tweets.
Existing high-quality detection methods usually require access to the interior of the model to extract the intrinsic characteristics.
We propose to extract deep intrinsic characteristics of the black-box model generated texts.
arXiv Detail & Related papers (2023-05-21T17:26:16Z)
- Chatting with GPT-3 for Zero-Shot Human-Like Mobile Automated GUI Testing [23.460051600514806]
We propose GPTDroid, which asks a Large Language Model to chat with mobile apps by passing GUI page information to the LLM to elicit testing scripts.
Within it, we extract the static context of the GUI page and the dynamic context of the iterative testing process.
We evaluate GPTDroid on 86 apps from Google Play; its activity coverage is 71%, 32% higher than the best baseline, and it detects 36% more bugs at a faster rate than the best baseline.
arXiv Detail & Related papers (2023-05-16T13:46:52Z)
- On the Possibilities of AI-Generated Text Detection [76.55825911221434]
We argue that as machine-generated text approaches human-like quality, the sample size needed for reliable detection increases.
We test various state-of-the-art text generators, including GPT-2, GPT-3.5-Turbo, Llama, Llama-2-13B-Chat-HF, and Llama-2-70B-Chat-HF, against detectors including RoBERTa-Large/Base-Detector and GPTZero.
arXiv Detail & Related papers (2023-04-10T17:47:39Z)
- Composable Text Controls in Latent Space with ODEs [97.12426987887021]
This paper proposes a new efficient approach for composable text operations in the compact latent space of text.
By connecting pretrained LMs to the latent space through efficient adaption, we then decode the sampled vectors into desired text sequences.
Experiments show that composing those operators within our approach manages to generate or edit high-quality text.
arXiv Detail & Related papers (2022-08-01T06:51:45Z)