Automatically Analyzing Performance Issues in Android Apps: How Far Are We?
- URL: http://arxiv.org/abs/2407.05090v1
- Date: Sat, 6 Jul 2024 14:43:40 GMT
- Title: Automatically Analyzing Performance Issues in Android Apps: How Far Are We?
- Authors: Dianshu Liao, Shidong Pan, Siyuan Yang, Yitong Wang, Yanjie Zhao, Zhenchang Xing, Xiaoyu Sun
- Abstract summary: Performance plays a critical role in ensuring the smooth operation of any mobile application.
Current tools have limited capabilities, covering only 17.50% of the performance issues.
Existing datasets encompass only 27.50% of the issues and are very limited in size.
- Score: 21.471170837839853
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Performance plays a critical role in ensuring the smooth operation of any mobile application, directly influencing user engagement and retention. Android applications are no exception. However, unlike functionality issues, performance issues are more challenging to discover as their root causes are sophisticated and typically emerge under specific payloads. To tackle this problem, researchers have dedicated substantial efforts to proposing automatic approaches for understanding, detecting, and resolving performance issues. Despite these endeavors, it still remains unknown what the status quo of Android performance analysis is, and whether existing approaches can indeed accurately reflect real performance issues. To fill this research gap, we conducted a systematic literature review followed by an explanatory study to explore relevant studies and real-world challenges. Our findings reveal that current tools have limited capabilities, covering only 17.50% of the performance issues. Additionally, existing datasets encompass only 27.50% of the issues and are very limited in size. We also show real-world issue patterns, underscoring the huge gap between the identified techniques and practical concerns. Furthermore, possible solutions are provided to guide future research towards achieving effective performance issue detection and resolution.
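To make concrete what automated, pattern-based detection of such performance issues can look like, here is a minimal illustrative sketch (not tooling from the paper): a toy lint rule that flags blocking calls inside Android main-thread lifecycle methods in Kotlin source. The method names, regexes, and sample snippet are assumptions made purely for illustration.

```python
import re

# Illustrative anti-pattern: blocking work inside main-thread lifecycle
# methods such as onCreate/onResume. Real analyzers model control and data
# flow; this toy version only matches textual patterns (assumed, not the
# paper's tooling).
LIFECYCLE_METHOD = re.compile(r"fun\s+(onCreate|onResume|onStart)\s*\(")
BLOCKING_CALL = re.compile(r"\b(Thread\.sleep|readBytes|execute\(\)|query\()")

def find_main_thread_blocking(source: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs of suspected blocking calls that
    appear between a lifecycle method header and the next function."""
    findings = []
    in_lifecycle = False
    for lineno, line in enumerate(source.splitlines(), start=1):
        if LIFECYCLE_METHOD.search(line):
            in_lifecycle = True
        elif line.lstrip().startswith("fun "):
            in_lifecycle = False
        if in_lifecycle and BLOCKING_CALL.search(line):
            findings.append((lineno, line.strip()))
    return findings

# Hypothetical Kotlin snippet with a blocking disk read in onCreate.
snippet = """
class MainActivity : AppCompatActivity() {
    override fun onCreate(savedInstanceState: Bundle?) {
        val bytes = File(cacheDir, "big.bin").readBytes()  // blocking I/O
    }
}
"""
for lineno, line in find_main_thread_blocking(snippet):
    print(f"line {lineno}: possible main-thread blocking call: {line}")
```

A real detector would work on an intermediate representation rather than raw text, which is part of why, per the paper's findings, current tools still cover only a small fraction of observed issue patterns.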
Related papers
- The Impact of LLM-Assistants on Software Developer Productivity: A Systematic Literature Review [4.503986781849658]
Large language model assistants (LLM-assistants) present new opportunities to transform software development.
Despite growing interest, there is no synthesis of how LLM-assistants affect software developer productivity.
Our analysis reveals that LLM-assistants offer both considerable benefits and critical risks.
arXiv Detail & Related papers (2025-07-03T20:25:49Z)
- Benchmarking Deep Search over Heterogeneous Enterprise Data [73.55304268238474]
We present a new benchmark for evaluating a form of retrieval-augmented generation (RAG).
This form of RAG requires source-aware, multi-hop reasoning over diverse, sparse, but related sources.
We build it using a synthetic data pipeline that simulates business activity across product planning, development, and support stages.
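As a rough sketch of what source-aware, multi-hop retrieval can look like (our assumption, not the benchmark's reference implementation), each hop rewrites the query using the evidence gathered so far and records which source each passage came from. `search`, `reformulate`, and the corpus entries below are hypothetical stand-ins.

```python
# Toy multi-hop retrieval loop over heterogeneous (source, text) pairs.
corpus = {
    "wiki/plan.md":  "The Q3 plan depends on the auth service redesign.",
    "jira/AUTH-12":  "Auth service redesign is blocked on the SSO vendor.",
    "slack/support": "Customers report SSO logins failing since Tuesday.",
}

def search(query: str) -> list[tuple[str, str]]:
    # Naive keyword overlap; a real system would use a dense retriever.
    terms = set(query.lower().split())
    return [(src, txt) for src, txt in corpus.items()
            if terms & set(txt.lower().split())]

def reformulate(query: str, evidence: list[tuple[str, str]]) -> str:
    # Stand-in for an LLM rewriter: fold evidence into the next query
    # so the following hop can follow the reference chain.
    return query + " " + " ".join(txt for _, txt in evidence)

query = "Why is the Q3 plan at risk?"
seen: dict[str, str] = {}
for hop in range(3):  # fixed hop budget
    new = {src: txt for src, txt in search(query) if src not in seen}
    if not new:
        break
    seen.update(new)
    query = reformulate(query, list(new.items()))

for src, txt in seen.items():
    print(f"[{src}] {txt}")  # source-aware evidence trail across hops
```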
arXiv Detail & Related papers (2025-06-29T08:34:59Z)
- The Behavior Gap: Evaluating Zero-shot LLM Agents in Complex Task-Oriented Dialogs [8.581146564012856]
This study proposes a comprehensive evaluation framework to quantify the behavior gap between AI agents and human experts.
Our findings reveal that this behavior gap is a critical factor negatively impacting the performance of LLM agents.
For the most complex task in our study, even the GPT-4o-based agent exhibits low alignment with human behavior.
arXiv Detail & Related papers (2025-06-13T22:36:41Z)
- ORMind: A Cognitive-Inspired End-to-End Reasoning Framework for Operations Research [53.736407871322314]
We introduce ORMind, a cognitive-inspired framework that enhances optimization through counterfactual reasoning.
Our approach emulates human cognition, implementing an end-to-end workflow that transforms requirements into mathematical models and executable code.
It is currently being tested internally in Lenovo's AI Assistant, with plans to enhance optimization capabilities for both business and consumer customers.
arXiv Detail & Related papers (2025-06-02T05:11:21Z)
- Interactive Agents to Overcome Ambiguity in Software Engineering [61.40183840499932]
AI agents are increasingly being deployed to automate tasks, often based on ambiguous and underspecified user instructions.
Making unwarranted assumptions and failing to ask clarifying questions can lead to suboptimal outcomes.
We study the ability of LLM agents to handle ambiguous instructions in interactive code generation settings by evaluating the performance of proprietary and open-weight models.
arXiv Detail & Related papers (2025-02-18T17:12:26Z)
- Can We Trust AI Benchmarks? An Interdisciplinary Review of Current Issues in AI Evaluation [2.2241228857601727]
This paper presents an interdisciplinary meta-review of about 100 studies that discuss shortcomings in quantitative benchmarking practices.
It brings together many fine-grained issues in the design and application of benchmarks with broader sociotechnical issues.
Our review also highlights a series of systemic flaws in current practices, such as misaligned incentives, construct validity issues, unknown unknowns, and problems with the gaming of benchmark results.
arXiv Detail & Related papers (2025-02-10T15:25:06Z)
- SoK: On Closing the Applicability Gap in Automated Vulnerability Detection [0.18846515534317265]
Automated Vulnerability Detection (AVD) aims to autonomously analyze source code to identify vulnerabilities.
This paper addresses two primary research questions: How is current AVD research distributed across its core components, and what key areas should future research target to bridge the gap in the practical applicability of AVD throughout software development?
We conduct a systematization of 79 AVD articles and 17 empirical studies, analyzing them across five core components: task formulation and granularity, input programming languages and representations, detection approaches and key solutions, evaluation metrics and datasets, and reported performance.
arXiv Detail & Related papers (2024-12-15T14:01:41Z)
- Does the Order of Fine-tuning Matter and Why? [11.975836356680855]
We study the effect of fine-tuning multiple intermediate tasks and their ordering on target task performance.
Experimental results show that task ordering affects target task performance, with gains of up to 6% and losses of up to 4%.
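To make the experimental setup concrete, here is a minimal sketch of how such an ordering study could be run. `fine_tune`, `evaluate`, the task names, and the base model identifier are all hypothetical stand-ins for a real training and evaluation pipeline.

```python
from itertools import permutations

def fine_tune(model: str, task: str) -> str:
    # Hypothetical stand-in: returns an identifier of the adapted model.
    return f"{model}->{task}"

def evaluate(model: str, target_task: str) -> float:
    # Hypothetical stand-in for a real benchmark score
    # (hash-based, so only deterministic within a single run).
    return float(hash((model, target_task)) % 100) / 100.0

intermediate_tasks = ["NLI", "QA", "summarization"]  # assumed task names
target_task = "code_review"                          # assumed target task

results = {}
for order in permutations(intermediate_tasks):
    model = "base-model"
    for task in order:  # fine-tune sequentially in this order
        model = fine_tune(model, task)
    results[order] = evaluate(model, target_task)

best = max(results, key=results.get)
worst = min(results, key=results.get)
print(f"best order {best}: {results[best]:.2f}")
print(f"worst order {worst}: {results[worst]:.2f}")
print(f"spread across orderings: {results[best] - results[worst]:.2%}")
```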
arXiv Detail & Related papers (2024-10-03T19:07:14Z)
- SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories [55.161075901665946]
SUPER aims to capture the realistic challenges faced by researchers working with Machine Learning (ML) and Natural Language Processing (NLP) research repositories.
Our benchmark comprises three distinct problem sets: 45 end-to-end problems with annotated expert solutions, 152 sub-problems derived from the expert set that focus on specific challenges, and 602 automatically generated problems for larger-scale development.
We show that state-of-the-art approaches struggle to solve these problems, with the best model (GPT-4o) solving only 16.3% of the end-to-end set and 46.1% of the scenarios.
arXiv Detail & Related papers (2024-09-11T17:37:48Z)
- On the Worst Prompt Performance of Large Language Models [93.13542053835542]
The performance of large language models (LLMs) is acutely sensitive to the phrasing of prompts.
We introduce RobustAlpacaEval, a new benchmark that consists of semantically equivalent case-level queries.
Experiments on RobustAlpacaEval with ChatGPT and six open-source LLMs from the Llama, Mistral, and Gemma families uncover substantial variability in model performance.
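The core measurement is simple to state: group semantically equivalent phrasings of the same underlying query, score each phrasing, and report the worst (alongside the average and best) per group. A minimal sketch of that bookkeeping follows; the example cases and the `score_model` judge are hypothetical stand-ins, not the benchmark's data or scorer.

```python
from statistics import mean

# Each case is one underlying query with semantically equivalent phrasings.
cases = {
    "sorting": ["Sort this list.", "Arrange the items in order.",
                "Order these values."],
    "summary": ["Summarize this text.", "Give a brief summary.",
                "Condense the passage."],
}

def score_model(prompt: str) -> float:
    # Hypothetical stand-in for a model call plus automatic scoring
    # (hash-based, so only deterministic within a single run).
    return float(hash(prompt) % 100) / 100.0

for case, phrasings in cases.items():
    scores = [score_model(p) for p in phrasings]
    # Worst-case performance is the headline number; the worst-best gap
    # exposes how much phrasing alone moves the model.
    print(f"{case}: worst={min(scores):.2f} "
          f"avg={mean(scores):.2f} best={max(scores):.2f}")
```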
arXiv Detail & Related papers (2024-06-08T13:40:38Z)
- An Empirical Study of Challenges in Machine Learning Asset Management [15.07444988262748]
Despite existing research, a significant knowledge gap remains in operational challenges like model versioning, data traceability, and collaboration.
Our study aims to address this gap by analyzing 15,065 posts from developer forums and platforms.
We uncover 133 topics related to asset management challenges, grouped into 16 macro-topics, with software dependency, model deployment, and model training being the most discussed.
arXiv Detail & Related papers (2024-02-25T05:05:52Z)
- Competition-Level Problems are Effective LLM Evaluators [121.15880285283116]
This paper aims to evaluate the reasoning capacities of large language models (LLMs) in solving recent programming problems in Codeforces.
We first provide a comprehensive evaluation of GPT-4's perceived zero-shot performance on this task, considering various aspects such as problems' release time, difficulties, and types of errors encountered.
Surprisingly, the perceived performance of GPT-4 has experienced a cliff-like decline on problems released after September 2021, consistently across all difficulties and problem types.
arXiv Detail & Related papers (2023-12-04T18:58:57Z)
- Towards leveraging LLMs for Conditional QA [1.9649272351760063]
This study delves into the capabilities and limitations of Large Language Models (LLMs) in the challenging domain of conditional question-answering.
Our findings reveal that fine-tuned LLMs can surpass the state-of-the-art (SOTA) performance in some cases, even without fully encoding all input context.
These models encounter challenges in extractive question answering, where they lag behind the SOTA by over 10 points, and in mitigating the risk of injecting false information.
arXiv Detail & Related papers (2023-12-02T14:02:52Z)
- rWISDM: Repaired WISDM, a Public Dataset for Human Activity Recognition [0.0]
Human Activity Recognition (HAR) has become a focal point of recent scientific research because of its applications in various domains such as healthcare, athletic competitions, smart cities, and smart homes.
This paper presents the methods by which other researchers may identify and correct similar problems in public datasets.
arXiv Detail & Related papers (2023-05-17T13:55:50Z)
- A Comprehensive Review of Trends, Applications and Challenges In Out-of-Distribution Detection [0.76146285961466]
A field of study has emerged that focuses on detecting out-of-distribution data and enabling more comprehensive generalization.
As many deep learning based models have achieved near-perfect results on benchmark datasets, the need to evaluate these models' reliability and trustworthiness is felt more strongly than ever.
This paper presents a survey that, in addition to reviewing more than 70 papers in this field, presents challenges and directions for future works and offers a unifying look into various types of data shifts and solutions for better generalization.
arXiv Detail & Related papers (2022-09-26T18:13:14Z)
- Towards Unbiased Visual Emotion Recognition via Causal Intervention [63.74095927462]
We propose a novel Interventional Emotion Recognition Network (IERN) to alleviate the negative effects brought by the dataset bias.
A series of designed tests validate the effectiveness of IERN, and experiments on three emotion benchmarks demonstrate that IERN outperforms other state-of-the-art approaches.
arXiv Detail & Related papers (2021-07-26T10:40:59Z)
- Competency Problems: On Finding and Removing Artifacts in Language Data [50.09608320112584]
We argue that for complex language understanding tasks, all simple feature correlations are spurious.
We theoretically analyze the difficulty of creating data for competency problems when human bias is taken into account.
arXiv Detail & Related papers (2021-04-17T21:34:10Z)
- Affect Analysis in-the-wild: Valence-Arousal, Expressions, Action Units and a Unified Framework [83.21732533130846]
The paper focuses on large in-the-wild databases, i.e., Aff-Wild and Aff-Wild2.
It presents the design of two classes of deep neural networks trained with these databases.
A novel multi-task and holistic framework is presented that can jointly learn, effectively generalize, and perform affect recognition.
arXiv Detail & Related papers (2021-03-29T17:36:20Z)