Automatically Analyzing Performance Issues in Android Apps: How Far Are We?
- URL: http://arxiv.org/abs/2407.05090v2
- Date: Sat, 2 Nov 2024 12:46:53 GMT
- Title: Automatically Analyzing Performance Issues in Android Apps: How Far Are We?
- Authors: Dianshu Liao, Shidong Pan, Siyuan Yang, Yanjie Zhao, Zhenchang Xing, Xiaoyu Sun
- Abstract summary: We conduct a large-scale comparative study of Android performance issues in real-world applications and literature.
Our results show that a substantial divergence exists among the primary performance concerns of researchers, developers, and users.
It is crucial for our community to intensify efforts to bridge these gaps and achieve comprehensive detection and resolution of performance issues.
- Score: 15.614257662319863
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Performance issues in Android applications significantly undermine users' experience, engagement, and retention, and have long been a research topic in academia. Unlike functionality issues, performance issues are more difficult to diagnose and resolve due to their complex root causes, which often emerge only under specific conditions or payloads. Although many efforts have attempted to mitigate the impact of performance issues by developing methods to automatically identify and resolve them, it remains unclear whether this objective has been fulfilled and whether existing approaches indeed target the most critical performance issues encountered in real-world settings. To this end, we conducted a large-scale comparative study of Android performance issues in real-world applications and in the literature. Specifically, we started by investigating real-world performance issues, their underlying root causes (i.e., contributing factors), and common code patterns. We then took an additional step to empirically summarize existing approaches and datasets through a literature review, assessing how well academic research reflects the real-world challenges faced by developers and users. Our comparison shows that a substantial divergence exists among the primary performance concerns of researchers, developers, and users. Among all the identified factors, 57.14% have not been examined in academic research, a substantial 76.39% remain unaddressed by existing tools, and 66.67% lack corresponding datasets. This stark contrast underscores a substantial gap in our understanding and management of performance issues. Consequently, it is crucial for our community to intensify efforts to bridge these gaps and achieve comprehensive detection and resolution of performance issues.
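The abstract mentions common code patterns behind real-world performance issues without enumerating them in this summary. As a minimal illustrative sketch (not taken from the paper; `ProfileApi`, `fetchProfileBlocking`, and the handler names are hypothetical), the Kotlin snippet below shows one widely known Android performance anti-pattern, blocking the UI thread with slow I/O, alongside a coroutine-based fix; note that the issue surfaces only under a slow payload, echoing the paper's point that root causes often emerge only under specific conditions.

```kotlin
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.launch
import kotlinx.coroutines.withContext

// Hypothetical blocking API standing in for any slow network or disk call.
object ProfileApi {
    fun fetchProfileBlocking(userId: String): String {
        Thread.sleep(2_000) // simulates a slow payload: fast in testing, slow in the field
        return "profile:$userId"
    }
}

// Anti-pattern: invoking the blocking call directly on the UI thread.
// The ~16 ms frame deadline is missed, and past ~5 s Android raises an ANR.
fun onProfileClickedBad(userId: String, render: (String) -> Unit) {
    render(ProfileApi.fetchProfileBlocking(userId)) // blocks the caller's thread
}

// Fix: offload the blocking work to Dispatchers.IO, then resume on the
// scope's original dispatcher to render the result.
fun onProfileClickedGood(scope: CoroutineScope, userId: String, render: (String) -> Unit) {
    scope.launch {
        val profile = withContext(Dispatchers.IO) {
            ProfileApi.fetchProfileBlocking(userId)
        }
        render(profile) // back on the launching dispatcher
    }
}
```

The bad variant behaves acceptably on a fast emulator and degrades only under real-world conditions, which is one reason such issues are harder to diagnose than functionality bugs.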
Related papers
- Does the Order of Fine-tuning Matter and Why? [11.975836356680855]
We study the effect of fine-tuning multiple intermediate tasks and their ordering on target task performance.
Experimental results show that task ordering affects target task performance, yielding up to a 6% performance gain and up to a 4% performance loss.
arXiv Detail & Related papers (2024-10-03T19:07:14Z) - SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories [55.161075901665946]
SUPER aims to capture the realistic challenges faced by researchers working with Machine Learning (ML) and Natural Language Processing (NLP) research repositories.
Our benchmark comprises three distinct problem sets: 45 end-to-end problems with annotated expert solutions, 152 sub-problems derived from the expert set that focus on specific challenges, and 602 automatically generated problems for larger-scale development.
We show that state-of-the-art approaches struggle to solve these problems, with the best model (GPT-4o) solving only 16.3% of the end-to-end set and 46.1% of the scenarios.
arXiv Detail & Related papers (2024-09-11T17:37:48Z) - On the Worst Prompt Performance of Large Language Models [93.13542053835542]
The performance of large language models (LLMs) is acutely sensitive to the phrasing of prompts.
We introduce RobustAlpacaEval, a new benchmark that consists of semantically equivalent case-level queries.
Experiments on RobustAlpacaEval with ChatGPT and six open-source LLMs from the Llama, Mistral, and Gemma families uncover substantial variability in model performance.
arXiv Detail & Related papers (2024-06-08T13:40:38Z) - An Empirical Study of Challenges in Machine Learning Asset Management [15.07444988262748]
Despite existing research, a significant knowledge gap remains in operational challenges like model versioning, data traceability, and collaboration.
Our study aims to address this gap by analyzing 15,065 posts from developer forums and platforms.
We uncover 133 topics related to asset management challenges, grouped into 16 macro-topics, with software dependency, model deployment, and model training being the most discussed.
arXiv Detail & Related papers (2024-02-25T05:05:52Z) - Competition-Level Problems are Effective LLM Evaluators [121.15880285283116]
This paper aims to evaluate the reasoning capacities of large language models (LLMs) in solving recent programming problems in Codeforces.
We first provide a comprehensive evaluation of GPT-4's perceived zero-shot performance on this task, considering various aspects such as problems' release time, difficulty, and the types of errors encountered.
Surprisingly, the perceived performance of GPT-4 has experienced a cliff-like decline on problems released after September 2021, consistently across all difficulties and types of problems.
arXiv Detail & Related papers (2023-12-04T18:58:57Z) - Towards leveraging LLMs for Conditional QA [1.9649272351760063]
This study delves into the capabilities and limitations of Large Language Models (LLMs) in the challenging domain of conditional question-answering.
Our findings reveal that fine-tuned LLMs can surpass the state-of-the-art (SOTA) performance in some cases, even without fully encoding all input context.
These models encounter challenges in extractive question answering, where they lag behind the SOTA by over 10 points, and in mitigating the risk of injecting false information.
arXiv Detail & Related papers (2023-12-02T14:02:52Z) - rWISDM: Repaired WISDM, a Public Dataset for Human Activity Recognition [0.0]
Human Activity Recognition (HAR) has become a focus of recent scientific research because of its applications in domains such as healthcare, athletic competitions, smart cities, and smart homes.
This paper presents the methods by which other researchers may identify and correct similar problems in public datasets.
arXiv Detail & Related papers (2023-05-17T13:55:50Z) - A Comprehensive Review of Trends, Applications and Challenges in Out-of-Distribution Detection [0.76146285961466]
A field of study has emerged that focuses on detecting out-of-distribution data subsets and enabling more comprehensive generalization.
As many deep learning based models have achieved near-perfect results on benchmark datasets, the need to evaluate these models' reliability and trustworthiness is felt more strongly than ever.
This paper presents a survey that, in addition to reviewing more than 70 papers in this field, presents challenges and directions for future works and offers a unifying look into various types of data shifts and solutions for better generalization.
arXiv Detail & Related papers (2022-09-26T18:13:14Z) - Towards Unbiased Visual Emotion Recognition via Causal Intervention [63.74095927462]
We propose a novel Interventional Emotion Recognition Network (IERN) to alleviate the negative effects brought by dataset bias.
A series of designed tests validate the effectiveness of IERN, and experiments on three emotion benchmarks demonstrate that IERN outperforms other state-of-the-art approaches.
arXiv Detail & Related papers (2021-07-26T10:40:59Z) - Competency Problems: On Finding and Removing Artifacts in Language Data [50.09608320112584]
We argue that for complex language understanding tasks, all simple feature correlations are spurious.
We theoretically analyze the difficulty of creating data for competency problems when human bias is taken into account.
arXiv Detail & Related papers (2021-04-17T21:34:10Z) - Affect Analysis in-the-wild: Valence-Arousal, Expressions, Action Units and a Unified Framework [83.21732533130846]
The paper focuses on large in-the-wild databases, i.e., Aff-Wild and Aff-Wild2.
It presents the design of two classes of deep neural networks trained with these databases.
A novel multi-task, holistic framework is presented that is able to jointly learn, effectively generalize, and perform affect recognition.
arXiv Detail & Related papers (2021-03-29T17:36:20Z)