Related papers: Understanding the Challenges and Promises of Developing Generative AI Apps: An Empirical Study

URL: http://arxiv.org/abs/2506.16453v2
Date: Sat, 28 Jun 2025 04:39:35 GMT
Title: Understanding the Challenges and Promises of Developing Generative AI Apps: An Empirical Study
Authors: Buthayna AlMulla, Maram Assi, Safwat Hassan
Abstract summary: The release of ChatGPT in 2022 triggered a rapid surge in generative artificial intelligence mobile apps (i.e., Gen-AI apps). We conduct a user-centered analysis of 676,066 reviews from 173 Gen-AI apps on the Google Play Store.
Score: 0.1433758865948252
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The release of ChatGPT in 2022 triggered a rapid surge in generative artificial intelligence mobile apps (i.e., Gen-AI apps). Despite widespread adoption, little is known about how end users perceive and evaluate these Gen-AI functionalities in practice. In this work, we conduct a user-centered analysis of 676,066 reviews from 173 Gen-AI apps on the Google Play Store. We introduce a four-phase methodology, SARA (Selection, Acquisition, Refinement, and Analysis), that enables the systematic extraction of user insights using prompt-based LLM techniques. First, we demonstrate the reliability of LLMs in topic extraction, achieving 91% accuracy through five-shot prompting and non-informative review filtering. Then, we apply this method to the informative reviews, identify the top 10 user-discussed topics (e.g., AI Performance, Content Quality, and Content Policy & Censorship) and analyze the key challenges and emerging opportunities. Finally, we examine how these topics evolve over time, offering insight into shifting user expectations and engagement patterns with Gen-AI apps. Based on our findings and observations, we present actionable implications for developers and researchers.

Related papers

What Users Value and Critique: Large-Scale Analysis of User Feedback on AI-Powered Mobile Apps [2.352412885878654]
We present the first comprehensive, large-scale study of user feedback on AI-powered mobile apps. We leverage a curated dataset of 292 AI-driven apps across 14 categories with 894K AI-specific reviews from Google Play. Our pipeline surfaces both satisfaction with one feature and frustration with another within the same review.
arXiv Detail & Related papers (2025-06-12T14:56:52Z)
The AI Imperative: Scaling High-Quality Peer Review in Machine Learning [49.87236114682497]
We argue that AI-assisted peer review must become an urgent research and infrastructure priority. We propose specific roles for AI in enhancing factual verification, guiding reviewer performance, assisting authors in quality improvement, and supporting ACs in decision-making.
arXiv Detail & Related papers (2025-06-09T18:37:14Z)
A Case Study Investigating the Role of Generative AI in Quality Evaluations of Epics in Agile Software Development [7.239833814703049]
We investigate opportunities for large language models to evaluate agile epic quality in a global company. High levels of satisfaction indicate that agile epics are a new, viable application of AI evaluations.
arXiv Detail & Related papers (2025-05-12T15:31:16Z)
Analyzing User Perceptions of Large Language Models (LLMs) on Reddit: Sentiment and Topic Modeling of ChatGPT and DeepSeek Discussions [0.0]
This study analyzes Reddit discussions about ChatGPT and DeepSeek using sentiment and topic modeling. The report notes whether users have faith in the technology and what they see as its future.
arXiv Detail & Related papers (2025-02-22T17:00:42Z)
How Effectively Do LLMs Extract Feature-Sentiment Pairs from App Reviews? [2.218667838700643]
This study compares the performance of state-of-the-art LLMs, including GPT-4, ChatGPT, and different variants of Llama-2 chat. For predicting positive and neutral sentiments, GPT-4 achieves F1-scores of 76% and 45% in the zero-shot setting.
arXiv Detail & Related papers (2024-09-11T10:21:13Z)
Voices from the Frontier: A Comprehensive Analysis of the OpenAI Developer Forum [5.667013605202579]
OpenAI's advanced large language models (LLMs) have revolutionized natural language processing and enabled developers to create innovative applications. This paper presents a comprehensive analysis of the OpenAI Developer Forum. We focus on (1) popularity trends and user engagement patterns, and (2) a taxonomy of challenges and concerns faced by developers.
arXiv Detail & Related papers (2024-08-03T06:57:43Z)
UltraFeedback: Boosting Language Models with Scaled AI Feedback [99.4633351133207]
We present UltraFeedback, a large-scale, high-quality, and diversified AI feedback dataset. Our work validates the effectiveness of scaled AI feedback data in constructing strong open-source chat language models.
arXiv Detail & Related papers (2023-10-02T17:40:01Z)
A Complete Survey on Generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 All You Need? [112.12974778019304]
Generative AI (AIGC, a.k.a. AI-generated content) has made headlines everywhere because of its ability to analyze and create text, images, and beyond. In the era of AI transitioning from pure analysis to creation, it is worth noting that ChatGPT, with its most recent language model GPT-4, is just a tool out of numerous AIGC tasks. This work focuses on the technological development of various AIGC tasks based on their output type, including text, images, videos, 3D content, etc.
arXiv Detail & Related papers (2023-03-21T10:09:47Z)
The Role of AI in Drug Discovery: Challenges, Opportunities, and Strategies [97.5153823429076]
The benefits, challenges and drawbacks of AI in this field are reviewed. The use of data augmentation, explainable AI, and the integration of AI with traditional experimental methods are also discussed.
arXiv Detail & Related papers (2022-12-08T23:23:39Z)
Sentiment Analysis of Users' Reviews on COVID-19 Contact Tracing Apps with a Benchmark Dataset [6.592595861973966]
Contact tracing has been globally adopted in the fight to control the infection rate of COVID-19. Thanks to digital technologies, such as smartphones and wearable devices, contacts of COVID-19 patients can be easily traced and informed about their potential exposure to the virus. Several interesting mobile applications have been developed. However, there are ever-growing concerns over the working mechanism and performance of these applications. In this work, we propose a pipeline starting from manual annotation via a crowd-sourcing study and concluding on the development and training of AI models for automatic sentiment analysis of users' reviews.
arXiv Detail & Related papers (2021-03-01T18:43:10Z)
Emerging App Issue Identification via Online Joint Sentiment-Topic Tracing [66.57888248681303]
We propose a novel emerging issue detection approach named MERIT. Based on the AOBST model, we infer the topics negatively reflected in user reviews for one app version. Experiments on popular apps from Google Play and Apple's App Store demonstrate the effectiveness of MERIT.
arXiv Detail & Related papers (2020-08-23T06:34:05Z)
Automating App Review Response Generation [67.58267006314415]
We propose a novel approach RRGen that automatically generates review responses by learning knowledge relations between reviews and their responses. Experiments on 58 apps and 309,246 review-response pairs highlight that RRGen outperforms the baselines by at least 67.4% in terms of BLEU-4.
arXiv Detail & Related papers (2020-02-10T05:23:38Z)

This list is automatically generated from the titles and abstracts of the papers on this site.