Related papers: Fairness Concerns in App Reviews: A Study on AI-based Mobile Apps


URL: http://arxiv.org/abs/2401.08097v3
Date: Thu, 20 Jun 2024 14:24:06 GMT
Title: Fairness Concerns in App Reviews: A Study on AI-based Mobile Apps
Authors: Ali Rezaei Nasab, Maedeh Dashti, Mojtaba Shahin, Mansooreh Zahedi, Hourieh Khalajzadeh, Chetan Arora, Peng Liang
Abstract summary: Unfair AI-based systems, particularly unfair AI-based mobile apps, can pose difficulties for a significant proportion of the global population. We first manually constructed a ground-truth dataset, including 1,132 fairness and 1,473 non-fairness reviews. We developed and evaluated a set of machine learning and deep learning models that distinguish fairness reviews from non-fairness reviews.
Score: 9.948068408730654
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Fairness is one of the socio-technical concerns that must be addressed in AI-based systems. Unfair AI-based systems, particularly unfair AI-based mobile apps, can pose difficulties for a significant proportion of the global population. This paper aims to analyze fairness concerns in AI-based app reviews. We first manually constructed a ground-truth dataset, including 1,132 fairness and 1,473 non-fairness reviews. Leveraging the ground-truth dataset, we developed and evaluated a set of machine learning and deep learning models that distinguish fairness reviews from non-fairness reviews. Our experiments show that our best-performing model can detect fairness reviews with a precision of 94%. We then applied the best-performing model on approximately 9.5M reviews collected from 108 AI-based apps and identified around 92K fairness reviews. Next, applying the K-means clustering technique to the 92K fairness reviews, followed by manual analysis, led to the identification of six distinct types of fairness concerns (e.g., 'receiving different quality of features and services in different platforms and devices' and 'lack of transparency and fairness in dealing with user-generated content'). Finally, the manual analysis of 2,248 app owners' responses to the fairness reviews identified six root causes (e.g., 'copyright issues') that app owners report to justify fairness concerns.
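The abstract says K-means was applied to the roughly 92K detected fairness reviews, but it does not give the text representation or the number of clusters. Below is a minimal, standard-library-only sketch of that clustering step, built on assumed choices: TF-IDF vectors with smoothed IDF, deterministic farthest-point seeding in place of random k-means++, and a toy k = 2. The reviews, the function names (`tfidf_matrix`, `kmeans`), and all parameters are hypothetical. This is not the authors' pipeline.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a review and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_matrix(docs):
    """Return L2-normalised dense TF-IDF rows (smoothed IDF) and the vocabulary."""
    token_lists = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in token_lists for t in toks})
    n = len(docs)
    df = Counter(t for toks in token_lists for t in set(toks))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    rows = []
    for toks in token_lists:
        tf = Counter(toks)
        row = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in row)) or 1.0
        rows.append([x / norm for x in row])
    return rows, vocab


def sq_dist(a, b):
    """Squared Euclidean distance between two dense vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's K-means with deterministic farthest-point initialisation."""
    # Seed with the first point, then repeatedly add the point farthest from
    # all chosen centroids (a deterministic stand-in for random k-means++).
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each review joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical toy reviews echoing two of the concern types named above.
reviews = [
    "paid premium but android users get fewer features than iphone",
    "iphone version has features the android version lacks",
    "my post was removed without explanation while others stay up",
    "content moderation removed my post with no explanation",
]
vectors, _ = tfidf_matrix(reviews)
labels, _ = kmeans(vectors, k=2)
print(labels)  # → [0, 0, 1, 1]
```

At the scale of millions of reviews, a library implementation such as scikit-learn's `KMeans` would be the practical choice. As the abstract notes, the clusters still needed manual analysis before they became named concern types.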

Related papers

What Users Value and Critique: Large-Scale Analysis of User Feedback on AI-Powered Mobile Apps [2.352412885878654]
We present the first comprehensive, large-scale study of user feedback on AI-powered mobile apps. We leverage a curated dataset of 292 AI-driven apps across 14 categories with 894K AI-specific reviews from Google Play. Our pipeline surfaces both satisfaction with one feature and frustration with another within the same review.
arXiv Detail & Related papers (2025-06-12T14:56:52Z)
Exploring Zero-Shot App Review Classification with ChatGPT: Challenges and Potential [1.1988955088595858]
This study explores the potential of zero-shot learning with ChatGPT for classifying app reviews into four categories: functional requirement, non-functional requirement, both, or neither. We evaluate ChatGPT's performance on a benchmark dataset of 1,880 manually annotated reviews from ten diverse apps spanning multiple domains.
arXiv Detail & Related papers (2025-05-07T19:39:04Z)
The Leaderboard Illusion [61.27964089648608]
Chatbot Arena has emerged as the go-to leaderboard for ranking the most capable AI systems. We identify systematic issues that have resulted in a distorted playing field.
arXiv Detail & Related papers (2025-04-29T15:48:49Z)
Exploring Requirements Elicitation from App Store User Reviews Using Large Language Models [0.0]
This research introduces an approach that leverages Large Language Models to analyze user reviews for automated requirements elicitation. We fine-tuned three well-established LLMs (BERT, DistilBERT, and Gemma) on a dataset of app reviews labeled for usefulness. Our evaluation revealed BERT's superior performance, with an accuracy of 92.40% and an F1-score of 92.39%, demonstrating its effectiveness in accurately classifying useful reviews.
arXiv Detail & Related papers (2024-09-23T18:57:31Z)
A Benchmark for Fairness-Aware Graph Learning [58.515305543487386]
We present an extensive benchmark on ten representative fairness-aware graph learning methods. Our in-depth analysis reveals key insights into the strengths and limitations of existing methods.
arXiv Detail & Related papers (2024-07-16T18:43:43Z)
A First Look at Fairness of Machine Learning Based Code Reviewer Recommendation [14.50773969815661]
This paper conducts the first study of the fairness of ML applications in the software engineering (SE) domain. Our empirical study demonstrates that current state-of-the-art ML-based code reviewer recommendation techniques exhibit unfair and discriminatory behavior. This paper also discusses why the studied ML-based code reviewer recommendation systems are unfair and provides solutions to mitigate the unfairness.
arXiv Detail & Related papers (2023-07-21T01:57:51Z)
Learning for Counterfactual Fairness from Observational Data [62.43249746968616]
Fairness-aware machine learning aims to eliminate learning models' biases against subgroups defined by protected (sensitive) attributes such as race, gender, and age. A prerequisite for existing methods to achieve counterfactual fairness is prior human knowledge of the causal model for the data. In this work, we address the problem of counterfactually fair prediction from observational data without given causal models by proposing a novel framework, CLAIRE.
arXiv Detail & Related papers (2023-07-17T04:08:29Z)
Proactive Prioritization of App Issues via Contrastive Learning [2.6763498831034043]
We propose a new framework, PPrior, that enables proactive prioritization of app issues by identifying prominent reviews. PPrior employs a pre-trained T5 model and works in three phases. Phase one adapts the pre-trained T5 model to the user reviews data in a self-supervised fashion. In phase two, we leverage contrastive training to learn a generic and task-independent representation of user reviews.
arXiv Detail & Related papers (2023-03-12T06:23:10Z)
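The PPrior summary mentions contrastive training but does not say which objective is used. As a generic illustration only, the sketch below computes InfoNCE, a common contrastive loss, for a single anchor embedding. The loss choice, the temperature of 0.1, the cosine similarity, and the toy vectors are all assumptions and are not taken from the PPrior paper.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: -log of the softmax score of the positive.

    Minimising it pulls the anchor toward its positive (e.g. another view of
    the same review) and pushes it away from the negatives (other reviews).
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[0]


# Hypothetical toy embeddings.
anchor = [1.0, 0.0]
print(info_nce(anchor, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]]))  # small loss
print(info_nce(anchor, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]]))  # larger loss
```

The loss is low when the anchor is most similar to its positive and high when a negative is closer. In batch training, this per-anchor term is usually averaged over all anchors in the batch.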
D-BIAS: A Causality-Based Human-in-the-Loop System for Tackling Algorithmic Bias [57.87117733071416]
We propose D-BIAS, a visual interactive tool that embodies a human-in-the-loop AI approach for auditing and mitigating social biases. A user can detect the presence of bias against a group by identifying unfair causal relationships in the causal network. For each interaction, such as weakening or deleting a biased causal edge, the system uses a novel method to simulate a new (debiased) dataset.
arXiv Detail & Related papers (2022-08-10T03:41:48Z)
Towards a Fair Comparison and Realistic Design and Evaluation Framework of Android Malware Detectors [63.75363908696257]
We analyze 10 influential research works on Android malware detection using a common evaluation framework. We identify five factors that, if not taken into account when creating datasets and designing detectors, significantly affect the trained ML models. We conclude that the studied ML-based detectors have been evaluated optimistically, which explains the good published results.
arXiv Detail & Related papers (2022-05-25T08:28:08Z)
Erasing Labor with Labor: Dark Patterns and Lockstep Behaviors on Google Play [13.658284581863839]
Google Play's policy forbids the use of incentivized installs, ratings, and reviews to manipulate the placement of apps. We examine install-incentivizing apps through a socio-technical lens and perform a mixed-methods analysis of their reviews and permissions. Our dataset contains 319K reviews collected daily over five months from 60 such apps that cumulatively account for over 160.5M installs. We find evidence of fraudulent reviews on install-incentivizing apps and then model them as an edge stream in a dynamic bipartite graph of apps and reviewers.
arXiv Detail & Related papers (2022-02-09T16:54:27Z)
DAPPER: Label-Free Performance Estimation after Personalization for Heterogeneous Mobile Sensing [95.18236298557721]
We present DAPPER (Domain AdaPtation Performance EstimatoR) that estimates the adaptation performance in a target domain with unlabeled target data. Our evaluation with four real-world sensing datasets compared against six baselines shows that DAPPER outperforms the state-of-the-art baseline by 39.8% in estimation accuracy.
arXiv Detail & Related papers (2021-11-22T08:49:33Z)
Emerging App Issue Identification via Online Joint Sentiment-Topic Tracing [66.57888248681303]
We propose a novel emerging issue detection approach named MERIT. Based on the AOBST model, an online joint sentiment-topic model, we infer the topics negatively reflected in user reviews for one app version. Experiments on popular apps from Google Play and Apple's App Store demonstrate the effectiveness of MERIT.
arXiv Detail & Related papers (2020-08-23T06:34:05Z)

This list is automatically generated from the titles and abstracts of the papers on this site.