Related papers: SENSOR: An ML-Enhanced Online Annotation Tool to Uncover Privacy Concerns from User Reviews in Social-Media Applications

URL: http://arxiv.org/abs/2507.10640v1
Date: Mon, 14 Jul 2025 14:58:04 GMT
Title: SENSOR: An ML-Enhanced Online Annotation Tool to Uncover Privacy Concerns from User Reviews in Social-Media Applications
Authors: Labiba Farah, Mohammad Ridwan Kabir, Shohel Ahmed, MD Mohaymen Ul Anam, Md. Sakibul Islam,
Abstract summary: This paper introduces SENtinel SORt (SENSOR), an automated online annotation tool designed to help developers annotate and classify user reviews. Approximately 16000 user reviews from seven popular social media apps on Google Play Store were analyzed. GRACE demonstrated the best performance (macro F1-score: 0.9434, macro ROC-AUC: 0.9934, and accuracy: 95.10%) despite class imbalance.
Score: 0.0
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: The widespread use of social media applications has raised significant privacy concerns, often highlighted in user reviews. These reviews also provide developers with valuable insights into improving apps by addressing issues and introducing better features. However, the sheer volume and nuanced nature of reviews make manual identification and prioritization of privacy-related concerns challenging for developers. Previous studies have developed software utilities to automatically classify user reviews as privacy-relevant, privacy-irrelevant, bug reports, feature requests, etc., using machine learning. Notably, there is a lack of focus on classifying reviews specifically as privacy-related feature requests, privacy-related bug reports, or privacy-irrelevant. This paper introduces SENtinel SORt (SENSOR), an automated online annotation tool designed to help developers annotate and classify user reviews into these categories. For automating the annotation of such reviews, this paper introduces the annotation model, GRACE (GRU-based Attention with CBOW Embedding), using Gated Recurrent Units (GRU) with Continuous Bag of Words (CBOW) and Attention mechanism. Approximately 16000 user reviews from seven popular social media apps on Google Play Store, including Instagram, Facebook, WhatsApp, Snapchat, X (formerly Twitter), Facebook Lite, and Line were analyzed. Two annotators manually labelled the reviews, achieving a Cohen's Kappa value of 0.87, ensuring a labeled dataset with high inter-rater agreement for training machine learning models. Among the models tested, GRACE demonstrated the best performance (macro F1-score: 0.9434, macro ROC-AUC: 0.9934, and accuracy: 95.10%) despite class imbalance. SENSOR demonstrates significant potential to assist developers with extracting and addressing privacy-related feature requests or bug reports from user reviews, enhancing user privacy and trust.
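The abstract reports inter-rater agreement as a Cohen's Kappa of 0.87 between two annotators. As a minimal sketch of how that statistic is computed in general (not the authors' code), the snippet below derives kappa from two label lists. The function name `cohens_kappa` and the toy labels are assumptions made for illustration and are not taken from the paper's dataset.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each class, the product of the two annotators'
    # marginal label frequencies, summed over all classes.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    classes = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[c] * counts_b[c] for c in classes) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical class throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy labels (illustrative only, not from the paper).
annotator_1 = ["bug", "bug", "feature", "irrelevant", "irrelevant", "irrelevant"]
annotator_2 = ["bug", "feature", "feature", "irrelevant", "irrelevant", "irrelevant"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.4f}")
```

In this toy case the annotators agree on 5 of 6 items (observed agreement 5/6), chance agreement is 13/36, and kappa is therefore 17/23. Kappa corrects raw agreement for the agreement expected by chance, which matters when one class dominates, as in an imbalanced review dataset.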

Related papers

SAGE: A Context-Aware Approach for Mining Privacy Requirements Relevant Reviews from Mental Health Apps [0.0]
Mental health (MH) apps often require sensitive user data to customize services for mental wellness needs. This study introduces SAGE, a context-aware approach to automatically mining privacy reviews from MH apps.
arXiv Detail & Related papers (2025-07-11T21:53:56Z)
LazyReview A Dataset for Uncovering Lazy Thinking in NLP Peer Reviews [74.87393214734114]
This work introduces LazyReview, a dataset of peer-review sentences annotated with fine-grained lazy thinking categories. Large Language Models (LLMs) struggle to detect these instances in a zero-shot setting. Instruction-based fine-tuning on our dataset significantly boosts performance by 10-20 points.
arXiv Detail & Related papers (2025-04-15T10:07:33Z)
Can LLM feedback enhance review quality? A randomized study of 20K reviews at ICLR 2025 [115.86204862475864]
Review Feedback Agent provides automated feedback on vague comments, content misunderstandings, and unprofessional remarks to reviewers. It was implemented at ICLR 2025 as a large randomized control study. 27% of reviewers who received feedback updated their reviews, and over 12,000 feedback suggestions from the agent were incorporated by those reviewers.
arXiv Detail & Related papers (2025-04-13T22:01:25Z)
CRScore: Grounding Automated Evaluation of Code Review Comments in Code Claims and Smells [15.66562304661042]
CRScore is a reference-free metric to measure dimensions of review quality like conciseness, comprehensiveness, and relevance. We demonstrate that CRScore can produce valid, fine-grained scores of review quality that have the greatest alignment with human judgment among open source metrics. We also release a corpus of 2.9k human-annotated review quality scores for machine-generated and GitHub review comments to support the development of automated metrics.
arXiv Detail & Related papers (2024-09-29T21:53:18Z)
Exploring User Privacy Awareness on GitHub: An Empirical Study [5.822284390235265]
GitHub provides developers with a practical way to distribute source code and collaborate on common projects. To enhance account security and privacy, GitHub allows its users to manage access permissions, review audit logs, and enable two-factor authentication. Despite the endless effort, the platform still faces various issues related to the privacy of its users.
arXiv Detail & Related papers (2024-09-06T06:41:46Z)
Combat AI With AI: Counteract Machine-Generated Fake Restaurant Reviews on Social Media [77.34726150561087]
We propose to leverage the high-quality elite Yelp reviews to generate fake reviews from the OpenAI GPT review creator. We apply the model to predict non-elite reviews and identify the patterns across several dimensions. We show that social media platforms are continuously challenged by machine-generated fake reviews.
arXiv Detail & Related papers (2023-02-10T19:40:10Z)
Mining User Privacy Concern Topics from App Reviews [10.776958968245589]
An increasing number of users are voicing their privacy concerns through app reviews on App stores. The main challenge of effectively mining privacy concerns from user reviews lies in the fact that reviews expressing privacy concerns are overridden by a large number of reviews expressing more generic themes and noisy content. In this work, we propose a novel automated approach to overcome that challenge.
arXiv Detail & Related papers (2022-12-19T08:07:27Z)
Analysis of Longitudinal Changes in Privacy Behavior of Android Applications [79.71330613821037]
In this paper, we examine the trends in how Android apps have changed over time with respect to privacy. We examine the adoption of HTTPS, whether apps scan the device for other installed apps, the use of permissions for privacy-sensitive data, and the use of unique identifiers. We find that privacy-related behavior has improved with time as apps continue to receive updates, and that the third-party libraries used by apps are responsible for more issues with privacy.
arXiv Detail & Related papers (2021-12-28T16:21:31Z)
TOUR: Dynamic Topic and Sentiment Analysis of User Reviews for Assisting App Release [34.529117157417176]
TOUR is able to (i) detect and summarize emerging app issues over app versions, (ii) identify user sentiment towards app features, and (iii) prioritize important user reviews for facilitating developers' examination.
arXiv Detail & Related papers (2021-03-26T08:44:55Z)
Emerging App Issue Identification via Online Joint Sentiment-Topic Tracing [66.57888248681303]
We propose a novel emerging issue detection approach named MERIT. Based on the AOBST model, we infer the topics negatively reflected in user reviews for one app version. Experiments on popular apps from Google Play and Apple's App Store demonstrate the effectiveness of MERIT.
arXiv Detail & Related papers (2020-08-23T06:34:05Z)
Automating App Review Response Generation [67.58267006314415]
We propose a novel approach RRGen that automatically generates review responses by learning knowledge relations between reviews and their responses. Experiments on 58 apps and 309,246 review-response pairs highlight that RRGen outperforms the baselines by at least 67.4% in terms of BLEU-4.
arXiv Detail & Related papers (2020-02-10T05:23:38Z)

This list is automatically generated from the titles and abstracts of the papers in this site.
