Mining User Privacy Concern Topics from App Reviews
- URL: http://arxiv.org/abs/2212.09289v4
- Date: Wed, 11 Oct 2023 07:48:34 GMT
- Title: Mining User Privacy Concern Topics from App Reviews
- Authors: Jianzhang Zhang, Jinping Hua, Yiyang Chen, Nan Niu, Chuang Liu
- Abstract summary: An increasing number of users are voicing their privacy concerns through app reviews on App stores.
The main challenge of effectively mining privacy concerns from user reviews lies in the fact that reviews expressing privacy concerns are drowned out by a large number of reviews on more generic themes and by noisy content.
In this work, we propose a novel automated approach to overcome that challenge.
- Score: 10.776958968245589
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Context: As mobile applications (Apps) widely spread over our society and
life, various personal information is constantly demanded by Apps in exchange
for more intelligent and customized functionality. An increasing number of
users are voicing their privacy concerns through app reviews on App stores.
Objective: The main challenge of effectively mining privacy concerns from
user reviews lies in the fact that reviews expressing privacy concerns are
drowned out by a large number of reviews on more generic themes and by noisy
content. In this work, we propose a novel automated approach to overcome that
challenge.
Method: Our approach first employs information retrieval and document
embeddings to extract candidate privacy reviews in an unsupervised manner;
these candidates are then labeled to prepare the annotation dataset. Next,
supervised classifiers are trained to automatically identify privacy reviews.
Finally, we design an interpretable topic mining algorithm to detect the
privacy concern topics contained in the privacy reviews.
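The unsupervised retrieval step could be sketched as follows. This is a minimal, hypothetical illustration only: it stands in a toy bag-of-words vector for the dense document embeddings the paper uses, and the query string, review texts, and function names are invented for the example, not taken from the authors' pipeline.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy bag-of-words vector; a stand-in for the dense document
    # embeddings used in the actual approach.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = lambda v: math.sqrt(sum(x * x for x in v.values()))
    na, nb = norm(a), norm(b)
    return dot / (na * nb) if na and nb else 0.0

def retrieve_candidates(query, reviews, top_k=2):
    # Rank reviews by similarity to a privacy-oriented query and keep
    # the top-k as candidate privacy reviews for manual annotation.
    q = embed(query)
    return sorted(reviews, key=lambda r: cosine(q, embed(r)), reverse=True)[:top_k]

reviews = [
    "Great app, love the new dark mode!",
    "Why does this app collect my location and contacts data?",
    "Crashes constantly after the update.",
    "The privacy policy lets them share my personal data with advertisers.",
]
candidates = retrieve_candidates("privacy personal data collection permission", reviews)
print(candidates)  # the two privacy-related reviews rank first
```

The retrieved candidates would then be hand-labeled to build the annotation dataset that trains the supervised privacy-review classifiers.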
Results: Experimental results show that the best-performing document
embedding achieves an average precision of 96.80% on the top 100 retrieved
candidate privacy reviews. All of the trained privacy review classifiers
achieve an F1 value above 91%, outperforming the recent keyword-matching
baseline by up to 7.5% in F1. For detecting privacy concern topics from
privacy reviews, our proposed algorithm achieves both better topic coherence
and better diversity than three strong topic modeling baselines, including
LDA.
Conclusion: Empirical evaluation results demonstrate the effectiveness of our
approach in identifying privacy reviews and detecting user privacy concerns
expressed in App reviews.
Related papers
- PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action [54.11479432110771]
PrivacyLens is a novel framework designed to extend privacy-sensitive seeds into expressive vignettes and further into agent trajectories.
We instantiate PrivacyLens with a collection of privacy norms grounded in privacy literature and crowdsourced seeds.
State-of-the-art LMs, like GPT-4 and Llama-3-70B, leak sensitive information in 25.68% and 38.69% of cases, even when prompted with privacy-enhancing instructions.
arXiv Detail & Related papers (2024-08-29T17:58:38Z) - Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning [62.224804688233]
Differential privacy (DP) offers a promising solution by ensuring models are 'almost indistinguishable' with or without any particular privacy unit.
We study user-level DP, motivated by applications where it is necessary to ensure uniform privacy protection across users.
arXiv Detail & Related papers (2024-06-20T13:54:32Z) - A Decade of Privacy-Relevant Android App Reviews: Large Scale Trends [20.714617724462393]
We examine what users have been writing about privacy along multiple dimensions: time, countries, app types, diverse privacy topics, and even across a spectrum of emotions.
We find that although privacy reviews come from more than 200 countries, 33 countries provide 90% of privacy reviews.
Surprisingly, we uncover that it is not uncommon for reviews that discuss privacy to be positive (32%); many users express pleasure about privacy features within apps or privacy-focused apps.
arXiv Detail & Related papers (2024-03-04T18:21:56Z) - Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory [82.7042006247124]
We show that even the most capable AI models reveal private information in contexts that humans would not, 39% and 57% of the time, respectively.
Our work underscores the immediate need to explore novel inference-time privacy-preserving approaches, based on reasoning and theory of mind.
arXiv Detail & Related papers (2023-10-27T04:15:30Z) - Systematic Review on Privacy Categorization [1.5377372227901214]
This work aims to present a systematic review of the literature on privacy categorization.
Privacy categorization involves the possibility to classify users according to specific prerequisites.
arXiv Detail & Related papers (2023-07-07T15:18:26Z) - SeePrivacy: Automated Contextual Privacy Policy Generation for Mobile Applications [21.186902172367173]
SeePrivacy is designed to automatically generate contextual privacy policies for mobile apps.
Our method synergistically combines mobile GUI understanding and privacy policy document analysis.
96% of the retrieved policy segments can be correctly matched with their contexts.
arXiv Detail & Related papers (2023-07-04T12:52:45Z) - Toward the Cure of Privacy Policy Reading Phobia: Automated Generation of Privacy Nutrition Labels From Privacy Policies [19.180437130066323]
We propose the first framework that can automatically generate privacy nutrition labels from privacy policies.
Based on our ground truth applications about the Data Safety Report from the Google Play app store, our framework achieves a 0.75 F1-score on generating first-party data collection practices.
We also analyse the inconsistencies between ground truth and curated privacy nutrition labels on the market, and our framework can detect 90.1% of under-claim issues.
arXiv Detail & Related papers (2023-06-19T13:33:44Z) - ATLAS: Automatically Detecting Discrepancies Between Privacy Policies and Privacy Labels [2.457872341625575]
We introduce the Automated Privacy Label Analysis System (ATLAS).
ATLAS identifies possible discrepancies between mobile app privacy policies and their privacy labels.
We find that, on average, apps have 5.32 such potential compliance issues.
arXiv Detail & Related papers (2023-05-24T05:27:22Z) - PLUE: Language Understanding Evaluation Benchmark for Privacy Policies in English [77.79102359580702]
We introduce the Privacy Policy Language Understanding Evaluation benchmark, a multi-task benchmark for evaluating privacy policy language understanding.
We also collect a large corpus of privacy policies to enable privacy policy domain-specific language model pre-training.
We demonstrate that domain-specific continual pre-training offers performance improvements across all tasks.
arXiv Detail & Related papers (2022-12-20T05:58:32Z) - Privacy Explanations - A Means to End-User Trust [64.7066037969487]
We looked into how explainability might help to tackle this problem.
We created privacy explanations that aim to help to clarify to end users why and for what purposes specific data is required.
Our findings reveal that privacy explanations can be an important step towards increasing trust in software systems.
arXiv Detail & Related papers (2022-10-18T09:30:37Z) - Algorithms with More Granular Differential Privacy Guarantees [65.3684804101664]
We consider partial differential privacy (DP), which allows quantifying the privacy guarantee on a per-attribute basis.
In this work, we study several basic data analysis and learning tasks, and design algorithms whose per-attribute privacy parameter is smaller than the best possible privacy parameter for the entire record of a person.
arXiv Detail & Related papers (2022-09-08T22:43:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.