SAGE: A Context-Aware Approach for Mining Privacy Requirements Relevant Reviews from Mental Health Apps
- URL: http://arxiv.org/abs/2507.09051v2
- Date: Sun, 20 Jul 2025 04:37:04 GMT
- Title: SAGE: A Context-Aware Approach for Mining Privacy Requirements Relevant Reviews from Mental Health Apps
- Authors: Aakash Sorathiya, Gouri Ginde
- Abstract summary: Mental health (MH) apps often require sensitive user data to customize services for mental wellness needs. This study introduces SAGE, a context-aware approach to automatically mining privacy reviews from MH apps.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Mental health (MH) apps often require sensitive user data to customize services for mental wellness needs. However, such data collection practices in some MH apps raise significant privacy concerns for users. These concerns are often mentioned in app reviews, but other feedback categories, such as reliability and usability, tend to take precedence. This poses a significant challenge in automatically identifying privacy requirements-relevant reviews (privacy reviews) that can be utilized to extract privacy requirements and address users' privacy concerns. Thus, this study introduces SAGE, a context-aware approach to automatically mining privacy reviews from MH apps using Natural Language Inference (NLI) with MH domain-specific privacy hypotheses (provides domain-specific context awareness) and a GPT model (eliminates the need for fine-tuning). The quantitative evaluation of SAGE on a dataset of 204K app reviews achieved an F1 score of 0.85 without any fine-tuning, outperforming the fine-tuned baseline classifiers BERT and T5. Furthermore, SAGE extracted 748 privacy reviews previously overlooked by keyword-based methods, demonstrating its effectiveness through qualitative evaluation. These reviews can later be refined into actionable privacy requirement artifacts.
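For intuition, the core filtering idea can be sketched as zero-shot NLI over hand-written, domain-specific hypotheses. This is a minimal illustration, not SAGE's actual pipeline: the hypotheses and threshold below are invented for the example, and an off-the-shelf NLI model stands in for the GPT component the abstract describes.

```python
# Minimal sketch of NLI-based privacy review filtering (illustrative only:
# hypotheses, model choice, and threshold are assumptions, not SAGE's setup).
from transformers import pipeline

# Off-the-shelf NLI model wrapped as a zero-shot classifier; no fine-tuning.
nli = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

# Hand-crafted, MH domain-specific privacy hypotheses (hypothetical examples).
PRIVACY_HYPOTHESES = [
    "This review raises a concern about sharing mental health data",
    "This review complains about the app collecting sensitive personal data",
    "This review discusses consent or permissions for user data",
]

def is_privacy_review(review_text: str, threshold: float = 0.8) -> bool:
    """Flag a review as privacy-relevant if any hypothesis is strongly entailed."""
    result = nli(
        review_text,
        candidate_labels=PRIVACY_HYPOTHESES,
        hypothesis_template="{}",  # labels are already full hypotheses
        multi_label=True,          # hypotheses are not mutually exclusive
    )
    return max(result["scores"]) >= threshold

print(is_privacy_review(
    "Why does a mood tracker need my contacts and location? Deleting it."
))
```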
Related papers
- SENSOR: An ML-Enhanced Online Annotation Tool to Uncover Privacy Concerns from User Reviews in Social-Media Applications [0.0]
This paper introduces SENtinel SORt (SENSOR), an automated online annotation tool designed to help developers annotate and classify user reviews. 16,000 user reviews from seven popular social media apps on the Google Play Store were analyzed. GRACE demonstrated the best performance (macro F1-score: 0.9434, macro ROC-AUC: 0.9934, and accuracy: 95.10%) despite class imbalance.
arXiv Detail & Related papers (2025-07-14T14:58:04Z)
- A False Sense of Privacy: Evaluating Textual Data Sanitization Beyond Surface-level Privacy Leakage [77.83757117924995]
We propose a new framework that evaluates re-identification attacks to quantify individual privacy risks upon data release. Our approach shows that seemingly innocuous auxiliary information can be used to infer sensitive attributes like age or substance use history from sanitized data.
arXiv Detail & Related papers (2025-04-28T01:16:27Z)
- Beyond Keywords: A Context-based Hybrid Approach to Mining Ethical Concern-related App Reviews [0.0]
App reviews related to ethical concerns generally use domain-specific language and are expressed using a more varied vocabulary.
This study proposes a novel natural language processing (NLP) based approach that combines Natural Language Inference (NLI) and a decoder-only (LLaMA-like) Large Language Model (LLM) to extract ethical concern-related app reviews at scale.
arXiv Detail & Related papers (2024-11-11T22:08:48Z)
- PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action [54.11479432110771]
PrivacyLens is a novel framework designed to extend privacy-sensitive seeds into expressive vignettes and further into agent trajectories. We instantiate PrivacyLens with a collection of privacy norms grounded in privacy literature and crowdsourced seeds. State-of-the-art LMs, like GPT-4 and Llama-3-70B, leak sensitive information in 25.68% and 38.69% of cases, even when prompted with privacy-enhancing instructions.
arXiv Detail & Related papers (2024-08-29T17:58:38Z)
- Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning [62.224804688233]
Differential privacy (DP) offers a promising solution by ensuring models are 'almost indistinguishable' with or without any particular privacy unit.
We study user-level DP, motivated by applications where it is necessary to ensure uniform privacy protection across users.
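For intuition, here is a minimal sketch of the user-level DP aggregation step this line of work builds on: clip each user's aggregate gradient contribution, then add Gaussian noise scaled to that per-user clip norm. Shapes, constants, and the function name are illustrative assumptions, not the paper's configuration.

```python
# Hedged sketch of one user-level DP aggregation round (illustrative only).
import numpy as np

def user_level_dp_update(per_user_grads, clip_norm=1.0, noise_multiplier=1.0,
                         rng=None):
    """per_user_grads: list of 1-D arrays, each user's *total* gradient
    contribution for this round."""
    if rng is None:
        rng = np.random.default_rng(0)
    clipped = []
    for g in per_user_grads:
        norm = np.linalg.norm(g)
        # Clip the whole user's contribution, not individual examples:
        # this is what makes the guarantee user-level, not example-level.
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    total = np.sum(clipped, axis=0)
    # Gaussian noise calibrated to one user's maximum influence (clip_norm).
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(per_user_grads)
```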
arXiv Detail & Related papers (2024-06-20T13:54:32Z)
- Little Data, Big Impact: Privacy-Aware Visual Language Models via Minimal Tuning [16.272314073324626]
We evaluate ten state-of-the-art Visual Language Models (VLMs) and identify limitations in their understanding of visual privacy. To address this, we introduce two compact, high-quality benchmarks, PrivBench-H and PrivTune. We obtain substantial gains on all benchmarks, surpassing GPT-4, while maintaining strong performance on other tasks.
arXiv Detail & Related papers (2024-05-27T17:59:25Z)
- In Pursuit of Privacy: The Value-Centered Privacy Assistant [2.995126929132991]
We develop a prototype smartphone value-centered privacy assistant (VcPA). A VcPA promotes user privacy decisions based on personal values. We establish proof-of-concept that a VcPA helps users make more value-centered app choices.
arXiv Detail & Related papers (2023-08-10T17:04:12Z)
- ChatGPT for Us: Preserving Data Privacy in ChatGPT via Dialogue Text Ambiguation to Expand Mental Health Care Delivery [52.73936514734762]
ChatGPT has gained popularity for its ability to generate human-like dialogue.
Data-sensitive domains face challenges in using ChatGPT due to privacy and data-ownership concerns.
We propose a text ambiguation framework that preserves user privacy.
arXiv Detail & Related papers (2023-05-19T02:09:52Z)
- A Randomized Approach for Tight Privacy Accounting [63.67296945525791]
We propose a new differential privacy paradigm called estimate-verify-release (EVR).
The EVR paradigm first estimates the privacy parameter of a mechanism, then verifies whether it meets this guarantee, and finally releases the query output.
Our empirical evaluation shows the newly proposed EVR paradigm improves the utility-privacy tradeoff for privacy-preserving machine learning.
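To make the EVR control flow concrete, a toy sketch follows; the estimator and verifier here are placeholders, not the paper's randomized privacy accountant.

```python
# Toy control-flow sketch of estimate-verify-release (EVR); illustrative only.
from typing import Callable

def evr_release(output,
                estimate_epsilon: Callable[[], float],
                verify: Callable[[float], bool]):
    """Release the mechanism's output only if its estimated guarantee verifies."""
    eps_hat = estimate_epsilon()  # 1) estimate the mechanism's privacy parameter
    if verify(eps_hat):           # 2) verify the mechanism actually meets eps_hat
        return output             # 3) release the query output
    raise RuntimeError("verification failed: output withheld")
```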
arXiv Detail & Related papers (2023-04-17T00:38:01Z)
- Mining User Privacy Concern Topics from App Reviews [10.776958968245589]
An increasing number of users are voicing their privacy concerns through app reviews on App stores.
The main challenge of effectively mining privacy concerns from user reviews lies in the fact that reviews expressing privacy concerns are drowned out by a large number of reviews on more generic themes and noisy content.
In this work, we propose a novel automated approach to overcome that challenge.
arXiv Detail & Related papers (2022-12-19T08:07:27Z)
- No Free Lunch in "Privacy for Free: How does Dataset Condensation Help Privacy" [75.98836424725437]
New methods designed to preserve data privacy require careful scrutiny.
Failure to preserve privacy is hard to detect, and yet can lead to catastrophic results when a system implementing a "privacy-preserving" method is attacked.
arXiv Detail & Related papers (2022-09-29T17:50:23Z)
- How Much User Context Do We Need? Privacy by Design in Mental Health NLP Application [33.3172788815152]
Clinical tasks such as mental health assessment from text must take social constraints into account.
We present the first analysis juxtaposing user history length and differential privacy budgets, and elaborate on how modeling additional user context enables utility preservation.
arXiv Detail & Related papers (2022-09-05T15:41:45Z)