A Fine-grained Chinese Software Privacy Policy Dataset for Sequence
Labeling and Regulation Compliant Identification
- URL: http://arxiv.org/abs/2212.04357v1
- Date: Sun, 4 Dec 2022 05:59:59 GMT
- Title: A Fine-grained Chinese Software Privacy Policy Dataset for Sequence
Labeling and Regulation Compliant Identification
- Authors: Kaifa Zhao, Le Yu, Shiyao Zhou, Jing Li, Xiapu Luo, Yat Fei Aemon
Chiu, Yutong Liu
- Abstract summary: We construct the first Chinese privacy policy dataset, CA4P-483, to facilitate the sequence labeling tasks and regulation compliance identification.
Our dataset includes 483 Chinese Android application privacy policies, over 11K sentences, and 52K fine-grained annotations.
- Score: 23.14031861460124
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Privacy protection has attracted great attention both at the legal
level and in user awareness. To protect user privacy, countries enact laws and
regulations that require software privacy policies to govern software behavior.
However, privacy policies are written in natural language with many legal terms
and software jargon that prevent users from reading, let alone understanding,
them. It is therefore desirable to use NLP techniques to analyze privacy
policies and help users understand them. Furthermore, existing datasets ignore
legal requirements and are limited to English. In this paper, we construct the
first Chinese privacy policy dataset, namely CA4P-483, to facilitate sequence
labeling tasks and regulation compliance identification between privacy
policies and software. Our dataset includes 483 Chinese Android application
privacy policies, over 11K sentences, and 52K fine-grained annotations. We
evaluate families of robust and representative baseline models on our dataset.
Based on baseline performance, we provide findings and potential research
directions for our dataset. Finally, we investigate potential applications of
CA4P-483 combining regulation requirements and program analysis.
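Fine-grained annotations over policy sentences are commonly framed as BIO sequence labeling, where each token carries a tag such as B-DATA or I-DATA and labeled spans are recovered by decoding the tag sequence. The following is a minimal, hypothetical sketch of that decoding step; the actual CA4P-483 tag inventory and tokenization are not specified here, and `DATA` is an invented label for illustration:

```python
def decode_bio(tokens, tags):
    """Convert parallel token/tag lists into (label, text) spans."""
    spans, current = [], None  # current = (label, [span tokens])
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                spans.append((current[0], " ".join(current[1])))
            current = (tag[2:], [tok])
        elif tag.startswith("I-") and current and tag[2:] == current[0]:
            current[1].append(tok)
        else:  # "O" or an inconsistent I- tag closes the open span
            if current:
                spans.append((current[0], " ".join(current[1])))
            current = None
    if current:
        spans.append((current[0], " ".join(current[1])))
    return spans

tokens = ["We", "collect", "your", "device", "identifier", "and", "location"]
tags   = ["O",  "O",       "O",    "B-DATA", "I-DATA",     "O",   "B-DATA"]
print(decode_bio(tokens, tags))
# → [('DATA', 'device identifier'), ('DATA', 'location')]
```

A sequence labeling baseline (e.g. a BiLSTM-CRF or BERT tagger) predicts the tag list; a decoder like this then turns token-level predictions into the span-level annotations that compliance checks consume.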
Related papers
- Interactive GDPR-Compliant Privacy Policy Generation for Software Applications [6.189770781546807]
To use software applications, users are sometimes asked to provide their personal information.
Since privacy has become a significant concern, many protection regulations exist worldwide.
We propose an approach that generates comprehensive and compliant privacy policies.
arXiv Detail & Related papers (2024-10-04T01:22:16Z) - How Privacy-Savvy Are Large Language Models? A Case Study on Compliance and Privacy Technical Review [15.15468770348023]
We evaluate large language models' performance in privacy-related tasks such as privacy information extraction (PIE), legal and regulatory key point detection (KPD), and question answering (QA).
Through an empirical assessment, we investigate the capacity of several prominent LLMs, including BERT, GPT-3.5, GPT-4, and custom models, in executing privacy compliance checks and technical privacy reviews.
While LLMs show promise in automating privacy reviews and identifying regulatory discrepancies, significant gaps persist in their ability to fully comply with evolving legal standards.
arXiv Detail & Related papers (2024-09-04T01:51:37Z) - PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action [54.11479432110771]
PrivacyLens is a novel framework designed to extend privacy-sensitive seeds into expressive vignettes and further into agent trajectories.
We instantiate PrivacyLens with a collection of privacy norms grounded in privacy literature and crowdsourced seeds.
State-of-the-art LMs, like GPT-4 and Llama-3-70B, leak sensitive information in 25.68% and 38.69% of cases, even when prompted with privacy-enhancing instructions.
arXiv Detail & Related papers (2024-08-29T17:58:38Z) - Collection, usage and privacy of mobility data in the enterprise and public administrations [55.2480439325792]
Security measures such as anonymization are needed to protect individuals' privacy.
Within our study, we conducted expert interviews to gain insights into practices in the field.
We survey privacy-enhancing methods in use, which generally do not comply with state-of-the-art standards of differential privacy.
arXiv Detail & Related papers (2024-07-04T08:29:27Z) - Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning [62.224804688233]
Differential privacy (DP) offers a promising solution by ensuring models are 'almost indistinguishable' with or without any particular privacy unit.
We study user-level DP, motivated by applications where it is necessary to ensure uniform privacy protection across users.
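The record-level vs. user-level distinction can be sketched with a toy Laplace mechanism: to protect whole users rather than single records, each user's total contribution is clipped, and noise is calibrated to that per-user bound. This is a simplified illustration under standard DP definitions, not the paper's actual algorithm; the function name and parameters are invented for the sketch:

```python
import math
import random

def user_level_dp_sum(user_records, clip, epsilon, rng=random.Random(0)):
    """Sum values across users with user-level (not record-level) DP.

    Each user's entire contribution is clipped to [-clip, clip], so adding
    or removing a whole user changes the sum by at most `clip`; Laplace
    noise with scale clip/epsilon then masks that change.
    """
    total = 0.0
    for records in user_records.values():
        contribution = sum(records)          # all of this user's records
        total += max(-clip, min(clip, contribution))  # per-user clipping
    # Laplace(scale = sensitivity / epsilon) via inverse-CDF sampling
    scale = clip / epsilon
    u = rng.random() - 0.5
    noise = -scale * (1 if u >= 0 else -1) * math.log(1 - 2 * abs(u))
    return total + noise
```

Under record-level DP the sensitivity would instead be one record's bound, so a user contributing many records could still be identified in aggregate; clipping the whole user's contribution is what makes the guarantee uniform across users.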
arXiv Detail & Related papers (2024-06-20T13:54:32Z) - PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind).
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z) - The Saudi Privacy Policy Dataset [0.0]
The paper introduces a diverse compilation of privacy policies from various sectors in Saudi Arabia.
The final dataset includes 1,000 websites belonging to 7 sectors, 4,638 lines of text, 775,370 tokens, and a corpus size of 8,353 KB.
The paper aims to further research and development in the areas of privacy policy analysis, natural language processing, and machine learning applications related to privacy and data protection.
arXiv Detail & Related papers (2023-04-05T21:40:37Z) - PLUE: Language Understanding Evaluation Benchmark for Privacy Policies
in English [77.79102359580702]
We introduce the Privacy Policy Language Understanding Evaluation benchmark, a multi-task benchmark for evaluating privacy policy language understanding.
We also collect a large corpus of privacy policies to enable privacy policy domain-specific language model pre-training.
We demonstrate that domain-specific continual pre-training offers performance improvements across all tasks.
arXiv Detail & Related papers (2022-12-20T05:58:32Z) - Exploring Consequences of Privacy Policies with Narrative Generation via
Answer Set Programming [0.0]
We present a framework that uses Answer Set Programming (ASP) to formalize privacy policies.
ASP allows end-users to forward-simulate possible consequences of the policy in terms of actors.
We demonstrate through the example of the Health Insurance Portability and Accountability Act how to use the system in various ways.
arXiv Detail & Related papers (2022-12-13T16:44:46Z) - Detecting Compliance of Privacy Policies with Data Protection Laws [0.0]
Privacy policies are often written in extensive legal jargon that is difficult to understand.
We aim to bridge that gap by providing a framework that analyzes privacy policies in light of various data protection laws.
By using such a tool, users would be better equipped to understand how their personal data is managed.
arXiv Detail & Related papers (2021-02-21T09:15:15Z) - PGLP: Customizable and Rigorous Location Privacy through Policy Graph [68.3736286350014]
We propose a new location privacy notion called PGLP, which provides a rich interface to release private locations with customizable and rigorous privacy guarantee.
Specifically, we formalize a user's location privacy requirements using a location policy graph, which is expressive and customizable.
We further design a private location trace release framework that pipelines the detection of location exposure, policy graph repair, and private trajectory release with customizable and rigorous location privacy.
arXiv Detail & Related papers (2020-05-04T04:25:59Z)