Privacy-Aware Crowd Labelling for Machine Learning Tasks
- URL: http://arxiv.org/abs/2203.01373v1
- Date: Thu, 3 Mar 2022 18:14:45 GMT
- Title: Privacy-Aware Crowd Labelling for Machine Learning Tasks
- Authors: Giannis Haralabopoulos and Ioannis Anagnostopoulos
- Abstract summary: We propose a privacy-preserving text labelling method for varying applications, based on crowdsourcing.
We transform text with different levels of privacy and analyse the effectiveness of the transformation with regard to label correlation and consistency.
- Score: 3.6930948691311007
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The extensive use of online social media has highlighted the importance of
privacy in the digital space. As more scientists analyse the data created on
these platforms, privacy concerns have extended to data usage within academia.
Although text analysis is a well-documented topic in the academic literature
with a multitude of applications, ensuring the privacy of user-generated
content has been overlooked. Most sentiment analysis methods require emotion
labels, which can be obtained through crowdsourcing, where non-expert
individuals contribute to scientific tasks. The text itself has to be exposed
to third parties in order to be labelled. In an effort to reduce the exposure
of online users' information, we propose a privacy-preserving text labelling
method for varying applications, based on crowdsourcing. We transform text with
different levels of privacy and analyse the effectiveness of the
transformation with regard to label correlation and consistency. Our results
suggest that privacy can be implemented in labelling while retaining the
annotation diversity and subjectivity of traditional labelling.
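The transformation idea above can be sketched with a toy example. The level definitions below (digit masking, then crude capitalised-word masking) are illustrative assumptions for demonstration, not the authors' actual transformation scheme:

```python
import re

def transform(text: str, level: int) -> str:
    """Return a privacy-transformed copy of `text`.

    Hypothetical privacy levels, for illustration only:
      level 0: original text (no privacy)
      level 1: all digits masked (dates, phone numbers, IDs)
      level 2: level 1 plus capitalised words masked, a crude
               stand-in for named-entity removal
    """
    if level >= 1:
        text = re.sub(r"\d", "#", text)
    if level >= 2:
        text = re.sub(r"\b[A-Z][a-z]+\b", "[ENT]", text)
    return text

print(transform("Alice met Bob in Paris on 12 May.", 2))
```

Annotators would then label the transformed text, and label agreement across levels can be compared with correlation and consistency measures, as the abstract describes.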
Related papers
- Identifying Privacy Personas [27.301741710016223]
Privacy personas capture the differences in user segments with respect to one's knowledge, behavioural patterns, level of self-efficacy, and perception of the importance of privacy protection.
While various privacy personas have been derived in the literature, they group together people who differ from each other in terms of important attributes.
We propose eight personas that we derive by combining qualitative and quantitative analysis of the responses to an interactive educational questionnaire.
arXiv Detail & Related papers (2024-10-17T20:49:46Z)
- Collection, usage and privacy of mobility data in the enterprise and public administrations [55.2480439325792]
Security measures such as anonymization are needed to protect individuals' privacy.
Within our study, we conducted expert interviews to gain insights into practices in the field.
We survey privacy-enhancing methods in use, which generally do not comply with state-of-the-art standards of differential privacy.
arXiv Detail & Related papers (2024-07-04T08:29:27Z)
- NAP^2: A Benchmark for Naturalness and Privacy-Preserving Text Rewriting by Learning from Human [55.20137833039499]
We suggest sanitizing sensitive text using two common strategies used by humans.
We curate the first corpus, coined NAP^2, through both crowdsourcing and the use of large language models.
arXiv Detail & Related papers (2024-06-06T05:07:44Z)
- Embedding Privacy in Computational Social Science and Artificial Intelligence Research [2.048226951354646]
Preserving privacy has emerged as a critical factor in research.
The increasing use of advanced computational models stands to exacerbate privacy concerns.
This article contributes to the field by discussing the role of privacy and the issues that researchers working in CSS, AI, data science and related domains are likely to face.
arXiv Detail & Related papers (2024-04-17T16:07:53Z)
- Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory [82.7042006247124]
We show that even the most capable AI models reveal private information in contexts that humans would not, in 39% and 57% of cases for the two evaluated models.
Our work underscores the immediate need to explore novel inference-time privacy-preserving approaches, based on reasoning and theory of mind.
arXiv Detail & Related papers (2023-10-27T04:15:30Z) - Neural Text Sanitization with Privacy Risk Indicators: An Empirical
Analysis [2.9311414545087366]
We consider a two-step approach to text sanitization and provide a detailed analysis of its empirical performance.
The text sanitization process starts with a privacy-oriented entity recognizer that seeks to determine the text spans expressing identifiable personal information.
We present five distinct indicators of the re-identification risk, respectively based on language model probabilities, text span classification, sequence labelling, perturbations, and web search.
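As a rough illustration of the two-step pipeline, the sketch below detects candidate spans and scores each with a simple re-identification risk indicator. The regex patterns and the corpus-frequency risk proxy are stand-in assumptions; the actual system uses trained recognizers and the five indicators listed above:

```python
import re
from collections import Counter

# Toy patterns standing in for a privacy-oriented entity recognizer.
PATTERNS = [
    (r"[\w.+-]+@[\w-]+\.[\w.]+", "EMAIL"),
    (r"\b[A-Z][a-z]+ [A-Z][a-z]+\b", "PERSON"),
]

def detect_spans(text):
    """Step 1: find text spans that look like personal information."""
    spans = []
    for pattern, label in PATTERNS:
        for m in re.finditer(pattern, text):
            spans.append((m.start(), m.end(), label, m.group()))
    return spans

def risk(span_text, corpus_counts, total):
    """Step 2 (toy indicator): rarer strings are more identifying."""
    return 1.0 - corpus_counts[span_text] / total if total else 1.0

def sanitize(text, corpus_counts=Counter(), total=0, threshold=0.5):
    out = text
    # Replace from the end of the string so earlier offsets stay valid.
    for start, end, label, span_text in sorted(detect_spans(text), reverse=True):
        if risk(span_text, corpus_counts, total) >= threshold:
            out = out[:start] + f"[{label}]" + out[end:]
    return out
```

A span that is common in a reference corpus scores a low risk and is kept; rare, hence identifying, spans are redacted.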
arXiv Detail & Related papers (2023-10-22T14:17:27Z)
- Crowdsourcing on Sensitive Data with Privacy-Preserving Text Rewriting [9.409281517596396]
Data labeling is often done on crowdsourcing platforms for scalability reasons.
Publishing data on public platforms, however, is only possible if no privacy-relevant information is included.
We investigate how removing personally identifiable information (PII) as well as applying differential privacy (DP) rewriting can enable text with privacy-relevant information to be used for crowdsourcing.
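One common flavour of DP text rewriting replaces each word's embedding with a noised copy and decodes to the nearest vocabulary word (a metric-DP-style mechanism). The tiny two-dimensional embeddings and epsilon values below are assumptions for illustration; the paper's exact rewriting method may differ:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up embedding table; a real system would use pretrained vectors.
VOCAB = {
    "london": np.array([1.0, 0.0]),
    "paris":  np.array([0.9, 0.1]),
    "city":   np.array([0.0, 1.0]),
}

def dp_rewrite_word(word: str, epsilon: float) -> str:
    """Noise the word's embedding, then decode to the nearest neighbour.

    Smaller epsilon -> larger noise -> more privacy, less fidelity.
    """
    noisy = VOCAB[word] + rng.laplace(scale=1.0 / epsilon, size=2)
    return min(VOCAB, key=lambda w: np.linalg.norm(VOCAB[w] - noisy))

def dp_rewrite(text: str, epsilon: float) -> str:
    return " ".join(dp_rewrite_word(w, epsilon) if w in VOCAB else w
                    for w in text.split())
```

With small epsilon, "london" is sometimes rewritten as the nearby "paris", which is exactly the plausible-deniability effect such mechanisms rely on.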
arXiv Detail & Related papers (2023-03-06T11:54:58Z)
- How Do Input Attributes Impact the Privacy Loss in Differential Privacy? [55.492422758737575]
We study the connection between the per-subject norm in DP neural networks and individual privacy loss.
We introduce a novel metric termed the Privacy Loss-Input Susceptibility (PLIS) which allows one to apportion the subject's privacy loss to their input attributes.
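To give a feel for the idea, here is a loose numpy illustration: for logistic regression the per-sample gradient is (sigmoid(w.x) - y) * x, so the share each input attribute contributes to the gradient mass (the quantity clipped in DP-SGD, which drives individual privacy loss) can be read off directly. This is a toy proxy, not the paper's actual PLIS definition:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def attribute_shares(w, x, y):
    """Apportion a per-sample gradient's mass across input attributes."""
    grad = (sigmoid(w @ x) - y) * x      # per-sample logistic-regression gradient
    contrib = np.abs(grad)
    return contrib / contrib.sum()       # each attribute's share of the L1 mass

w = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 0.0, 3.0])
shares = attribute_shares(w, x, 1.0)
```

An attribute that is zero in the input contributes nothing to the gradient norm, and hence nothing to this proxy for the subject's privacy loss.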
arXiv Detail & Related papers (2022-11-18T11:39:03Z)
- A Multi-input Multi-output Transformer-based Hybrid Neural Network for Multi-class Privacy Disclosure Detection [3.04585143845864]
In this paper, we propose a multi-input, multi-output hybrid neural network which utilizes transfer learning, linguistic features, and metadata to learn the hidden patterns.
We trained and evaluated our model on a human-annotated ground truth dataset, containing a total of 5,400 tweets.
arXiv Detail & Related papers (2021-08-19T03:58:49Z)
- Learning Language and Multimodal Privacy-Preserving Markers of Mood from Mobile Data [74.60507696087966]
Mental health conditions remain underdiagnosed even in countries with common access to advanced medical care.
One promising data source to help monitor human behavior is daily smartphone usage.
We study behavioral markers of daily mood using a recent dataset of mobile behaviors from adolescent populations at high risk of suicidal behaviors.
arXiv Detail & Related papers (2021-06-24T17:46:03Z)
- Applications of Differential Privacy in Social Network Analysis: A Survey [60.696428840516724]
Differential privacy is effective in sharing information and preserving privacy with a strong guarantee.
Social network analysis has been extensively adopted in many applications, opening a new arena for the application of differential privacy.
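The workhorse behind many such analyses is simple: a counting query with sensitivity 1 (adding or removing one user changes the count by at most 1) can be privatized with the Laplace mechanism. The query and numbers below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

def laplace_count(true_count: float, epsilon: float) -> float:
    """Answer a sensitivity-1 counting query with epsilon-differential privacy."""
    return true_count + rng.laplace(scale=1.0 / epsilon)

# e.g. "how many users joined the network this month?"
noisy = laplace_count(128, epsilon=0.5)
```

Smaller epsilon means more noise and stronger privacy; answering several such queries consumes the overall privacy budget additively under basic composition.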
arXiv Detail & Related papers (2020-10-06T19:06:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.