Privacy-Aware Crowd Labelling for Machine Learning Tasks
- URL: http://arxiv.org/abs/2203.01373v1
- Date: Thu, 3 Feb 2022 18:14:45 GMT
- Title: Privacy-Aware Crowd Labelling for Machine Learning Tasks
- Authors: Giannis Haralabopoulos and Ioannis Anagnostopoulos
- Abstract summary: We propose a privacy-preserving text labelling method for varying applications, based on crowdsourcing.
We transform text with different levels of privacy, and analyse the effectiveness of the transformation with regard to label correlation and consistency.
- Score: 3.6930948691311007
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The extensive use of online social media has highlighted the importance of
privacy in the digital space. As more scientists analyse the data created in
these platforms, privacy concerns have extended to data usage within
academia. Although text analysis is a well-documented topic in academic
literature with a multitude of applications, ensuring privacy of user-generated
content has been overlooked. Most sentiment analysis methods require emotion
labels, which can be obtained through crowdsourcing, where non-expert
individuals contribute to scientific tasks. The text itself has to be exposed
to third parties in order to be labelled. In an effort to reduce the exposure
of online users' information, we propose a privacy-preserving text labelling
method for varying applications, based on crowdsourcing. We transform text with
different levels of privacy, and analyse the effectiveness of the
transformation with regard to label correlation and consistency. Our results
suggest that privacy can be implemented in labelling, retaining the
annotational diversity and subjectivity of traditional labelling.
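The abstract evaluates the transformation by label correlation and consistency, but this listing does not say how either is computed. The following is a minimal sketch under stated assumptions, not the authors' method: it assumes a hypothetical four-emotion label set, uses Pearson correlation between the label distributions collected for the original and the transformed text, and uses the share of annotators choosing the most common label as a simple consistency proxy. All names and toy labels are illustrative.

```python
from collections import Counter
from math import sqrt

# Hypothetical label set; the paper's actual emotion labels are not given here.
EMOTIONS = ["anger", "fear", "joy", "sadness"]


def label_distribution(labels):
    """Return the share of annotators that chose each emotion."""
    counts = Counter(labels)
    return [counts[emotion] / len(labels) for emotion in EMOTIONS]


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (norm_x * norm_y)


def majority_agreement(labels):
    """Consistency proxy: share of annotators choosing the most common label."""
    return Counter(labels).most_common(1)[0][1] / len(labels)


# Toy annotations for one text, invented for illustration only.
original_labels = ["joy", "joy", "joy", "sadness", "fear"]
transformed_labels = ["joy", "joy", "sadness", "sadness", "fear"]

correlation = pearson(
    label_distribution(original_labels),
    label_distribution(transformed_labels),
)
consistency_original = majority_agreement(original_labels)
consistency_transformed = majority_agreement(transformed_labels)
```

A correlation near 1 would indicate that the transformed text elicits a label distribution similar to the original, while the agreement values show how concentrated the annotators' choices are in each condition.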
Related papers
- Fingerprinting and Tracing Shadows: The Development and Impact of Browser Fingerprinting on Digital Privacy [55.2480439325792]
Browser fingerprinting is a growing technique for identifying and tracking users online without traditional methods like cookies.
This paper gives an overview of the various fingerprinting techniques and analyzes the entropy and uniqueness of the collected data.
arXiv Detail & Related papers (2024-11-18T20:32:31Z)
- Identifying Privacy Personas [27.301741710016223]
Privacy personas capture the differences in user segments with respect to one's knowledge, behavioural patterns, level of self-efficacy, and perception of the importance of privacy protection.
While various privacy personas have been derived in the literature, they group together people who differ from each other in terms of important attributes.
We propose eight personas that we derive by combining qualitative and quantitative analysis of the responses to an interactive educational questionnaire.
arXiv Detail & Related papers (2024-10-17T20:49:46Z)
- Collection, usage and privacy of mobility data in the enterprise and public administrations [55.2480439325792]
Security measures such as anonymization are needed to protect individuals' privacy.
Within our study, we conducted expert interviews to gain insights into practices in the field.
We survey privacy-enhancing methods in use, which generally do not comply with state-of-the-art standards of differential privacy.
arXiv Detail & Related papers (2024-07-04T08:29:27Z)
- NAP^2: A Benchmark for Naturalness and Privacy-Preserving Text Rewriting by Learning from Human [55.20137833039499]
We suggest sanitizing sensitive text using two common strategies used by humans.
We curate the first corpus, coined NAP^2, through both crowdsourcing and the use of large language models.
arXiv Detail & Related papers (2024-06-06T05:07:44Z)
- Embedding Privacy in Computational Social Science and Artificial Intelligence Research [2.048226951354646]
Preserving privacy has emerged as a critical factor in research.
The increasing use of advanced computational models stands to exacerbate privacy concerns.
This article contributes to the field by discussing the role of privacy and the issues that researchers working in CSS, AI, data science and related domains are likely to face.
arXiv Detail & Related papers (2024-04-17T16:07:53Z)
- Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory [82.7042006247124]
We show that even the most capable models, such as GPT-4 and ChatGPT, reveal private information in contexts that humans would not, 39% and 57% of the time, respectively.
Our work underscores the immediate need to explore novel inference-time privacy-preserving approaches, based on reasoning and theory of mind.
arXiv Detail & Related papers (2023-10-27T04:15:30Z)
- Neural Text Sanitization with Privacy Risk Indicators: An Empirical Analysis [2.9311414545087366]
We consider a two-step approach to text sanitization and provide a detailed analysis of its empirical performance.
The text sanitization process starts with a privacy-oriented entity recognizer that seeks to determine the text spans expressing identifiable personal information.
We present five distinct indicators of the re-identification risk, respectively based on language model probabilities, text span classification, sequence labelling, perturbations, and web search.
arXiv Detail & Related papers (2023-10-22T14:17:27Z)
- Crowdsourcing on Sensitive Data with Privacy-Preserving Text Rewriting [9.409281517596396]
Data labeling is often done on crowdsourcing platforms due to scalability reasons.
However, publishing data on public platforms can only be done if no privacy-relevant information is included.
We investigate how removing personally identifiable information (PII) as well as applying differential privacy (DP) rewriting can enable text with privacy-relevant information to be used for crowdsourcing.
arXiv Detail & Related papers (2023-03-06T11:54:58Z)
- How Do Input Attributes Impact the Privacy Loss in Differential Privacy? [55.492422758737575]
We study the connection between the per-subject norm in DP neural networks and individual privacy loss.
We introduce a novel metric termed the Privacy Loss-Input Susceptibility (PLIS) which allows one to apportion the subject's privacy loss to their input attributes.
arXiv Detail & Related papers (2022-11-18T11:39:03Z)
- A Multi-input Multi-output Transformer-based Hybrid Neural Network for Multi-class Privacy Disclosure Detection [3.04585143845864]
In this paper, we propose a multi-input, multi-output hybrid neural network which utilizes transfer-learning, linguistics, and metadata to learn the hidden patterns.
We trained and evaluated our model on a human-annotated ground truth dataset, containing a total of 5,400 tweets.
arXiv Detail & Related papers (2021-08-19T03:58:49Z)
- Learning Language and Multimodal Privacy-Preserving Markers of Mood from Mobile Data [74.60507696087966]
Mental health conditions remain underdiagnosed even in countries with common access to advanced medical care.
One promising data source to help monitor human behavior is daily smartphone usage.
We study behavioral markers of daily mood using a recent dataset of mobile behaviors from adolescent populations at high risk of suicidal behaviors.
arXiv Detail & Related papers (2021-06-24T17:46:03Z)