Related papers: ROI: A method for identifying organizations receiving personal data

URL: http://arxiv.org/abs/2204.09495v2
Date: Tue, 25 Jul 2023 07:11:39 GMT
Title: ROI: A method for identifying organizations receiving personal data
Authors: David Rodriguez, Jose M. Del Alamo, Miguel Cozar and Boni Garcia
Abstract summary: This paper assesses state-of-the-art techniques for identifying the organizations that receive users' personal data. We propose a fully automated method that combines different techniques to achieve a 95.71% precision score. We demonstrate our method in the wild by evaluating 10,000 Android apps and exposing the organizations that receive users' personal data.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Many studies have exposed the massive collection of personal data in the digital ecosystem through, for instance, websites, mobile apps, or smart devices. This fact goes unnoticed by most users, who are also unaware that the collectors are sharing their personal data with many different organizations around the globe. This paper assesses techniques available in the state of the art to identify the organizations receiving this personal data. Based on our findings, we propose ROI (Receiver Organization Identifier), a fully automated method that combines different techniques to achieve a 95.71% precision score in identifying an organization receiving personal data. We demonstrate our method in the wild by evaluating 10,000 Android apps and exposing the organizations that receive users' personal data.

Related papers

PersonaBench: Evaluating AI Models on Understanding Personal Information through Accessing (Synthetic) Private User Data [76.21047984886273]
Personalization is critical in AI assistants, particularly in the context of private AI models that work with individual users. Due to the sensitive nature of such data, there are no publicly available datasets that allow us to assess an AI model's ability to understand users. We introduce a synthetic data generation pipeline that creates diverse, realistic user profiles and private documents simulating human activities.
arXiv Detail & Related papers (2025-02-28T00:43:35Z)
Protecting User Privacy in Online Settings via Supervised Learning [69.38374877559423]
We design an intelligent approach to online privacy protection that leverages supervised learning. By detecting and blocking data collection that might infringe on a user's privacy, we can restore a degree of digital privacy to the user.
arXiv Detail & Related papers (2023-04-06T05:20:16Z)
Participatory Personalization in Classification [8.234011679612436]
We introduce a family of classification models, called participatory systems, that let individuals opt into personalization at prediction time. We conduct a comprehensive empirical study of participatory systems in clinical prediction tasks, benchmarking them with common approaches for personalization and imputation. Our results demonstrate that participatory systems can facilitate and inform consent while improving performance and data use across all groups who report personal data.
arXiv Detail & Related papers (2023-02-08T04:24:19Z)
Unsupervised Text Deidentification [101.2219634341714]
We propose an unsupervised deidentification method that masks words that leak personally-identifying information. Motivated by K-anonymity based privacy, we generate redactions that ensure a minimum reidentification rank.
arXiv Detail & Related papers (2022-10-20T18:54:39Z)
Sotto Voce: Federated Speech Recognition with Differential Privacy Guarantees [0.761963751158349]
Speech data is expensive to collect and highly sensitive with respect to its sources. Organizations often collect small datasets independently for their own use, but these are frequently insufficient for the demands of machine learning. Organizations could pool these datasets and jointly build a strong ASR system; sharing data in the clear, however, comes with tremendous risk, both in loss of intellectual property and in loss of privacy for the individuals in the dataset.
arXiv Detail & Related papers (2022-07-16T02:48:54Z)
RealGait: Gait Recognition for Person Re-Identification [79.67088297584762]
We construct a new gait dataset by extracting silhouettes from an existing video person re-identification challenge, which consists of 1,404 persons walking in an unconstrained manner. Our results suggest that recognizing people by their gait in real surveillance scenarios is feasible and that the underlying gait pattern is probably the true reason why video person re-identification works in practice.
arXiv Detail & Related papers (2022-01-13T06:30:56Z)
Unsupervised Domain Adaptive Learning via Synthetic Data for Person Re-identification [101.1886788396803]
Person re-identification (re-ID) has gained more and more attention due to its widespread applications in video surveillance. Unfortunately, the mainstream deep learning methods still need a large quantity of labeled data to train models. In this paper, we develop a data collector to automatically generate synthetic re-ID samples in a computer game, and construct a data labeler to simultaneously annotate them.
arXiv Detail & Related papers (2021-09-12T15:51:41Z)
Private data sharing between decentralized users through the privGAN architecture [1.3923892290096642]
We propose a method for data owners to share synthetic or fake versions of their data without sharing the actual data. We demonstrate that this approach, when applied to subsets of various sizes, leads to better utility for the owners than the utility from their real datasets.
arXiv Detail & Related papers (2020-09-14T22:06:13Z)
Detecting Informal Organization Through Data Mining Techniques [0.0]
This study classifies indices of human resources influencing the creation of informal organizations. The data mining techniques applied in this study are factor analysis, clustering by K-means, classification by decision trees, and finally association rule mining by the GRI algorithm.
arXiv Detail & Related papers (2020-09-07T05:42:37Z)
Bias in Multimodal AI: Testbed for Fair Automatic Recruitment [73.85525896663371]
We study how current multimodal algorithms based on heterogeneous sources of information are affected by sensitive elements and inner biases in the data. We train automatic recruitment algorithms using a set of multimodal synthetic profiles consciously scored with gender and racial biases. Our methodology and results show how to generate fairer AI-based tools in general, and in particular fairer automated recruitment systems.
arXiv Detail & Related papers (2020-04-15T15:58:05Z)
Federating Recommendations Using Differentially Private Prototypes [16.29544153550663]
We propose a new federated approach to learning global and local private models for recommendation without collecting raw data. By requiring only two rounds of communication, we both reduce the communication costs and avoid the excessive privacy loss. We show local adaptation of the global model allows our method to outperform centralized matrix-factorization-based recommender system models.
arXiv Detail & Related papers (2020-03-01T22:21:31Z)
Investigating the Impact of Inclusion in Face Recognition Training Data on Individual Face Identification [93.5538147928669]
We audit ArcFace, a state-of-the-art, open source face recognition system, in a large-scale face identification experiment with more than one million distractor images. We find a Rank-1 face identification accuracy of 79.71% for individuals present in the model's training data and an accuracy of 75.73% for those not present.
arXiv Detail & Related papers (2020-01-09T15:50:28Z)
