Setting the Course, but Forgetting to Steer: Analyzing Compliance with GDPR's Right of Access to Data by Instagram, TikTok, and YouTube
- URL: http://arxiv.org/abs/2502.11208v2
- Date: Sun, 19 Oct 2025 21:03:17 GMT
- Title: Setting the Course, but Forgetting to Steer: Analyzing Compliance with GDPR's Right of Access to Data by Instagram, TikTok, and YouTube
- Authors: Sai Keerthana Karnam, Abhisek Dash, Antariksh Das, Sepehr Mousavi, Stefan Bechtold, Krishna P. Gummadi, Animesh Mukherjee, Ingmar Weber, Savvas Zannettou,
- Abstract summary: The Right of Access aims to empower users with control over their personal data via Data Download Packages (DDPs). This paper conducts a comprehensive audit of DDPs from three social media platforms (TikTok, Instagram, and YouTube) to systematically assess these critical drawbacks.
- Score: 9.304421724270828
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The GDPR's Right of Access aims to empower users with control over their personal data via Data Download Packages (DDPs). However, their effectiveness is often compromised by inconsistent platform implementations, questionable data reliability, and poor user comprehensibility. This paper conducts a comprehensive audit of DDPs from three social media platforms (TikTok, Instagram, and YouTube) to systematically assess these critical drawbacks. Despite offering similar services, these platforms demonstrate significant inconsistencies in implementing the Right of Access, evident in the varying levels of data they share. Critically, the failure to disclose processing purposes, retention periods, and other third-party data recipients serves as a further indicator of non-compliance. Our reliability evaluations, using bots and user-donated data, reveal that while TikTok's DDPs offer more consistent and complete data, the others exhibit notable shortcomings. Similarly, our assessment of comprehensibility, based on surveys with 400 participants, indicates that current DDPs fall substantially short of GDPR's standards. To improve comprehensibility, we propose and demonstrate a two-layered approach: (1) enhancing the data representation itself using stakeholder interpretations; and (2) incorporating a user-friendly extension ("Know Your Data") for intuitive data visualization, where users can control the level of transparency they prefer. Our findings underscore the need for clearer, non-conflicting regulatory guidance, stricter enforcement, and platform commitment to realize the goal of GDPR's Right of Access.
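One way to picture the kind of cross-platform inconsistency the audit describes is a simple coverage comparison across DDPs. The sketch below is purely illustrative: the category names and per-platform sets are hypothetical examples, not data from the paper.

```python
# Hypothetical sketch: which data categories does each platform's
# Data Download Package (DDP) omit, relative to everything disclosed
# by at least one platform? Category names are made up for illustration.

ddp_categories = {
    "tiktok":    {"watch_history", "likes", "comments", "login_history", "ad_interests"},
    "instagram": {"likes", "comments", "ad_interests", "searches"},
    "youtube":   {"watch_history", "searches", "comments"},
}

def coverage_report(ddps: dict[str, set[str]]) -> dict[str, list[str]]:
    """For each platform, list the categories it omits relative to the union."""
    all_categories = set().union(*ddps.values())
    return {platform: sorted(all_categories - categories)
            for platform, categories in ddps.items()}

print(coverage_report(ddp_categories))
```

A report like this makes "varying levels of shared data" concrete: each non-empty list is a category some platform discloses and another does not.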
Related papers
- Towards Verifiable Federated Unlearning: Framework, Challenges, and The Road Ahead [6.530323505784683]
Federated unlearning (FUL) enables removing the data influence from the model trained across distributed clients. This article introduces veriFUL, a reference framework for verifiable FUL that formalizes verification entities, goals, approaches, and metrics.
arXiv Detail & Related papers (2025-10-01T12:45:46Z) - Improving Regulatory Oversight in Online Content Moderation [2.1082552608122542]
The European Union introduced the Digital Services Act (DSA) to address the risks associated with digital platforms and promote a safer online environment. Despite the potential of components such as the Transparency Database, Transparency Reports, and Article 40 of the DSA to improve platform transparency, significant challenges remain. These include data inconsistencies and a lack of detailed information, which hinder transparency in content moderation practices.
arXiv Detail & Related papers (2025-06-04T16:38:25Z) - Divide-Then-Align: Honest Alignment based on the Knowledge Boundary of RAG [51.120170062795566]
We propose Divide-Then-Align (DTA) to endow RAG systems with the ability to respond with "I don't know" when the query is outside the knowledge boundary. DTA balances accuracy with appropriate abstention, enhancing the reliability and trustworthiness of retrieval-augmented systems.
arXiv Detail & Related papers (2025-05-27T08:21:21Z) - P2NIA: Privacy-Preserving Non-Iterative Auditing [5.619344845505019]
The emergence of AI legislation has increased the need to assess the ethical compliance of high-risk AI systems. Traditional auditing methods rely on platforms' application programming interfaces (APIs). We present P2NIA, a novel auditing scheme that proposes a mutually beneficial collaboration for both the auditor and the platform.
arXiv Detail & Related papers (2025-04-01T15:04:58Z) - IPAD: Inverse Prompt for AI Detection -- A Robust and Explainable LLM-Generated Text Detector [11.112793289424886]
Large Language Models (LLMs) have attained human-level fluency in text generation, which complicates the distinction between human-written and LLM-generated texts. Existing detectors exhibit poor robustness on out-of-distribution (OOD) data and attacked data, which is critical for real-world scenarios. We propose IPAD (Inverse Prompt for AI Detection), a novel framework consisting of a Prompt Inverter that identifies predicted prompts that could have generated the input text, and two Distinguishers that examine the probability that the input texts align with the predicted prompts.
arXiv Detail & Related papers (2025-02-21T19:41:32Z) - Access Denied: Meaningful Data Access for Quantitative Algorithm Audits [4.182284365432724]
Third-party audits are often hindered by access restrictions, forcing auditors to rely on limited, low-quality data. We conduct audit simulations on two realistic case studies for recidivism and healthcare coverage prediction. We find that data minimization and anonymization practices can strongly increase error rates on individual-level data, leading to unreliable assessments.
arXiv Detail & Related papers (2025-02-01T13:33:45Z) - Adaptive PII Mitigation Framework for Large Language Models [2.694044579874688]
This paper introduces an adaptive system for mitigating risks of Personally Identifiable Information (PII) and Sensitive Personal Information (SPI). The system uses advanced NLP techniques, context-aware analysis, and policy-driven masking to ensure regulatory compliance. Benchmarks highlight the system's effectiveness, with an F1 score of 0.95 for Passport Numbers.
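For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall. The numbers below are illustrative (the abstract reports only the F1 value, not the underlying precision and recall):

```python
# F1 = harmonic mean of precision and recall. An F1 of 0.95 could arise,
# for example, from precision 0.94 and recall 0.96 (illustrative values,
# not taken from the paper).

def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(round(f1_score(0.94, 0.96), 2))  # prints 0.95
```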
arXiv Detail & Related papers (2025-01-21T19:22:45Z) - Are Data Experts Buying into Differentially Private Synthetic Data? Gathering Community Perspectives [14.736115103446101]
In the United States, differential privacy (DP) is the dominant technical operationalization of privacy-preserving data analysis. This study qualitatively examines one class of DP mechanisms: private data synthesizers.
arXiv Detail & Related papers (2024-12-17T15:50:14Z) - PASTA-4-PHT: A Pipeline for Automated Security and Technical Audits for the Personal Health Train [34.203290179252555]
This work discusses a PHT-aligned security and audit pipeline inspired by DevSecOps principles. We introduce vulnerabilities into a PHT and apply our pipeline to five real-world PHTs, which have been utilised in real-world studies. Ultimately, our work contributes to an increased security and overall transparency of data processing activities within the PHT framework.
arXiv Detail & Related papers (2024-12-02T08:43:40Z) - Lazy Data Practices Harm Fairness Research [49.02318458244464]
We present a comprehensive analysis of fair ML datasets, demonstrating how unreflective practices hinder the reach and reliability of algorithmic fairness findings.
Our analyses identify three main areas of concern: (1) a lack of representation for certain protected attributes in both data and evaluations; (2) the widespread exclusion of minorities during data preprocessing; and (3) opaque data processing threatening the generalization of fairness research.
This study underscores the need for a critical reevaluation of data practices in fair ML and offers directions to improve both the sourcing and usage of datasets.
arXiv Detail & Related papers (2024-04-26T09:51:24Z) - An Empirical Study on Compliance with Ranking Transparency in the Software Documentation of EU Online Platforms [7.461555266672227]
This study empirically evaluates the compliance of six major platforms (Amazon, Bing, Booking, Google, Tripadvisor, and Yahoo) with ranking transparency requirements.
We introduce and test automated compliance assessment tools based on ChatGPT and information retrieval technology.
Our findings could help enhance regulatory compliance and align with the United Nations Sustainable Development Goal 10.3.
arXiv Detail & Related papers (2023-12-22T16:08:32Z) - Data Acquisition: A New Frontier in Data-centric AI [65.90972015426274]
We first present an investigation of current data marketplaces, revealing lack of platforms offering detailed information about datasets.
We then introduce the DAM challenge, a benchmark to model the interaction between the data providers and acquirers.
Our evaluation of the submitted strategies underlines the need for effective data acquisition strategies in Machine Learning.
arXiv Detail & Related papers (2023-11-22T22:15:17Z) - FairDP: Certified Fairness with Differential Privacy [55.51579601325759]
This paper introduces FairDP, a novel training mechanism designed to provide group fairness certification for the trained model's decisions. The key idea of FairDP is to train models for distinct individual groups independently, add noise to each group's gradient for data privacy protection, and integrate knowledge from group models to formulate a model that balances privacy, utility, and fairness in downstream tasks.
arXiv Detail & Related papers (2023-05-25T21:07:20Z) - D-BIAS: A Causality-Based Human-in-the-Loop System for Tackling Algorithmic Bias [57.87117733071416]
We propose D-BIAS, a visual interactive tool that embodies human-in-the-loop AI approach for auditing and mitigating social biases.
A user can detect the presence of bias against a group by identifying unfair causal relationships in the causal network.
For each interaction, say weakening/deleting a biased causal edge, the system uses a novel method to simulate a new (debiased) dataset.
arXiv Detail & Related papers (2022-08-10T03:41:48Z) - DataPerf: Benchmarks for Data-Centric AI Development [81.03754002516862]
DataPerf is a community-led benchmark suite for evaluating ML datasets and data-centric algorithms.
We provide an open, online platform with multiple rounds of challenges to support this iterative development.
The benchmarks, online evaluation platform, and baseline implementations are open source.
arXiv Detail & Related papers (2022-07-20T17:47:54Z) - How Do Socio-Demographic Patterns Define Digital Privacy Divide? [0.5571177307684636]
Digital privacy has become an essential component of information and communications technology (ICT) systems.
There is still a gap in the digital privacy protection levels available for users.
This paper studies the digital privacy divide (DPD) problem in ICT systems.
arXiv Detail & Related papers (2022-01-20T00:59:53Z) - What Stops Learning-based 3D Registration from Working in the Real World? [53.68326201131434]
This work identifies the sources of 3D point cloud registration failures, analyzes the reasons behind them, and proposes solutions.
Ultimately, this translates to a best-practice 3D registration network (BPNet), constituting the first learning-based method able to handle previously-unseen objects in real-world data.
Our model generalizes to real data without any fine-tuning, reaching an accuracy of up to 67% on point clouds of unseen objects obtained with a commercial sensor.
arXiv Detail & Related papers (2021-11-19T19:24:27Z) - Trustworthy Transparency by Design [57.67333075002697]
We propose a transparency framework for software design, incorporating research on user trust and experience.
Our framework enables developing software that incorporates transparency in its design.
arXiv Detail & Related papers (2021-03-19T12:34:01Z) - Explainable Patterns: Going from Findings to Insights to Support Data Analytics Democratization [60.18814584837969]
We present Explainable Patterns (ExPatt), a new framework to support lay users in exploring and creating data storytellings.
ExPatt automatically generates plausible explanations for observed or selected findings using an external (textual) source of information.
arXiv Detail & Related papers (2021-01-19T16:13:44Z) - Trustworthy AI [75.99046162669997]
Brittleness to minor adversarial changes in the input data, the inability to explain decisions, and bias in training data are some of the most prominent limitations.
We propose the tutorial on Trustworthy AI to address six critical issues in enhancing user and public trust in AI systems.
arXiv Detail & Related papers (2020-11-02T20:04:18Z) - GDPR: When the Right to Access Personal Data Becomes a Threat [63.732639864601914]
We examine more than 300 data controllers, performing a request to access personal data for each of them. We find that 50.4% of the data controllers that handled the request have flaws in their procedures for identifying users, with the undesired and surprising result that, in its present deployment, the Right of Access has actually decreased the privacy of the users of web services.
arXiv Detail & Related papers (2020-05-04T22:01:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.