A Cross-Country Analysis of GDPR Cookie Banners and Flexible Methods for Scraping Them
- URL: http://arxiv.org/abs/2503.19655v1
- Date: Tue, 25 Mar 2025 13:44:26 GMT
- Title: A Cross-Country Analysis of GDPR Cookie Banners and Flexible Methods for Scraping Them
- Authors: Midas Nouwens, Janus Bager Kristensen, Kristjan Maalt, Rolf Bagge,
- Abstract summary: We examine the top 10,000 websites across 31 countries under the ePrivacy Directive and GDPR. We show that 67% of websites use consent interfaces, but only 15% are minimally compliant, mostly because they lack a reject option. There is little evidence that regulators' guidance and fines have impacted compliance rates, but 18% of compliance variance is explained by CMPs.
- Score: 6.533686617147407
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Online tracking remains problematic, with compliance and ethical issues persisting despite regulatory efforts. Consent interfaces, the visible manifestation of this industry, have seen significant attention over the years. We present robust automated methods to study the presence, design, and third-party suppliers of consent interfaces at scale and the web service consent-observatory.eu to do it with. We examine the top 10,000 websites across 31 countries under the ePrivacy Directive and GDPR (n=254,148). Our findings show that 67% of websites use consent interfaces, but only 15% are minimally compliant, mostly because they lack a reject option. Consent management platforms (CMPs) are powerful intermediaries in this space: 67% of interfaces are provided by CMPs, and three organisations hold 37% of the market. There is little evidence that regulators' guidance and fines have impacted compliance rates, but 18% of compliance variance is explained by CMPs. Researchers should take an infrastructural perspective on online tracking and study the factual control of intermediaries to identify effective leverage points.
Related papers
- RegTrack: Uncovering Global Disparities in Third-party Advertising and Tracking [2.625007842420751]
Third party advertising and tracking (A&T) are pervasive across the web, yet user exposure varies significantly with browser choice, browsing location, and hosting jurisdiction.
Our analysis reveals that browser choice, user location, and hosting jurisdiction each shape tracking exposure in distinct ways.
arXiv Detail & Related papers (2026-03-03T07:21:15Z)
- Detecting High-Potential SMEs with Heterogeneous Graph Neural Networks [0.0]
Small and Medium Enterprises (SMEs) constitute 99.9% of U.S. businesses and generate 44% of economic activity.
We introduce SME-HGT, a Heterogeneous Graph Transformer framework that predicts which Phase I awardees will advance to Phase II funding using exclusively public data.
arXiv Detail & Related papers (2026-02-23T08:35:55Z)
- Breaking the illusion: Automated Reasoning of GDPR Consent Violations [9.488261532697814]
We present Cosmic, a novel automated framework for detecting consent-related privacy violations in web forms.
Cosmic detects 3384 violations on 94.1% of consent forms, covering key principles such as freely given consent, purpose disclosure, and withdrawal options.
arXiv Detail & Related papers (2025-12-28T05:22:00Z)
- Multi-Agent Legal Verifier Systems for Data Transfer Planning [1.286589966480548]
Legal compliance in AI-driven data transfer planning is becoming increasingly critical under stringent privacy regulations.
We propose a multi-agent legal verifier that decomposes compliance checking into specialized agents for statutory interpretation, business context evaluation, and risk assessment.
arXiv Detail & Related papers (2025-11-14T03:32:08Z)
- Explore to Evolve: Scaling Evolved Aggregation Logic via Proactive Online Exploration for Deep Research Agents [70.77400371166922]
Deep research web agents need to rigorously analyze and aggregate knowledge for insightful research.
We propose an Explore to Evolve paradigm to scalably construct verifiable training data for web agents.
Based on an open-source agent framework, SmolAgents, we collect supervised fine-tuning trajectories to develop a series of foundation models.
arXiv Detail & Related papers (2025-10-16T08:37:42Z)
- Innovative Deep Learning Architecture for Enhanced Altered Fingerprint Recognition [0.0]
We present DeepAFRNet, a deep learning recognition model that matches and recognizes distorted fingerprint samples.
The approach uses a VGG16 backbone to extract high-dimensional features and cosine similarity to compare embeddings.
With strict thresholds, DeepAFRNet achieves accuracies of 96.7 percent, 98.76 percent, and 99.54 percent for the three levels.
arXiv Detail & Related papers (2025-09-24T20:12:37Z)
- WebGuard: Building a Generalizable Guardrail for Web Agents [59.31116061613742]
WebGuard is the first dataset designed to support the assessment of web agent action risks.
It contains 4,939 human-annotated actions from 193 websites across 22 diverse domains.
arXiv Detail & Related papers (2025-07-18T18:06:27Z)
- Person Recognition at Altitude and Range: Fusion of Face, Body Shape and Gait [70.00430652562012]
FarSight is an end-to-end system for person recognition that integrates biometric cues across face, gait, and body shape modalities.
FarSight incorporates novel algorithms across four core modules: multi-subject detection and tracking, recognition-aware video restoration, modality-specific biometric feature encoding, and quality-guided multi-modal fusion.
arXiv Detail & Related papers (2025-05-07T17:58:25Z)
- The BrowserGym Ecosystem for Web Agent Research [151.90034093362343]
The BrowserGym ecosystem addresses the growing need for efficient evaluation and benchmarking of web agents.
We propose an extended BrowserGym-based ecosystem for web agent research, which unifies existing benchmarks from the literature.
We conduct the first large-scale, multi-benchmark web agent experiment and compare the performance of 6 state-of-the-art LLMs across 6 popular web agent benchmarks.
arXiv Detail & Related papers (2024-12-06T23:43:59Z)
- Consent in Crisis: The Rapid Decline of the AI Data Commons [74.68176012363253]
General-purpose artificial intelligence (AI) systems are built on massive swathes of public web data.
We conduct the first, large-scale, longitudinal audit of the consent protocols for the web domains underlying AI training corpora.
arXiv Detail & Related papers (2024-07-20T16:50:18Z)
- Certifiably Byzantine-Robust Federated Conformal Prediction [49.23374238798428]
We introduce a novel framework, Rob-FCP, which executes robust federated conformal prediction, effectively countering malicious clients.
We empirically demonstrate the robustness of Rob-FCP against diverse proportions of malicious clients under a variety of Byzantine attacks.
arXiv Detail & Related papers (2024-06-04T04:43:30Z)
- An Empirical Study on Compliance with Ranking Transparency in the Software Documentation of EU Online Platforms [7.461555266672227]
This study empirically evaluates the compliance of six major platforms (Amazon, Bing, Booking, Google, Tripadvisor, and Yahoo).
We introduce and test automated compliance assessment tools based on ChatGPT and information retrieval technology.
Our findings could help enhance regulatory compliance and align with the United Nations Sustainable Development Goal 10.3.
arXiv Detail & Related papers (2023-12-22T16:08:32Z)
- The Devil is in the Points: Weakly Semi-Supervised Instance Segmentation via Point-Guided Mask Representation [61.027468209465354]
We introduce a novel learning scheme named weakly semi-supervised instance segmentation (WSSIS) with point labels.
We propose a method for WSSIS that can effectively leverage the budget-friendly point labels as a powerful weak supervision source.
We conduct extensive experiments on COCO and BDD100K datasets, and the proposed method achieves promising results comparable to those of the fully-supervised model.
arXiv Detail & Related papers (2023-03-27T10:11:22Z)
- Self-supervised Graph Representation Learning for Black Market Account Detection [62.03978210281426]
Black market accounts (BMAs) are not directly involved in frauds and are more difficult to detect.
This paper illustrates our BMA detection system SGRL (Self-supervised Graph Representation Learning) used in WeChat, a representative multi-purpose messaging mobile app (MMMA) with over a billion users.
We deploy SGRL in the online environment to detect BMAs on the billion-scale WeChat graph, and it exceeds the alternative by 7.27% on the online evaluation measure.
arXiv Detail & Related papers (2022-12-06T00:42:00Z)
- NLP-based Automated Compliance Checking of Data Processing Agreements against GDPR [9.022562906627991]
We propose an automated solution to check compliance of a given DPA against the "shall" requirements.
Our approach correctly finds 618 out of 750 genuine violations while raising 76 false violations, and further correctly identifies 524 satisfied requirements.
arXiv Detail & Related papers (2022-09-20T13:50:58Z)
- SoftDropConnect (SDC) -- Effective and Efficient Quantification of the Network Uncertainty in Deep MR Image Analysis [6.556578665564248]
We propose a novel yet simple Bayesian inference approach called SoftDropConnect (SDC) to quantify the network uncertainty in medical imaging tasks.
Our proposed method generates results with substantially improved prediction accuracy (by 10.0%, 5.4% and 3.7% respectively for segmentation in terms of dice score) and greatly reduced uncertainty in terms of mutual information.
arXiv Detail & Related papers (2022-01-20T19:22:26Z)
- Adherence Forecasting for Guided Internet-Delivered Cognitive Behavioral Therapy: A Minimally Data-Sensitive Approach [59.535699822923]
Internet-delivered psychological treatments (IDPT) are seen as an effective and scalable pathway to improving the accessibility of mental healthcare.
This work proposes a deep-learning approach to perform automatic adherence forecasting, while relying on minimally sensitive login/logout data.
The proposed Self-Attention Network achieved over 70% average balanced accuracy, when only 1/3 of the treatment duration had elapsed.
arXiv Detail & Related papers (2022-01-11T13:55:57Z)
- Consent Management Platforms under the GDPR: processors and/or controllers? [11.514573594428352]
Consent Management Providers (CMPs) provide consent pop-ups that are embedded in ever more websites.
CMPs enable compliance with the legal requirements for consent mandated by the ePrivacy Directive and the General Data Protection Regulation (GDPR).
Although IAB's TCF specifications characterize CMPs as data processors, CMPs' factual activities often qualify them as data controllers instead.
arXiv Detail & Related papers (2021-04-14T13:54:02Z)
- Data Protection Impact Assessment for the Corona App [0.0]
Since SARS-CoV-2 started spreading in Europe in early 2020, there has been a strong call for technical solutions to combat or contain the pandemic, with contact tracing apps at the heart of the debates.
The EU's General Data Protection Regulation (GDPR) requires controllers to carry out a data protection impact assessment (DPIA).
We present a scientific DPIA which thoroughly examines three published contact tracing app designs that are considered to be the most "privacy-friendly".
arXiv Detail & Related papers (2021-01-18T19:23:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.