Related papers: Differential Tracking Across Topical Webpages of Indian News Media

URL: http://arxiv.org/abs/2103.04442v1
Date: Sun, 7 Mar 2021 20:20:47 GMT
Title: Differential Tracking Across Topical Webpages of Indian News Media
Authors: Yash Vekaria, Vibhor Agarwal, Pushkal Agarwal, Sangeeta Mahapatra, Sakthi Balan Muthiah, Nishanth Sastry, Nicolas Kourtellis
Abstract summary: We propose a novel method for automatic extraction and categorization of Indian news topical subpages based on the details in their URLs. We find differential user tracking among subpages, and between subpages and homepages. Embedded third-parties tend to track specific subpages simultaneously, revealing possible user profiling in action.
Score: 3.721918008485747
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Online user privacy and tracking have been extensively studied in recent years, especially due to privacy and personal data-related legislation in the EU and the USA, such as the General Data Protection Regulation, ePrivacy Regulation, and California Consumer Privacy Act. Research has revealed novel tracking and personally identifiable information leakage methods that first- and third-parties employ on websites around the world, as well as the intensity of tracking performed on such websites. However, for the sake of scaling to cover a large portion of the Web, most past studies focused on homepages of websites, and did not look deeper into the tracking practices on their topical subpages. The majority of studies focused on the Global North markets such as the EU and the USA. Large markets such as India, which covers 20% of the world population and has no explicit privacy laws, have not been studied in this regard. We aim to address these gaps and focus on the following research questions: Is tracking on topical subpages of Indian news websites different from their homepage? Do third-party trackers prefer to track specific topics? How does this preference compare to the similarity of content shown on these topical subpages? To answer these questions, we propose a novel method for automatic extraction and categorization of Indian news topical subpages based on the details in their URLs. We study the identified topical subpages and compare them with their homepages with respect to the intensity of cookie injection and third-party embeddedness and type. We find differential user tracking among subpages, and between subpages and homepages. We also find a preferential attachment of third-party trackers to specific topics. Also, embedded third-parties tend to track specific subpages simultaneously, revealing possible user profiling in action.
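A minimal sketch of what URL-based categorization of topical subpages could look like, assuming the topic is signalled by a segment of the URL path. The keyword table, the `categorize_url` helper, and the example domain are hypothetical illustrations, not the authors' method or category set.

```python
from urllib.parse import urlparse

# Hypothetical topic vocabulary (an assumption for illustration only);
# the paper's actual categories are not listed in the abstract.
TOPIC_KEYWORDS = {
    "sports": {"sports", "sport", "cricket"},
    "business": {"business", "economy", "markets"},
    "entertainment": {"entertainment", "movies", "bollywood"},
    "politics": {"politics", "elections"},
}


def categorize_url(url):
    """Label a URL by the first path segment that matches a topic keyword.

    Returns "homepage" when the path has no segments, the topic name on a
    match, and None when no segment matches any keyword.
    """
    path = urlparse(url).path
    segments = [s.lower() for s in path.split("/") if s]
    if not segments:
        return "homepage"
    for segment in segments:
        for topic, keywords in TOPIC_KEYWORDS.items():
            if segment in keywords:
                return topic
    return None
```

Under these assumptions, `categorize_url("https://news.example.in/sports/cricket/match-report")` yields `"sports"`, while a bare domain is labelled `"homepage"`. Exact segment matching is the simplest choice here; a real pipeline would likely need site-specific rules as well.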

Related papers

RegTrack: Uncovering Global Disparities in Third-party Advertising and Tracking [2.625007842420751]
Third party advertising and tracking (A&T) are pervasive across the web, yet user exposure varies significantly with browser choice, browsing location, and hosting jurisdiction. Our analysis reveals that browser choice, user location, and hosting jurisdiction each shape tracking exposure in distinct ways.
arXiv Detail & Related papers (2026-03-03T07:21:15Z)
Exposed: Shedding Blacklight on Online Privacy [0.0]
We combine passively observed, anonymized browsing data of a large, representative sample of Americans with domain-level data on tracking from Blacklight. We find that nearly all users encounter at least one ad tracker or third-party cookie over the observation window. Linking trackers to their parent organizations reveals that a single organization, usually Google, can track over 50% of web activity of more than half the users.
arXiv Detail & Related papers (2025-12-30T07:31:48Z)
Every Keystroke You Make: A Tech-Law Measurement and Analysis of Event Listeners for Wiretapping [15.823783000812158]
Despite the growing body of research documenting widespread lack of compliance with new privacy laws, there is a lack of robust enforcement. We focus on a particularly invasive tracking technique: the use of JavaScript event listeners by third-party trackers for real-time keystroke interception on websites. We find evidence that 38.52% of websites installed third-party event listeners to intercept keystrokes, and that at least 3.18% of websites transmitted intercepted information to a third-party server.
arXiv Detail & Related papers (2025-08-27T12:20:52Z)
SoK: Advances and Open Problems in Web Tracking [71.54586748169943]
Web tracking is a pervasive and opaque practice that enables personalized advertising and conversion tracking. Web tracking is undergoing a once-in-a-generation transformation driven by shifts in the advertising industry, the adoption of anti-tracking countermeasures by browsers, and the growing enforcement of emerging privacy regulations. This Systematization of Knowledge (SoK) aims to consolidate and synthesize this wide-ranging research, offering a comprehensive overview of the technical mechanisms, countermeasures, and regulations that shape the modern and rapidly evolving web tracking landscape.
arXiv Detail & Related papers (2025-06-16T23:30:54Z)
Fingerprinting and Tracing Shadows: The Development and Impact of Browser Fingerprinting on Digital Privacy [55.2480439325792]
Browser fingerprinting is a growing technique for identifying and tracking users online without traditional methods like cookies. This paper gives an overview by examining the various fingerprinting techniques and analyzes the entropy and uniqueness of the collected data.
arXiv Detail & Related papers (2024-11-18T20:32:31Z)
How Unique is Whose Web Browser? The role of demographics in browser fingerprinting among US users [50.699390248359265]
Browser fingerprinting can be used to identify and track users across the Web, even without cookies. This technique and resulting privacy risks have been studied for over a decade. We provide a first-of-its-kind dataset to enable further research.
arXiv Detail & Related papers (2024-10-09T14:51:58Z)
PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action [54.11479432110771]
PrivacyLens is a novel framework designed to extend privacy-sensitive seeds into expressive vignettes and further into agent trajectories. We instantiate PrivacyLens with a collection of privacy norms grounded in privacy literature and crowdsourced seeds. State-of-the-art LMs, like GPT-4 and Llama-3-70B, leak sensitive information in 25.68% and 38.69% of cases, even when prompted with privacy-enhancing instructions.
arXiv Detail & Related papers (2024-08-29T17:58:38Z)
Understanding Privacy Norms through Web Forms [5.972457400484541]
We build a specialized crawler to discover web forms, run it on 11,500 popular websites, and create a dataset of 293K web forms. By analyzing the annotated dataset, we reveal common patterns of data collection practices.
arXiv Detail & Related papers (2024-08-29T07:11:09Z)
Differentially Private Data Release on Graphs: Inefficiencies and Unfairness [48.96399034594329]
This paper characterizes the impact of Differential Privacy on bias and unfairness in the context of releasing information about networks. We consider a network release problem where the network structure is known to all, but the weights on edges must be released privately. Our work provides theoretical foundations and empirical evidence into the bias and unfairness arising due to privacy in these networked decision problems.
arXiv Detail & Related papers (2024-08-08T08:37:37Z)
Collection, usage and privacy of mobility data in the enterprise and public administrations [55.2480439325792]
Security measures such as anonymization are needed to protect individuals' privacy. Within our study, we conducted expert interviews to gain insights into practices in the field. We survey privacy-enhancing methods in use, which generally do not comply with state-of-the-art standards of differential privacy.
arXiv Detail & Related papers (2024-07-04T08:29:27Z)
Characterizing Browser Fingerprinting and its Mitigations [0.0]
This work explores one of these tracking techniques: browser fingerprinting. We detail how browser fingerprinting works, how prevalent it is, and what defenses can mitigate it.
arXiv Detail & Related papers (2023-10-12T20:31:24Z)
A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns comes with stringent regulations that frequently prohibit data access and data sharing. Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy sensitive data. Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
arXiv Detail & Related papers (2023-09-27T14:38:16Z)
A Comparative Audit of Privacy Policies from Healthcare Organizations in USA, UK and India [19.45392112573428]
This paper presents a large-scale data-driven study to audit privacy policies from healthcare organizations in USA, UK, and India. First, we collected the privacy policies of thousands of healthcare organizations in these countries and cleaned this privacy policy data using a clustering-based mixed-method technique. Second, we adopted a summarization-based technique to uncover exact broad data practices across countries and notice important differences.
arXiv Detail & Related papers (2023-06-20T14:21:37Z)
Privacy Policies over Time: Curation and Analysis of a Million-Document Dataset [6.060757543617328]
We develop a crawler that discovers, downloads, and extracts archived privacy policies from the Internet Archive's Wayback Machine. We curated a dataset of 1,071,488 English language privacy policies, spanning over two decades and over 130,000 distinct websites. Our data indicate that self-regulation for first-party websites has stagnated, while self-regulation for third parties has increased but is dominated by online advertising trade associations.
arXiv Detail & Related papers (2020-08-20T19:00:37Z)
Stop Tracking Me Bro! Differential Tracking Of User Demographics On Hyper-partisan Websites [5.690539268996364]
We take a first step to shed light and measure potential differences in tracking imposed on users when visiting specific party-line's websites. This methodology allows us to create user personas with specific attributes like gender and age and automate their browsing behavior. We test 9 personas on 556 hyper-partisan websites and find that right-leaning websites tend to track users more intensely than left-leaning.
arXiv Detail & Related papers (2020-02-03T18:35:57Z)

This list is automatically generated from the titles and abstracts of the papers on this site.