Differential Tracking Across Topical Webpages of Indian News Media
- URL: http://arxiv.org/abs/2103.04442v1
- Date: Sun, 7 Mar 2021 20:20:47 GMT
- Title: Differential Tracking Across Topical Webpages of Indian News Media
- Authors: Yash Vekaria, Vibhor Agarwal, Pushkal Agarwal, Sangeeta Mahapatra,
Sakthi Balan Muthiah, Nishanth Sastry, Nicolas Kourtellis
- Abstract summary: We propose a novel method for automatic extraction and categorization of Indian news topical subpages based on the details in their URLs.
We find differential user tracking among subpages, and between subpages and homepages.
Embedded third-parties tend to track specific subpages simultaneously, revealing possible user profiling in action.
- Score: 3.721918008485747
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Online user privacy and tracking have been extensively studied in recent
years, especially due to privacy and personal data-related legislation in the
EU and the USA, such as the General Data Protection Regulation, ePrivacy
Regulation, and California Consumer Privacy Act. Research has revealed novel
tracking and personally identifiable information leakage methods that first- and
third-parties employ on websites around the world, as well as the intensity of
tracking performed on such websites. However, for the sake of scaling to cover
a large portion of the Web, most past studies focused on homepages of websites,
and did not look deeper into the tracking practices on their topical subpages.
Moreover, the majority of studies focused on Global North markets such as the EU
and the USA. Large markets such as India, which accounts for 20% of the world
population and has no explicit privacy laws, have not been studied in this regard.
We aim to address these gaps and focus on the following research questions:
Is tracking on topical subpages of Indian news websites different from their
homepage? Do third-party trackers prefer to track specific topics? How does
this preference compare to the similarity of content shown on these topical
subpages? To answer these questions, we propose a novel method for automatic
extraction and categorization of Indian news topical subpages based on the
details in their URLs. We study the identified topical subpages and compare
them with their homepages with respect to the intensity of cookie injection and
third-party embeddedness and type. We find differential user tracking among
subpages, and between subpages and homepages. We also find a preferential
attachment of third-party trackers to specific topics. Also, embedded
third-parties tend to track specific subpages simultaneously, revealing
possible user profiling in action.
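The abstract does not spell out how subpages are categorized from their URLs. As a rough illustration only, the sketch below assigns a topic to a subpage by matching tokens in the URL path against a keyword table. The `TOPIC_KEYWORDS` table and the `categorize_subpage` function are hypothetical names chosen for this example and are not taken from the paper.

```python
from urllib.parse import urlparse

# Hypothetical topic keywords (assumption); the paper's actual category
# list and matching rules are not given in the abstract.
TOPIC_KEYWORDS = {
    "sports": {"sports", "sport", "cricket"},
    "business": {"business", "economy", "markets"},
    "entertainment": {"entertainment", "bollywood", "movies"},
    "politics": {"politics", "elections"},
}


def categorize_subpage(url):
    """Return the first topic whose keyword appears in a URL path segment, else None."""
    path = urlparse(url).path.lower()
    segments = [s for s in path.split("/") if s]
    for segment in segments:
        # Split segments such as "cricket-news" into individual tokens.
        tokens = set(segment.replace("_", "-").split("-"))
        for topic, keywords in TOPIC_KEYWORDS.items():
            if tokens & keywords:
                return topic
    return None


if __name__ == "__main__":
    for u in (
        "https://example.com/sports/cricket-news",
        "https://example.com/india/business-news/article-1",
        "https://example.com/",
    ):
        print(u, "->", categorize_subpage(u))
```

Segments are checked from left to right, so a URL is labelled by the earliest path segment that carries a known keyword. A homepage URL with an empty path yields `None`.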
Related papers
- How Unique is Whose Web Browser? The role of demographics in browser fingerprinting among US users [50.699390248359265]
Browser fingerprinting can be used to identify and track users across the Web, even without cookies.
This technique and resulting privacy risks have been studied for over a decade.
We provide a first-of-its-kind dataset to enable further research.
arXiv Detail & Related papers (2024-10-09T14:51:58Z)
- PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action [54.11479432110771]
PrivacyLens is a novel framework designed to extend privacy-sensitive seeds into expressive vignettes and further into agent trajectories.
We instantiate PrivacyLens with a collection of privacy norms grounded in privacy literature and crowdsourced seeds.
State-of-the-art LMs, like GPT-4 and Llama-3-70B, leak sensitive information in 25.68% and 38.69% of cases, respectively, even when prompted with privacy-enhancing instructions.
arXiv Detail & Related papers (2024-08-29T17:58:38Z)
- Understanding Privacy Norms through Web Forms [5.972457400484541]
We build a specialized crawler to discover web forms and run it on 11,500 popular websites, creating a dataset of 293K web forms.
By analyzing the annotated dataset, we reveal common patterns of data collection practices.
arXiv Detail & Related papers (2024-08-29T07:11:09Z)
- Differentially Private Data Release on Graphs: Inefficiencies and Unfairness [48.96399034594329]
This paper characterizes the impact of Differential Privacy on bias and unfairness in the context of releasing information about networks.
We consider a network release problem where the network structure is known to all, but the weights on edges must be released privately.
Our work provides theoretical foundations and empirical evidence into the bias and unfairness arising due to privacy in these networked decision problems.
arXiv Detail & Related papers (2024-08-08T08:37:37Z)
- Collection, usage and privacy of mobility data in the enterprise and public administrations [55.2480439325792]
Security measures such as anonymization are needed to protect individuals' privacy.
Within our study, we conducted expert interviews to gain insights into practices in the field.
We survey privacy-enhancing methods in use, which generally do not comply with state-of-the-art standards of differential privacy.
arXiv Detail & Related papers (2024-07-04T08:29:27Z)
- Characterizing Browser Fingerprinting and its Mitigations [0.0]
This work explores one of these tracking techniques: browser fingerprinting.
We detail how browser fingerprinting works, how prevalent it is, and what defenses can mitigate it.
arXiv Detail & Related papers (2023-10-12T20:31:24Z)
- A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns comes with stringent regulations that frequently prohibit data access and data sharing.
Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy sensitive data.
Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
arXiv Detail & Related papers (2023-09-27T14:38:16Z)
- A Comparative Audit of Privacy Policies from Healthcare Organizations in USA, UK and India [19.45392112573428]
This paper presents a large-scale data-driven study to audit privacy policies from healthcare organizations in USA, UK, and India.
First, we collected the privacy policies of thousands of healthcare organizations in these countries and cleaned this privacy policy data using a clustering-based mixed-method technique.
Second, we adopted a summarization-based technique to uncover broad data practices across countries and note important differences.
arXiv Detail & Related papers (2023-06-20T14:21:37Z)
- How Do Input Attributes Impact the Privacy Loss in Differential Privacy? [55.492422758737575]
We study the connection between the per-subject norm in DP neural networks and individual privacy loss.
We introduce a novel metric termed the Privacy Loss-Input Susceptibility (PLIS) which allows one to apportion the subject's privacy loss to their input attributes.
arXiv Detail & Related papers (2022-11-18T11:39:03Z)
- Privacy Policies over Time: Curation and Analysis of a Million-Document Dataset [6.060757543617328]
We develop a crawler that discovers, downloads, and extracts archived privacy policies from the Internet Archive's Wayback Machine.
We curated a dataset of 1,071,488 English language privacy policies, spanning over two decades and over 130,000 distinct websites.
Our data indicate that self-regulation for first-party websites has stagnated, while self-regulation for third parties has increased but is dominated by online advertising trade associations.
arXiv Detail & Related papers (2020-08-20T19:00:37Z)
- Stop Tracking Me Bro! Differential Tracking Of User Demographics On Hyper-partisan Websites [5.690539268996364]
We take a first step toward shedding light on, and measuring, potential differences in the tracking imposed on users when they visit websites of a specific party line.
This methodology allows us to create user personas with specific attributes like gender and age and automate their browsing behavior.
We test 9 personas on 556 hyper-partisan websites and find that right-leaning websites tend to track users more intensely than left-leaning ones.
arXiv Detail & Related papers (2020-02-03T18:35:57Z)