Measuring and Evading Turkmenistan's Internet Censorship: A Case Study
in Large-Scale Measurements of a Low-Penetration Country
- URL: http://arxiv.org/abs/2304.04835v2
- Date: Mon, 17 Apr 2023 16:44:04 GMT
- Title: Measuring and Evading Turkmenistan's Internet Censorship: A Case Study
in Large-Scale Measurements of a Low-Penetration Country
- Authors: Sadia Nourin, Van Tran, Xi Jiang, Kevin Bock, Nick Feamster, Nguyen
Phong Hoang, Dave Levin
- Abstract summary: Turkmenistan has been listed as one of the few Internet enemies by Reporters Without Borders.
With a population of only six million people and an Internet penetration rate of only 38%, it is challenging to conduct remote network measurements at scale.
We present the largest measurement study to date of Turkmenistan's Web censorship.
- Score: 16.32681366389081
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Since 2006, Turkmenistan has been listed as one of the few Internet enemies
by Reporters Without Borders due to its extensively censored Internet and
strictly regulated information control policies. Existing reports of filtering
in Turkmenistan rely on a small number of vantage points or test a small number
of websites. Yet, the country's poor Internet adoption rates and small
population can make more comprehensive measurement challenging. With a
population of only six million people and an Internet penetration rate of only
38%, it is challenging to either recruit in-country volunteers or obtain
vantage points to conduct remote network measurements at scale.
We present the largest measurement study to date of Turkmenistan's Web
censorship. To do so, we developed TMC, which tests the blocking status of
millions of domains across the three foundational protocols of the Web (DNS,
HTTP, and HTTPS). Importantly, TMC does not require access to vantage points in
the country. We apply TMC to 15.5M domains; our results reveal that
Turkmenistan censors more than 122K domains, using different blocklists for
each protocol. We also reverse-engineer these censored domains, identifying 6K
over-blocking rules causing incidental filtering of more than 5.4M domains.
Finally, we use Geneva, an open-source censorship evasion tool, to discover
five new censorship evasion strategies that can defeat Turkmenistan's
censorship at both transport and application layers. We will publicly release
both the data collected by TMC and the code for censorship evasion.
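The abstract does not describe TMC's implementation, so the following is only a minimal sketch, under stated assumptions, of the general idea behind vantage-point-free, injection-based Web censorship measurement and a crude Host-header evasion attempt. It is not TMC's or Geneva's actual code: the target address (203.0.113.10), the control and test domains, and the split-Host trick are hypothetical placeholders for illustration, not the paper's measured targets or its five discovered strategies.
```python
# Hedged sketch: probe an HTTP-responsive host whose network path crosses the
# censor, compare a control domain against a test domain, and try one naive
# evasion variant. All constants below are illustrative assumptions.
import socket

TARGET_IP = "203.0.113.10"      # hypothetical responsive host behind the censor
CONTROL_DOMAIN = "example.org"  # domain assumed not to be blocked
TEST_DOMAIN = "example.com"     # domain whose blocking status we want to check
TIMEOUT = 5                     # seconds

def http_probe(host_header: str, split_host: bool = False) -> str:
    """Send one plain-HTTP GET with the given Host header and classify the outcome."""
    request = (
        f"GET / HTTP/1.1\r\nHost: {host_header}\r\nConnection: close\r\n\r\n"
    ).encode()
    try:
        with socket.create_connection((TARGET_IP, 80), timeout=TIMEOUT) as sock:
            sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
            if split_host:
                # Naive evasion attempt: send the request in two TCP segments,
                # splitting inside the Host value so a censor that matches the
                # keyword within a single packet may fail to see it.
                mid = request.find(b"Host: ") + 6 + len(host_header) // 2
                sock.sendall(request[:mid])
                sock.sendall(request[mid:])
            else:
                sock.sendall(request)
            response = b""
            while True:
                chunk = sock.recv(4096)
                if not chunk:
                    break
                response += chunk
    except ConnectionResetError:
        return "connection reset (possible injected RST)"
    except socket.timeout:
        return "timeout (possible packet drop / null routing)"
    if not response:
        return "empty response"
    # Report only the status line; an injected block page typically differs
    # from the control response here or in the body.
    return response.split(b"\r\n", 1)[0].decode(errors="replace")

if __name__ == "__main__":
    # A reset, timeout, or block page seen only for TEST_DOMAIN (and not for
    # the control) suggests Host-header-based filtering on the path.
    print("control :", http_probe(CONTROL_DOMAIN))
    print("test    :", http_probe(TEST_DOMAIN))
    print("evasion :", http_probe(TEST_DOMAIN, split_host=True))
```
A full TMC-style measurement would also issue DNS and HTTPS (SNI-based) probes and must separate censor-injected responses from ordinary server behavior; the sketch only illustrates the control/test comparison over plain HTTP.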
Related papers
- Consent in Crisis: The Rapid Decline of the AI Data Commons [74.68176012363253]
General-purpose artificial intelligence (AI) systems are built on massive swathes of public web data.
We conduct the first, large-scale, longitudinal audit of the consent protocols for the web domains underlying AI training corpora.
arXiv Detail & Related papers (2024-07-20T16:50:18Z)
- Automatic Generation of Web Censorship Probe Lists [6.051603326423421]
Previous efforts to generate domain probe lists have been mostly manual or crowdsourced.
This paper explores methods for automatically generating probe lists that are both comprehensive and up-to-date for Web censorship measurement.
arXiv Detail & Related papers (2024-07-11T05:04:52Z)
- Pathfinder: Exploring Path Diversity for Assessing Internet Censorship Inconsistency [8.615061541238589]
We investigate Internet censorship from a different perspective by scrutinizing the diverse censorship deployment inside a country.
We reveal that the diversity of Internet censorship caused by different routing paths inside a country is prevalent.
We identify that different hosting platforms also result in inconsistent censorship activities due to different peering relationships with the ISPs in a country.
arXiv Detail & Related papers (2024-07-05T01:48:31Z)
- Understanding Routing-Induced Censorship Changes Globally [5.79183660559872]
We investigate the extent to which Equal-cost Multi-path (ECMP) routing is the cause for inconsistencies in censorship results.
We find ECMP routing significantly changes observed censorship across protocols, censor mechanisms, and in 17 countries.
Our work points to methods for improving future studies, reducing inconsistencies and increasing repeatability.
arXiv Detail & Related papers (2024-06-27T16:21:31Z)
- LLM Censorship: A Machine Learning Challenge or a Computer Security Problem? [52.71988102039535]
We show that semantic censorship can be perceived as an undecidable problem.
We argue that the challenges extend beyond semantic censorship, as knowledgeable attackers can reconstruct impermissible outputs.
arXiv Detail & Related papers (2023-07-20T09:25:02Z)
- Countering Malicious Content Moderation Evasion in Online Social Networks: Simulation and Detection of Word Camouflage [64.78260098263489]
Twisting and camouflaging keywords are among the most used techniques to evade platform content moderation systems.
This article contributes significantly to countering malicious information by developing multilingual tools to simulate and detect new methods of content moderation evasion.
arXiv Detail & Related papers (2022-12-27T16:08:49Z)
- How We Express Ourselves Freely: Censorship, Self-censorship, and Anti-censorship on a Chinese Social Media [4.408128846525362]
We identify the metrics of censorship and self-censorship, find the influence factors, and construct a mediation model to measure their relationship.
Based on these findings, we discuss implications for democratic social media design and future censorship research.
arXiv Detail & Related papers (2022-11-24T18:28:16Z)
- Manipulating Twitter Through Deletions [64.33261764633504]
Research into influence campaigns on Twitter has mostly relied on identifying malicious activities from tweets obtained via public APIs.
Here, we provide the first exhaustive, large-scale analysis of anomalous deletion patterns involving more than a billion deletions by over 11 million accounts.
We find that a small fraction of accounts delete a large number of tweets daily, and that these deletions enable two kinds of abuse.
First, limits on tweet volume are circumvented, allowing certain accounts to flood the network with over 26 thousand daily tweets.
Second, coordinated networks of accounts engage in repetitive likes and unlikes of content that is eventually deleted, which can manipulate ranking algorithms.
arXiv Detail & Related papers (2022-03-25T20:07:08Z)
- A Dataset of State-Censored Tweets [3.0254442724635173]
We release a dataset of 583,437 tweets by 155,715 users that were censored between 2012 and July 2020.
We also release 4,301 accounts that were censored in their entirety.
Our dataset will not only aid in the study of government censorship but will also aid in studying hate speech detection and the effect of censorship on social media users.
arXiv Detail & Related papers (2021-01-15T00:18:27Z)
- Political audience diversity and news reliability in algorithmic ranking [54.23273310155137]
We propose using the political diversity of a website's audience as a quality signal.
Using news source reliability ratings from domain experts and web browsing data from a diverse sample of 6,890 U.S. citizens, we first show that websites with more extreme and less politically diverse audiences have lower journalistic standards.
arXiv Detail & Related papers (2020-07-16T02:13:55Z)
- Learning to Evaluate Perception Models Using Planner-Centric Metrics [104.33349410009161]
We propose a principled metric for 3D object detection specifically for the task of self-driving.
We find that our metric penalizes many of the mistakes that other metrics penalize by design.
For human evaluation, we generate scenes in which standard metrics and our metric disagree and find that humans side with our metric 79% of the time.
arXiv Detail & Related papers (2020-04-19T02:14:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.