Augmenting Rule-based DNS Censorship Detection at Scale with Machine
Learning
- URL: http://arxiv.org/abs/2302.02031v2
- Date: Thu, 15 Jun 2023 20:52:14 GMT
- Title: Augmenting Rule-based DNS Censorship Detection at Scale with Machine
Learning
- Authors: Jacob Brown, Xi Jiang, Van Tran, Arjun Nitin Bhagoji, Nguyen Phong
Hoang, Nick Feamster, Prateek Mittal, Vinod Yegneswaran
- Abstract summary: Censorship of the domain name system (DNS) is a key mechanism used across different countries.
In this paper, we explore how machine learning (ML) models can help streamline the detection process.
We find that unsupervised models, trained solely on uncensored instances, can identify new instances and variations of censorship missed by existing probes.
- Score: 38.00013408742201
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The proliferation of global censorship has led to the development of a
plethora of measurement platforms to monitor and expose it. Censorship of the
domain name system (DNS) is a key mechanism used across different countries. It
is currently detected by applying heuristics to samples of DNS queries and
responses (probes) for specific destinations. These heuristics, however, are
both platform-specific and have been found to be brittle when censors change
their blocking behavior, necessitating a more reliable automated process for
detecting censorship.
In this paper, we explore how machine learning (ML) models can (1) help
streamline the detection process, (2) improve the potential of using
large-scale datasets for censorship detection, and (3) discover new censorship
instances and blocking signatures missed by existing heuristic methods. Our
study shows that supervised models, trained using expert-derived labels on
instances of known anomalies and possible censorship, can learn the detection
heuristics employed by different measurement platforms. More crucially, we find
that unsupervised models, trained solely on uncensored instances, can identify
new instances and variations of censorship missed by existing heuristics.
Moreover, both methods demonstrate the capability to uncover a substantial
number of new DNS blocking signatures, i.e., injected fake IP addresses
overlooked by existing heuristics. These results are underpinned by an
important methodological finding: comparing the outputs of models trained using
the same probes but with labels arising from independent processes allows us to
more reliably detect cases of censorship in the absence of ground-truth labels
of censorship.
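As an illustration of the two ideas in the abstract (a detector fit only on uncensored probes, and cross-checking two detectors whose labels come from independent processes), here is a minimal sketch. It is not the authors' model: the per-feature z-score detector stands in for the unsupervised models, and the feature columns (answer count, TTL, latency), the threshold, and all function names are assumptions made for this example.

```python
from statistics import mean, pstdev

def fit_baseline(rows):
    """Per-feature (mean, std) computed from uncensored probes only."""
    cols = list(zip(*rows))
    # A constant feature has std 0; fall back to 1.0 to avoid dividing by zero.
    return [(mean(c), pstdev(c) or 1.0) for c in cols]

def anomaly_score(baseline, row):
    """Largest absolute z-score of the probe across all features."""
    return max(abs(x - m) / s for x, (m, s) in zip(row, baseline))

def flag(baseline, row, threshold=3.0):
    """True when the probe deviates strongly from the uncensored baseline."""
    return anomaly_score(baseline, row) > threshold

def agree(flags_a, flags_b):
    """Indices of probes that two independently derived detectors both flag."""
    return [i for i, (a, b) in enumerate(zip(flags_a, flags_b)) if a and b]

# Hypothetical feature rows: (answer count, TTL in seconds, latency in ms).
uncensored = [(2, 300, 40), (1, 280, 45), (2, 320, 38), (1, 300, 42)]
baseline = fit_baseline(uncensored)

suspicious_probe = (1, 5, 3)    # very short TTL and very fast reply
ordinary_probe = (2, 310, 41)   # close to the uncensored baseline
```

Probes flagged by both detectors are the stronger candidates when no ground-truth censorship labels are available; a real system would use richer features and a proper one-class model.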
Related papers
- Understanding Routing-Induced Censorship Changes Globally [5.79183660559872]
We investigate the extent to which Equal-cost Multi-path (ECMP) routing is the cause of inconsistencies in censorship results.
We find that ECMP routing significantly changes observed censorship across protocols and censor mechanisms, and in 17 countries.
Our work points to methods for improving future studies, reducing inconsistencies and increasing repeatability.
arXiv Detail & Related papers (2024-06-27T16:21:31Z)
- Lazy Layers to Make Fine-Tuned Diffusion Models More Traceable [70.77600345240867]
A novel arbitrary-in-arbitrary-out (AIAO) strategy makes watermarks resilient to fine-tuning-based removal.
Unlike the existing methods of designing a backdoor for the input/output space of diffusion models, in our method, we propose to embed the backdoor into the feature space of sampled subpaths.
Our empirical studies on the MS-COCO, AFHQ, LSUN, CUB-200, and DreamBooth datasets confirm the robustness of AIAO.
arXiv Detail & Related papers (2024-05-01T12:03:39Z)
- Amoeba: Circumventing ML-supported Network Censorship via Adversarial Reinforcement Learning [8.788469979827484]
Recent advances in machine learning enable detecting a range of anti-censorship systems by learning distinct statistical patterns hidden in traffic flows.
In this paper, we formulate a practical adversarial attack strategy against flow classifiers as a method for circumventing censorship.
We show that Amoeba can effectively shape adversarial flows that have on average 94% attack success rate against a range of ML algorithms.
arXiv Detail & Related papers (2023-10-31T14:01:24Z)
- Are Existing Out-Of-Distribution Techniques Suitable for Network Intrusion Detection? [1.6317061277457001]
We investigate whether existing OOD detectors from other fields allow the identification of unknown malicious traffic.
We also explore whether more discriminative and semantically richer embedding spaces within models, such as those created with contrastive learning and multi-class tasks, benefit detection.
Our findings suggest that existing detectors can identify a consistent portion of new malicious traffic, and that improved embedding spaces enhance detection.
arXiv Detail & Related papers (2023-08-28T07:49:01Z)
- Algorithmic Censoring in Dynamic Learning Systems [6.2952076725399975]
We formalize censoring, demonstrate how it can arise, and highlight difficulties in detection.
We consider safeguards against censoring - recourse and randomized-exploration.
The resulting techniques allow examples from censored groups to enter into the training data and correct the model.
arXiv Detail & Related papers (2023-05-15T21:42:22Z)
- Verifying the Robustness of Automatic Credibility Assessment [50.55687778699995]
We show that meaning-preserving changes in input text can mislead the models.
We also introduce BODEGA: a benchmark for testing both victim models and attack methods on misinformation detection tasks.
Our experimental results show that modern large language models are often more vulnerable to attacks than previous, smaller solutions.
arXiv Detail & Related papers (2023-03-14T16:11:47Z)
- Detecting Network-based Internet Censorship via Latent Feature Representation Learning [4.862220550600935]
We design and evaluate a classification model based on latent feature representation learning and an image-based classification model to detect network-based Internet censorship.
To infer latent feature representations from network reachability data, we propose a sequence-to-sequence autoencoder.
To estimate the probability of censorship events from the inferred latent features, we rely on a densely connected multi-layer neural network model.
arXiv Detail & Related papers (2022-09-12T11:16:26Z)
- Mitigating the Mutual Error Amplification for Semi-Supervised Object Detection [92.52505195585925]
We propose a Cross Teaching (CT) method, aiming to mitigate the mutual error amplification by introducing a rectification mechanism of pseudo labels.
In contrast to existing mutual teaching methods that directly treat predictions from other detectors as pseudo labels, we propose the Label Rectification Module (LRM).
arXiv Detail & Related papers (2022-01-26T03:34:57Z)
- D-Unet: A Dual-encoder U-Net for Image Splicing Forgery Detection and Localization [108.8592577019391]
Image splicing forgery detection is a global binary classification task that distinguishes the tampered and non-tampered regions by image fingerprints.
We propose a novel network called dual-encoder U-Net (D-Unet) for image splicing forgery detection, which employs an unfixed encoder and a fixed encoder.
In an experimental comparison study of D-Unet and state-of-the-art methods, D-Unet outperformed the other methods in image-level and pixel-level detection.
arXiv Detail & Related papers (2020-12-03T10:54:02Z)
- Scalable Backdoor Detection in Neural Networks [61.39635364047679]
Deep learning models are vulnerable to Trojan attacks, where an attacker can install a backdoor during training time to make the resultant model misidentify samples contaminated with a small trigger patch.
We propose a novel trigger reverse-engineering based approach whose computational complexity does not scale with the number of labels, and is based on a measure that is both interpretable and universal across different network and patch types.
In experiments, we observe that our method achieves a perfect score in separating Trojaned models from pure models, which is an improvement over the current state-of-the-art method.
arXiv Detail & Related papers (2020-06-10T04:12:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.