STAR: Semantic-Traffic Alignment and Retrieval for Zero-Shot HTTPS Website Fingerprinting
- URL: http://arxiv.org/abs/2512.17667v1
- Date: Fri, 19 Dec 2025 15:12:01 GMT
- Title: STAR: Semantic-Traffic Alignment and Retrieval for Zero-Shot HTTPS Website Fingerprinting
- Authors: Yifei Cheng, Yujia Zhu, Baiyang Li, Xinhao Deng, Yitong Cai, Yaochen Ren, Qingyun Liu,
- Abstract summary: STAR learns a joint embedding space for encrypted traffic traces and crawl-time logic profiles.<n>Experiments on 1,600 unseen websites show STAR 87.9 percent top-1 accuracy and 0.963 AUC in open-world detection.
- Score: 8.545240172279017
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern HTTPS mechanisms such as Encrypted Client Hello (ECH) and encrypted DNS improve privacy but remain vulnerable to website fingerprinting (WF) attacks, where adversaries infer visited sites from encrypted traffic patterns. Existing WF methods rely on supervised learning with site-specific labeled traces, which limits scalability and fails to handle previously unseen websites. We address these limitations by reformulating WF as a zero-shot cross-modal retrieval problem and introducing STAR. STAR learns a joint embedding space for encrypted traffic traces and crawl-time logic profiles using a dual-encoder architecture. Trained on 150K automatically collected traffic-logic pairs with contrastive and consistency objectives and structure-aware augmentation, STAR retrieves the most semantically aligned profile for a trace without requiring target-side traffic during training. Experiments on 1,600 unseen websites show that STAR achieves 87.9 percent top-1 accuracy and 0.963 AUC in open-world detection, outperforming supervised and few-shot baselines. Adding an adapter with only four labeled traces per site further boosts top-5 accuracy to 98.8 percent. Our analysis reveals intrinsic semantic-traffic alignment in modern web protocols, identifying semantic leakage as the dominant privacy risk in encrypted HTTPS traffic. We release STAR's datasets and code to support reproducibility and future research.
Related papers
- The Trojan Knowledge: Bypassing Commercial LLM Guardrails via Harmless Prompt Weaving and Adaptive Tree Search [58.8834056209347]
Large language models (LLMs) remain vulnerable to jailbreak attacks that bypass safety guardrails to elicit harmful outputs.<n>We introduce the Correlated Knowledge Attack Agent (CKA-Agent), a dynamic framework that reframes jailbreaking as an adaptive, tree-structured exploration of the target model's knowledge base.
arXiv Detail & Related papers (2025-12-01T07:05:23Z) - Assimilation Matters: Model-level Backdoor Detection in Vision-Language Pretrained Models [71.44858461725893]
Given a model fine-tuned by an untrusted third party, determining whether the model has been injected with a backdoor is a critical and challenging problem.<n>Existing detection methods usually rely on prior knowledge of training dataset, backdoor triggers and targets.<n>We introduce Assimilation Matters in DETection (AMDET), a novel model-level detection framework that operates without any such prior knowledge.
arXiv Detail & Related papers (2025-11-29T06:20:00Z) - Defending against Indirect Prompt Injection by Instruction Detection [109.30156975159561]
InstructDetector is a novel detection-based approach that leverages the behavioral states of LLMs to identify potential IPI attacks.<n>InstructDetector achieves a detection accuracy of 99.60% in the in-domain setting and 96.90% in the out-of-domain setting, and reduces the attack success rate to just 0.03% on the BIPIA benchmark.
arXiv Detail & Related papers (2025-05-08T13:04:45Z) - Red Pill and Blue Pill: Controllable Website Fingerprinting Defense via Dynamic Backdoor Learning [93.44927301021688]
Website fingerprint (WF) attacks covertly monitor user communications to identify the web pages they visit.<n>Existing WF defenses attempt to reduce the attacker's accuracy by disrupting unique traffic patterns.<n>We introduce Controllable Website Fingerprint Defense (CWFD), a novel defense perspective based on backdoor learning.
arXiv Detail & Related papers (2024-12-16T06:12:56Z) - A Robust Multi-Stage Intrusion Detection System for In-Vehicle Network Security using Hierarchical Federated Learning [0.0]
In-vehicle intrusion detection systems (IDSs) must detect seen attacks and provide a robust defense against new, unseen attacks.
Previous work has relied solely on the CAN ID feature or has used traditional machine learning (ML) approaches with manual feature extraction.
This paper introduces a cutting-edge, novel, lightweight, in-vehicle, IDS-leveraging, deep learning (DL) algorithm to address these limitations.
arXiv Detail & Related papers (2024-08-15T21:51:56Z) - Seamless Website Fingerprinting in Multiple Environments [4.226243782049956]
Website fingerprinting (WF) attacks identify the websites visited over anonymized connections.
We introduce a new approach that classifies entire websites rather than individual web pages.
Our Convolutional Neural Network (CNN) uses only the jitter and size of 500 contiguous packets from any point in a TCP stream.
arXiv Detail & Related papers (2024-07-28T02:18:30Z) - Federated Learning for Zero-Day Attack Detection in 5G and Beyond V2X Networks [9.86830550255822]
Connected and Automated Vehicles (CAVs) on top of 5G and Beyond networks (5GB) make them vulnerable to increasing vectors of security and privacy attacks.
We propose in this paper a novel detection mechanism that leverages the ability of the deep auto-encoder method to detect attacks relying only on the benign network traffic pattern.
Using federated learning, the proposed intrusion detection system can be trained with large and diverse benign network traffic, while preserving the CAVs privacy, and minimizing the communication overhead.
arXiv Detail & Related papers (2024-07-03T12:42:31Z) - EmInspector: Combating Backdoor Attacks in Federated Self-Supervised Learning Through Embedding Inspection [53.25863925815954]
Federated self-supervised learning (FSSL) has emerged as a promising paradigm that enables the exploitation of clients' vast amounts of unlabeled data.
While FSSL offers advantages, its susceptibility to backdoor attacks has not been investigated.
We propose the Embedding Inspector (EmInspector) that detects malicious clients by inspecting the embedding space of local models.
arXiv Detail & Related papers (2024-05-21T06:14:49Z) - Detection of Malicious DNS-over-HTTPS Traffic: An Anomaly Detection Approach using Autoencoders [0.0]
We design an autoencoder that is capable of detecting malicious DNS traffic by only observing the encrypted DoH traffic.
We find that our proposed autoencoder achieves the highest detection performance, with a median F-1 score of 99% over several types of malicious traffic.
arXiv Detail & Related papers (2023-10-17T15:03:37Z) - An Adversarial Attack Analysis on Malicious Advertisement URL Detection
Framework [22.259444589459513]
Malicious advertisement URLs pose a security risk since they are the source of cyber-attacks.
Existing malicious URL detection techniques are limited and to handle unseen features as well as generalize to test data.
In this study, we extract a novel set of lexical and web-scrapped features and employ machine learning technique to set up system for fraudulent advertisement URLs detection.
arXiv Detail & Related papers (2022-04-27T20:06:22Z) - Hidden Backdoor Attack against Semantic Segmentation Models [60.0327238844584]
The emphbackdoor attack intends to embed hidden backdoors in deep neural networks (DNNs) by poisoning training data.
We propose a novel attack paradigm, the emphfine-grained attack, where we treat the target label from the object-level instead of the image-level.
Experiments show that the proposed methods can successfully attack semantic segmentation models by poisoning only a small proportion of training data.
arXiv Detail & Related papers (2021-03-06T05:50:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.