Related papers: MoPE: A Mixture of Password Experts for Improving Password Guessing

URL: http://arxiv.org/abs/2509.16558v1
Date: Sat, 20 Sep 2025 07:30:15 GMT
Title: MoPE: A Mixture of Password Experts for Improving Password Guessing
Authors: Mingjian Duan, Ming Xu, Shenghao Zhang, Jiaheng Zhang, Weili Han,
Abstract summary: We propose MoPE, a framework specifically designed to leverage the structural patterns in passwords to improve guessing performance. Our evaluation shows that MoPE significantly outperforms existing state-of-the-art baselines in both offline and online guessing scenarios.
Score: 10.399922446362417
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Textual passwords remain a predominant authentication mechanism in web security. To evaluate their strength, existing research has proposed several data-driven models across various scenarios. However, these models generally treat passwords uniformly, neglecting the structural differences among passwords. This typically results in biased training that favors frequent password structural patterns. To mitigate the biased training, we argue that passwords, as a type of complex short textual data, should be processed in a structure-aware manner by identifying their structural patterns and routing them to specialized models accordingly. In this paper, we propose MoPE, a Mixture of Password Experts framework, specifically designed to leverage the structural patterns in passwords to improve guessing performance. Motivated by the observation that passwords with similar structural patterns (e.g., fixed-length numeric strings) tend to cluster in high-density regions within the latent space, MoPE introduces: (1) a novel structure-based method for generating specialized expert models; (2) a lightweight gate method to select appropriate expert models to output reliable guesses, better aligned with the high computational frequency of password guessing tasks. Our evaluation shows that MoPE significantly outperforms existing state-of-the-art baselines in both offline and online guessing scenarios, achieving up to 38.80% and 9.27% improvement in cracking rate, respectively, showcasing that MoPE can effectively exploit the capabilities of data-driven models for password guessing. Additionally, we implement a real-time Password Strength Meter (PSM) based on offline MoPE, assisting users in choosing stronger passwords more precisely with millisecond-level response latency.
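The idea of routing passwords by structure to specialized experts can be illustrated with a minimal sketch. It assumes PCFG-style structures (runs of letters L, digits D and other symbols S, so "abc123" becomes "L3D3") stand in for the paper's structural patterns, uses plain frequency counters as hypothetical "experts", and uses a dictionary lookup as the gate. The abstract describes experts generated from high-density regions of a latent space and does not specify this construction; `structure_pattern`, `ToyMixtureOfExperts` and `min_support` are illustrative names only.

```python
import re
from collections import Counter

# Maximal runs of ASCII letters (L), digits (D) or anything else (S).
TOKEN = re.compile(r"(?P<L>[A-Za-z]+)|(?P<D>[0-9]+)|(?P<S>[^A-Za-z0-9]+)")


def structure_pattern(password: str) -> str:
    """Return a PCFG-style structure, e.g. 'password12' -> 'L8D2'."""
    return "".join(f"{m.lastgroup}{len(m.group())}" for m in TOKEN.finditer(password))


class ToyMixtureOfExperts:
    """Hypothetical structure-routed mixture: one counting 'expert' per common pattern."""

    FALLBACK = "*"  # shared expert for structures seen fewer than min_support times

    def __init__(self, training: list[str], min_support: int = 2) -> None:
        pattern_counts = Counter(structure_pattern(pw) for pw in training)
        self.expert_keys = {p for p, c in pattern_counts.items() if c >= min_support}
        # Partition the training data: each password goes to exactly one expert.
        self.experts: dict[str, Counter] = {}
        for pw in training:
            self.experts.setdefault(self.route(pw), Counter())[pw] += 1
        # Gate weight of an expert = share of the training data routed to it.
        total = len(training)
        self.gate = {k: sum(c.values()) / total for k, c in self.experts.items()}

    def route(self, password: str) -> str:
        """Lightweight gate: a dictionary lookup on the password's structure."""
        pattern = structure_pattern(password)
        return pattern if pattern in self.expert_keys else self.FALLBACK

    def probability(self, password: str) -> float:
        """Hard-gated mixture: gate weight times the selected expert's estimate."""
        key = self.route(password)
        expert = self.experts.get(key)
        if expert is None:
            return 0.0
        return self.gate[key] * expert[password] / sum(expert.values())


if __name__ == "__main__":
    training = ["123456", "123456", "654321", "password",
                "iloveyou", "abc123", "abc123", "qwerty!"]
    model = ToyMixtureOfExperts(training)
    print(model.route("654321"))         # D6
    print(model.route("qwerty!"))        # * (L6S1 occurs only once)
    print(model.probability("123456"))   # 0.25
```

With counting experts, the hard-gated mixture g(route(x)) * p_expert(x) simply reproduces the empirical frequency of x; any benefit would come from replacing each counter with a generative model trained only on its own structure group.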

Related papers

Enhancing Password Security Through a High-Accuracy Scoring Framework Using Random Forests [0.5097809301149341]
We implement and evaluate a password strength scoring system by comparing four machine learning models. Our primary contribution is a novel hybrid feature engineering approach that captures nuanced vulnerabilities missed by standard metrics.
arXiv Detail & Related papers (2025-11-12T17:05:27Z)
KAPG: Adaptive Password Guessing via Knowledge-Augmented Generation [7.1409672981861485]
We propose a knowledge-augmented password guessing framework that integrates external lexical knowledge into the guessing process. KnowGuess achieves average improvements of 36.5% and 74.7% over state-of-the-art models in intra-site and cross-site scenarios. We also develop KAPSM, a trend-aware and site-specific password strength meter.
arXiv Detail & Related papers (2025-10-27T06:03:08Z)
RouteMark: A Fingerprint for Intellectual Property Attribution in Routing-based Model Merging [69.2230254959204]
We propose RouteMark, a framework for IP protection in merged MoE models. Our key insight is that task-specific experts exhibit stable and distinctive routing behaviors under probing inputs. For attribution and tampering detection, we introduce a similarity-based matching algorithm.
arXiv Detail & Related papers (2025-08-03T14:51:58Z)
Password Strength Detection via Machine Learning: Analysis, Modeling, and Evaluation [0.8225825738565354]
This study introduces various methods for system password cracking, outlines password defense strategies, and discusses the application of machine learning in the realm of password security. We extract multiple characteristics of passwords, including length, the number of digits, the number of uppercase and lowercase letters, and the number of special characters.
arXiv Detail & Related papers (2025-05-22T09:27:40Z)
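The character-count features listed in the entry above are simple to compute. The sketch below is one possible reading, assuming ASCII definitions for each character class and counting every other character as "special"; the study's exact definitions are not given here, and `PasswordFeatures` and `extract_features` are hypothetical names.

```python
import string
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class PasswordFeatures:
    length: int
    digits: int
    uppercase: int
    lowercase: int
    special: int  # assumption: any character that is not an ASCII letter or digit


def extract_features(password: str) -> PasswordFeatures:
    """Count characters per class; 'special' is whatever is left over."""
    digits = sum(ch in string.digits for ch in password)
    upper = sum(ch in string.ascii_uppercase for ch in password)
    lower = sum(ch in string.ascii_lowercase for ch in password)
    return PasswordFeatures(
        length=len(password),
        digits=digits,
        uppercase=upper,
        lowercase=lower,
        special=len(password) - digits - upper - lower,
    )


if __name__ == "__main__":
    feats = extract_features("Passw0rd!")
    print(feats)
    print(astuple(feats))  # (9, 1, 1, 6, 1): a numeric vector for a classifier
```

Returning a frozen dataclass keeps the feature names explicit, while `astuple` turns it into the flat numeric vector most machine learning libraries expect.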
MAYA: Addressing Inconsistencies in Generative Password Guessing through a Unified Benchmark [1.4419466020986265]
This paper introduces MAYA, a unified, customizable, plug-and-play benchmarking framework for generative password-guessing models. We conduct a comprehensive assessment of six state-of-the-art approaches, which we re-implemented and adapted to ensure standardization. Our findings indicate that these models effectively capture different aspects of human password distribution and exhibit strong generalization capabilities.
arXiv Detail & Related papers (2025-04-23T12:16:59Z)
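Benchmarks of this kind typically compare guessing models by cracking rate: the fraction of a held-out test set matched within the first N guesses (the MoPE abstract above also reports its gains in this metric). The sketch below assumes the test set is counted with duplicates and that every emitted guess consumes budget; conventions differ between papers, and `cracking_rate_curve` is a hypothetical helper, not MAYA's API.

```python
from collections import Counter
from typing import Iterable


def cracking_rate_curve(
    guesses: Iterable[str], test_set: list[str], budgets: list[int]
) -> dict[int, float]:
    """Fraction of test_set (duplicates included) cracked within each guess budget."""
    if not test_set or not budgets or any(b < 1 for b in budgets):
        raise ValueError("need a non-empty test set and budgets >= 1")
    remaining = Counter(test_set)  # uncracked passwords with their multiplicities
    targets = sorted(set(budgets))
    curve: dict[int, float] = {}
    cracked = 0
    i = 0
    for n, guess in enumerate(guesses, start=1):
        # pop() removes a matched password, so repeated guesses add nothing.
        cracked += remaining.pop(guess, 0)
        if n == targets[i]:
            curve[n] = cracked / len(test_set)
            i += 1
            if i == len(targets):
                break  # allows `guesses` to be an unbounded generator
    # If the guess stream ends early, larger budgets keep the final rate.
    for b in targets[i:]:
        curve[b] = cracked / len(test_set)
    return curve


if __name__ == "__main__":
    guesses = ["123456", "password", "123456", "qwerty", "abc"]
    test = ["123456", "123456", "password", "letmein"]
    print(cracking_rate_curve(guesses, test, [1, 2, 5, 10]))
    # {1: 0.5, 2: 0.75, 5: 0.75, 10: 0.75}
```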
Detecting Document-level Paraphrased Machine Generated Content: Mimicking Human Writing Style and Involving Discourse Features [57.34477506004105]
Machine-generated content poses challenges such as academic plagiarism and the spread of misinformation. We introduce novel methodologies and datasets to overcome these challenges. We propose MhBART, an encoder-decoder model designed to emulate human writing style. We also propose DTransformer, a model that integrates discourse analysis through PDTB preprocessing to encode structural features.
arXiv Detail & Related papers (2024-12-17T08:47:41Z)
Robust Utility-Preserving Text Anonymization Based on Large Language Models [80.5266278002083]
Anonymizing text that contains sensitive information is crucial for a wide range of applications. Existing techniques face the emerging challenges of the re-identification ability of large language models. We propose a framework composed of three key components: a privacy evaluator, a utility evaluator, and an optimization component.
arXiv Detail & Related papers (2024-07-16T14:28:56Z)
FLIP: Fine-grained Alignment between ID-based Models and Pretrained Language Models for CTR Prediction [49.510163437116645]
Click-through rate (CTR) prediction serves as a core function module in personalized online services. Traditional ID-based models for CTR prediction take as inputs the one-hot encoded ID features of the tabular modality. Pretrained Language Models (PLMs) have given rise to another paradigm, which takes as inputs the sentences of the textual modality. We propose to conduct Fine-grained feature-level ALignment between ID-based Models and Pretrained Language Models (FLIP) for CTR prediction.
arXiv Detail & Related papers (2023-10-30T11:25:03Z)
PassGPT: Password Modeling and (Guided) Generation with Large Language Models [59.11160990637616]
We present PassGPT, a large language model trained on password leaks for password generation. We also introduce the concept of guided password generation, where we leverage PassGPT's sampling procedure to generate passwords matching arbitrary constraints.
arXiv Detail & Related papers (2023-06-02T13:49:53Z)
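Guided generation can be pictured as constrained sampling: at each step, the model's next-character distribution is restricted to the characters that satisfy the constraint for that position, and the next character is sampled from what remains. The sketch below assumes a character-class template (U upper, L lower, D digit, S symbol) and replaces the trained model with a hypothetical `toy_next_char_probs` stand-in; it illustrates the masking idea and is not PassGPT's code.

```python
import random
import string
from typing import Callable

# Hypothetical character classes used in templates such as "ULLLLDDS".
CLASSES = {
    "U": string.ascii_uppercase,
    "L": string.ascii_lowercase,
    "D": string.digits,
    "S": "!@#$%&*?",
}
VOCAB = "".join(CLASSES.values())


def toy_next_char_probs(prefix: str) -> dict[str, float]:
    """Stand-in for a trained model; ignores the prefix and favours 'a', '1' and '!'."""
    weights = {ch: (10.0 if ch in "a1!" else 1.0) for ch in VOCAB}
    total = sum(weights.values())
    return {ch: w / total for ch, w in weights.items()}


def guided_sample(
    template: str,
    rng: random.Random,
    next_char_probs: Callable[[str], dict[str, float]] = toy_next_char_probs,
) -> str:
    """Sample one password whose i-th character belongs to class template[i]."""
    out = ""
    for cls in template:
        probs = next_char_probs(out)
        # Mask: drop every character that violates this position's constraint.
        allowed = [ch for ch in CLASSES[cls] if probs.get(ch, 0.0) > 0.0]
        if not allowed:
            raise ValueError(f"model assigns no mass to class {cls!r} after {out!r}")
        # random.choices renormalises the remaining weights implicitly.
        weights = [probs[ch] for ch in allowed]
        out += rng.choices(allowed, weights=weights, k=1)[0]
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(guided_sample("ULLLLDDS", rng))
```

Because masking happens before sampling, every output satisfies the template by construction, while the relative preferences of the model are preserved within each allowed class.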
Pre-trained Language Models for Keyphrase Generation: A Thorough Empirical Study [76.52997424694767]
We present an in-depth empirical study of keyphrase extraction and keyphrase generation using pre-trained language models. We show that PLMs have competitive high-resource performance and state-of-the-art low-resource performance. Further results show that in-domain BERT-like PLMs can be used to build strong and data-efficient keyphrase generation models.
arXiv Detail & Related papers (2022-12-20T13:20:21Z)
ConTextual Mask Auto-Encoder for Dense Passage Retrieval [49.49460769701308]
CoT-MAE is a simple yet effective generative pre-training method for dense passage retrieval. It learns to compress the sentence semantics into a dense vector through self-supervised and context-supervised masked auto-encoding. We conduct experiments on large-scale passage retrieval benchmarks and show considerable improvements over strong baselines.
arXiv Detail & Related papers (2022-08-16T11:17:22Z)

This list is automatically generated from the titles and abstracts of the papers on this site.