PassTSL: Modeling Human-Created Passwords through Two-Stage Learning
- URL: http://arxiv.org/abs/2407.14145v1
- Date: Fri, 19 Jul 2024 09:23:30 GMT
- Title: PassTSL: Modeling Human-Created Passwords through Two-Stage Learning
- Authors: Yangde Wang, Haozhang Li, Weidong Qiu, Shujun Li, Peng Tang
- Abstract summary: We propose PassTSL (modeling human-created Passwords through Two-Stage Learning), inspired by the popular pretraining-finetuning framework in NLP and deep learning (DL).
PassTSL outperforms five state-of-the-art (SOTA) password cracking methods on password guessing by a significant margin ranging from 4.11% to 64.69% at the maximum point.
Based on PassTSL, we also implemented a password strength meter (PSM), and our experiments showed that it was able to estimate password strength more accurately.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Textual passwords are still the most widely used user authentication mechanism. Due to the close connections between textual passwords and natural languages, advanced technologies in natural language processing (NLP) and machine learning (ML) could be used to model passwords for different purposes such as studying human password-creation behaviors and developing more advanced password cracking methods for informing better defence mechanisms. In this paper, we propose PassTSL (modeling human-created Passwords through Two-Stage Learning), inspired by the popular pretraining-finetuning framework in NLP and deep learning (DL). We report how different pretraining settings affected PassTSL and proved its effectiveness by applying it to six large leaked password databases. Experimental results showed that it outperforms five state-of-the-art (SOTA) password cracking methods on password guessing by a significant margin ranging from 4.11% to 64.69% at the maximum point. Based on PassTSL, we also implemented a password strength meter (PSM), and our experiments showed that it was able to estimate password strength more accurately, causing fewer unsafe errors (overestimating the password strength) than two other SOTA PSMs when they produce the same rate of safe errors (underestimating the password strength): a neural-network based method and zxcvbn. Furthermore, we explored multiple finetuning settings, and our evaluations showed that, even a small amount of additional training data, e.g., only 0.1% of the pretrained data, can lead to over 3% improvement in password guessing on average. We also proposed a heuristic approach to selecting finetuning passwords based on JS (Jensen-Shannon) divergence and experimental results validated its usefulness. In summary, our contributions demonstrate the potential and feasibility of applying advanced NLP and ML methods to password modeling and cracking.
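The abstract does not detail the JS-divergence-based heuristic for selecting finetuning passwords, but the core idea of comparing password distributions via Jensen-Shannon divergence can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation; the function names and the toy character-level distributions are assumptions.

```python
import math
from collections import Counter

def char_distribution(passwords):
    """Normalized character-frequency distribution over a password set."""
    counts = Counter(ch for pw in passwords for ch in pw)
    total = sum(counts.values())
    return {ch: c / total for ch, c in counts.items()}

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2, so the result lies in [0, 1])."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}
    def kl(a, b):
        # Kullback-Leibler divergence over the support of a.
        return sum(a[k] * math.log2(a[k] / b[k]) for k in a if a[k] > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Toy selection step: pick the candidate source whose distribution is
# closest to the target leak (lowest JS divergence).
target = char_distribution(["password1", "qwerty", "123456"])
candidates = {
    "leak_A": char_distribution(["letmein", "dragon", "abc123"]),
    "leak_B": char_distribution(["Tr0ub4dor&3", "X9#qLp!"]),
}
best = min(candidates, key=lambda name: js_divergence(target, candidates[name]))
```

In this sketch a finetuning source is chosen because its character distribution most resembles the target leak; the actual heuristic in the paper may operate on richer distributions than single characters.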
Related papers
- Unlocking Memorization in Large Language Models with Dynamic Soft Prompting [66.54460367290146]
Large language models (LLMs) have revolutionized natural language processing (NLP) tasks such as summarization, question answering, and translation.
LLMs pose significant security risks due to their tendency to memorize training data, leading to potential privacy breaches and copyright infringement.
We propose a novel method for estimating LLM memorization using dynamic, prefix-dependent soft prompts.
arXiv Detail & Related papers (2024-09-20T18:56:32Z) - Nudging Users to Change Breached Passwords Using the Protection Motivation Theory [58.87688846800743]
We draw on the Protection Motivation Theory (PMT) to design nudges that encourage users to change breached passwords.
Our study contributes to PMT's application in security research and provides concrete design implications for improving compromised credential notifications.
arXiv Detail & Related papers (2024-05-24T07:51:15Z) - Search-based Ordered Password Generation of Autoregressive Neural Networks [0.0]
We build SOPGesGPT, a GPT-based password guessing model that uses SOPG (Search-based Ordered Password Generation) to generate passwords.
Experiments show that SOPGesGPT is far ahead of the most influential models (OMEN, FLA, PassGAN, and VAEPass) in terms of both effective rate and cover rate.
arXiv Detail & Related papers (2024-03-15T01:30:38Z) - PassViz: A Visualisation System for Analysing Leaked Passwords [2.2530496464901106]
PassViz is a command-line tool for visualising and analysing leaked passwords in a 2-D space.
We show how PassViz can be used to visually analyse different aspects of leaked passwords and to facilitate the discovery of previously unknown password patterns.
arXiv Detail & Related papers (2023-09-22T16:06:26Z) - PassGPT: Password Modeling and (Guided) Generation with Large Language Models [59.11160990637616]
We present PassGPT, a large language model trained on password leaks for password generation.
We also introduce the concept of guided password generation, where we leverage PassGPT's sampling procedure to generate passwords matching arbitrary constraints.
arXiv Detail & Related papers (2023-06-02T13:49:53Z) - Targeted Honeyword Generation with Language Models [5.165256397719443]
Honeywords are fictitious passwords inserted into databases to identify password breaches.
A major difficulty is producing honeywords that are hard to distinguish from real passwords.
arXiv Detail & Related papers (2022-08-15T00:06:29Z) - LaMDA: Language Models for Dialog Applications [75.75051929981933]
LaMDA is a family of Transformer-based neural language models specialized for dialog.
Fine-tuning with annotated data and enabling the model to consult external knowledge sources can lead to significant improvements.
arXiv Detail & Related papers (2022-01-20T15:44:37Z) - Skeptic: Automatic, Justified and Privacy-Preserving Password Composition Policy Selection [44.040106718326605]
The choice of password composition policy to enforce on a password-protected system represents a critical security decision.
In practice, this choice is not usually rigorous or justifiable, with a tendency for system administrators to choose password composition policies based on intuition alone.
We propose a novel methodology that draws on password probability distributions constructed from large sets of real-world password data.
arXiv Detail & Related papers (2020-07-07T22:12:13Z) - Interpretable Probabilistic Password Strength Meters via Deep Learning [13.97315111128149]
We show that probabilistic password meters inherently have the capability to describe the latent relation between password strength and password structure.
Unlike existing constructions, our method is free from any human bias, and, more importantly, its feedback has a probabilistic interpretation.
arXiv Detail & Related papers (2020-04-15T16:05:50Z) - Byte Pair Encoding is Suboptimal for Language Model Pretraining [49.30780227162387]
We analyze differences between unigram LM tokenization and byte-pair encoding (BPE).
We find that the unigram LM tokenization method matches or outperforms BPE across downstream tasks and two languages.
We hope that developers of future pretrained LMs will consider adopting the unigram LM method over the more prevalent BPE.
arXiv Detail & Related papers (2020-04-07T21:21:06Z)
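The BPE-vs-unigram comparison above concerns how subword vocabularies are learned. As a toy sketch of the BPE side (illustrative only, not the cited paper's implementation), BPE repeatedly merges the most frequent adjacent symbol pair in the training corpus:

```python
from collections import Counter

def bpe_merges(corpus, num_merges):
    """Learn BPE merge rules by repeatedly merging the most frequent
    adjacent symbol pair. Words are represented as tuples of symbols."""
    words = Counter(tuple(w) for w in corpus)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, freq in words.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        merged = {}
        for word, freq in words.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])  # fuse the pair
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            merged[tuple(out)] = merged.get(tuple(out), 0) + freq
        words = merged
    return merges

# On password-like strings, frequent substrings such as "12" are
# merged into single tokens first.
rules = bpe_merges(["password", "pass1234", "1234", "word12"], 4)
```

A unigram LM tokenizer instead starts from a large candidate vocabulary and prunes it by likelihood, which is the alternative the cited paper finds to match or outperform BPE.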
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.