MAYA: Addressing Inconsistencies in Generative Password Guessing through a Unified Benchmark
- URL: http://arxiv.org/abs/2504.16651v1
- Date: Wed, 23 Apr 2025 12:16:59 GMT
- Title: MAYA: Addressing Inconsistencies in Generative Password Guessing through a Unified Benchmark
- Authors: William Corrias, Fabio De Gaspari, Dorjan Hitaj, Luigi V. Mancini
- Abstract summary: We introduce MAYA, a unified, customizable, plug-and-play password benchmarking framework. MAYA provides a standardized approach for evaluating generative password-guessing models. We find sequential models consistently outperform other generative architectures and traditional password-guessing tools.
- Score: 0.35998666903987897
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rapid evolution of generative models has led to their integration across various fields, including password guessing, where they aim to generate passwords that resemble human-created ones in complexity, structure, and patterns. Despite the promise of generative models, inconsistencies in prior research and a lack of rigorous evaluation have hindered a comprehensive understanding of their true potential. In this paper, we introduce MAYA, a unified, customizable, plug-and-play password benchmarking framework. MAYA provides a standardized approach for evaluating generative password-guessing models through a rigorous set of advanced testing scenarios and a collection of eight real-life password datasets. Using MAYA, we comprehensively evaluate six state-of-the-art approaches, which we re-implemented and adapted to ensure standardization, for a total of over 15,000 hours of computation. Our findings indicate that these models effectively capture different aspects of human password distributions and exhibit strong generalization capabilities. However, their effectiveness varies significantly with long and complex passwords. In our evaluation, sequential models consistently outperform other generative architectures and traditional password-guessing tools, demonstrating unique capabilities in generating accurate and complex guesses. Moreover, the models learn and generate different password distributions, enabling a multi-model attack that outperforms the best individual model. By releasing MAYA, we aim to foster further research, providing the community with a new tool to consistently and reliably benchmark password-generation techniques. Our framework is publicly available at https://github.com/williamcorrias/MAYA-Password-Benchmarking
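The multi-model attack mentioned in the abstract can be pictured as merging the ranked guess lists of several generators before testing them against leaked passwords. Below is a minimal sketch of that idea, assuming only that each model exposes an ordered list of guesses; the guess lists, function names, and budget are hypothetical placeholders and do not reflect MAYA's actual API.

```python
# Minimal sketch of a multi-model password attack: round-robin merge the
# ranked guesses of several generators, deduplicate, and measure coverage
# against a held-out set of real passwords. All names are illustrative.

from itertools import zip_longest

def multi_model_attack(guess_lists, budget):
    """Round-robin merge of per-model guess lists, keeping first occurrences."""
    seen, merged = set(), []
    for round_guesses in zip_longest(*guess_lists):
        for guess in round_guesses:
            if guess is not None and guess not in seen:
                seen.add(guess)
                merged.append(guess)
            if len(merged) == budget:
                return merged
    return merged

def coverage(guesses, test_passwords):
    """Fraction of unique test passwords cracked by the guess list."""
    targets = set(test_passwords)
    return len(targets & set(guesses)) / len(targets)

# Hypothetical ranked outputs from three generative models.
model_a = ["123456", "password", "qwerty1", "dragon"]
model_b = ["password", "iloveyou", "123456", "sunshine"]
model_c = ["letmein", "123456", "monkey12", "qwerty1"]

merged = multi_model_attack([model_a, model_b, model_c], budget=8)
print(coverage(merged, ["iloveyou", "dragon", "letmein", "hunter2"]))  # 0.75
```

Because each model covers a somewhat different slice of the password distribution, the merged list can crack passwords that no single model reaches within the same guess budget, which is the intuition behind the paper's multi-model result.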
Related papers
- Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines [74.42485647685272]
We focus on Generative Masked Language Models (GMLMs).
We train a model via masking to fit conditional probabilities of the data distribution; these conditionals are subsequently used to drive a Markov chain that draws samples from the model.
We adapt the T5 model for iteratively-refined parallel decoding, achieving 2-3x speedup in machine translation with minimal sacrifice in quality.
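As a rough illustration of the sampling loop described above, the toy example below starts from a random sequence and repeatedly resamples single positions from conditional distributions, forming a Markov chain over sequences. The hand-coded bigram-style `conditional` function is an assumed stand-in for a trained GMLM's masked forward pass, not anything from the paper.

```python
# Toy Gibbs-style sampler in the spirit of GMLM sampling: repeatedly pick
# one position and redraw it from a conditional distribution given the
# rest of the sequence. The "conditional" below is a placeholder for a
# trained masked language model.

import random

VOCAB = ["the", "cat", "dog", "sat", "ran"]

def conditional(seq, i):
    # Stand-in for p(x_i | x_{-i}); a real GMLM would compute this with a
    # forward pass in which position i is masked.
    prev = seq[i - 1] if i > 0 else None
    if prev in ("cat", "dog"):
        return {"sat": 0.6, "ran": 0.4}
    return {w: 1 / len(VOCAB) for w in VOCAB}

def gibbs_sample(length=4, steps=50, seed=0):
    rng = random.Random(seed)
    seq = [rng.choice(VOCAB) for _ in range(length)]
    for _ in range(steps):
        i = rng.randrange(length)            # choose a position to mask
        probs = conditional(seq, i)          # conditional given the rest
        words, weights = zip(*probs.items())
        seq[i] = rng.choices(words, weights=weights)[0]
    return seq

print(" ".join(gibbs_sample()))
```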
arXiv Detail & Related papers (2024-07-22T18:00:00Z) - MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration [98.18244218156492]
Large Language Models (LLMs) have significantly advanced natural language processing. As their applications expand into multi-agent environments, a comprehensive evaluation framework becomes necessary. This work introduces a novel competition-based benchmark framework to assess LLMs within multi-agent settings.
arXiv Detail & Related papers (2023-11-14T21:46:27Z) - Dictionary Attack on IMU-based Gait Authentication [2.204806197679781]
We present a novel adversarial model for authentication systems that use gait patterns recorded by the inertial measurement unit (IMU) built into smartphones.
The attack idea is inspired by, and named after, the concept of a dictionary attack on knowledge-based (PIN or password) authentication systems.
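The analogy can be made concrete with a small sketch: rather than trying PINs in popularity order, the adversary replays a ranked dictionary of gait templates against a victim's verifier until one is accepted. The feature vectors, distance metric, and threshold below are illustrative assumptions, not the paper's actual model.

```python
# Simplified sketch of a dictionary attack on gait authentication: replay
# common IMU gait templates, most typical first, against a per-user
# verifier. All values here are invented for illustration.

import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def authenticator(template, threshold=0.5):
    """Per-user verifier: accept any gait sample close to the enrolled template."""
    return lambda sample: euclidean(sample, template) <= threshold

# Hypothetical dictionary of common gait feature vectors, most typical first.
gait_dictionary = [
    [1.0, 0.9, 1.1],
    [0.8, 1.0, 0.7],
    [1.2, 1.1, 0.9],
]

victim_verify = authenticator(template=[1.1, 1.0, 1.0])

for rank, sample in enumerate(gait_dictionary, start=1):
    if victim_verify(sample):
        print(f"accepted at attempt {rank}")
        break
```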
arXiv Detail & Related papers (2023-09-21T04:00:21Z) - PassGPT: Password Modeling and (Guided) Generation with Large Language Models [59.11160990637616]
We present PassGPT, a large language model trained on password leaks for password generation.
We also introduce the concept of guided password generation, where we leverage PassGPT's sampling procedure to generate passwords matching arbitrary constraints.
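Guided generation can be approximated by constraining what the sampler may emit at each step. The sketch below forces every character to match a class template (four letters, then two digits); the uniform character "model" is a toy stand-in for PassGPT's learned next-token distribution, and the template syntax is invented for illustration.

```python
# Minimal sketch of guided password generation: restrict each sampling
# step to the character class named by a template such as "LLLLDD".

import random
import string

CLASSES = {"L": string.ascii_lowercase, "D": string.digits, "S": "!@#$%"}

def guided_sample(template, rng):
    password = []
    for cls in template:
        allowed = CLASSES[cls]
        # A real implementation would renormalize the model's next-token
        # probabilities over `allowed`; here every allowed character is
        # equally likely.
        password.append(rng.choice(allowed))
    return "".join(password)

rng = random.Random(42)
print([guided_sample("LLLLDD", rng) for _ in range(3)])
```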
arXiv Detail & Related papers (2023-06-02T13:49:53Z) - CodeLMSec Benchmark: Systematically Evaluating and Finding Security Vulnerabilities in Black-Box Code Language Models [58.27254444280376]
Large language models (LLMs) for automatic code generation have achieved breakthroughs in several programming tasks.
Training data for these models is usually collected from the Internet (e.g., from open-source repositories) and is likely to contain faults and security vulnerabilities.
This unsanitized training data can cause the language models to learn these vulnerabilities and propagate them during the code generation procedure.
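One way to picture such an evaluation is a screen that scans model-generated completions for known insecure patterns, as in the hedged sketch below. The regexes and samples are purely illustrative; the actual benchmark relies on systematic prompt construction and proper static analysis rather than a handful of regular expressions.

```python
# Toy vulnerability screen over generated code: flag completions that
# match simple insecure patterns. Illustrative only.

import re

INSECURE_PATTERNS = {
    "sql-injection": re.compile(r"execute\(.*%s.*%"),  # string-formatted SQL
    "eval-call": re.compile(r"\beval\("),              # eval on untrusted input
}

def scan(completion):
    """Return the names of all insecure patterns found in a completion."""
    return [name for name, pat in INSECURE_PATTERNS.items()
            if pat.search(completion)]

samples = [
    'cursor.execute("SELECT * FROM users WHERE id = %s" % uid)',
    "result = eval(user_input)",
    "total = sum(values)",
]
for code in samples:
    print(scan(code))
```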
arXiv Detail & Related papers (2023-02-08T11:54:07Z) - Universal Neural-Cracking-Machines: Self-Configurable Password Models from Auxiliary Data [21.277402919534566]
"universal password model" is a password model that adapts its guessing strategy based on the target system.
It exploits users' auxiliary information, such as email addresses, as a proxy signal to predict the underlying password distribution.
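A rough sketch of the auxiliary-signal idea: derive a cheap feature from the leaked email addresses (here, just the domain's country-code TLD) and use it to select a prior over candidate passwords before guessing. The password lists and the TLD-to-prior mapping below are invented for illustration and are not the paper's method.

```python
# Toy version of adapting a guessing strategy from auxiliary data: infer
# a likely locale from email domains and return a tuned guess prior.
# The priors are illustrative placeholders.

PRIORS = {
    "it": ["juventus1", "ciao1234", "123456"],
    "de": ["schalke04", "passwort", "123456"],
    "default": ["123456", "password", "qwerty"],
}

def guess_list_for(emails):
    """Pick a password prior based on the country-code TLD of the emails."""
    tlds = [e.rsplit(".", 1)[-1].lower() for e in emails if "." in e]
    for tld in tlds:
        if tld in PRIORS:
            return PRIORS[tld]
    return PRIORS["default"]

print(guess_list_for(["mario.rossi@example.it", "info@example.it"]))
```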
arXiv Detail & Related papers (2023-01-18T16:12:04Z) - Twist Decoding: Diverse Generators Guide Each Other [116.20780037268801]
We introduce Twist decoding, a simple and general inference algorithm that generates text while benefiting from diverse models.
Our method does not assume the vocabulary, tokenization or even generation order is shared.
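A heavily simplified sketch of that setting follows: each model proposes candidate continuations, every candidate is scored by all models at the string level (so no shared vocabulary or tokenizer is required), and the jointly preferred candidate is kept. The toy propose/score functions are assumed stand-ins; the actual Twist decoding algorithm is considerably more involved than this consensus heuristic.

```python
# Toy consensus step across two models that share no vocabulary: both
# propose string continuations, both score all candidates, and the best
# joint score wins. Illustrative only.

def model_a_propose(prefix):
    return [prefix + " world", prefix + " there"]

def model_b_propose(prefix):
    return [prefix + " world", prefix + " friend"]

def model_a_score(text):
    return 1.0 if "world" in text else 0.2

def model_b_score(text):
    return 0.9 if "world" in text else 0.5

def joint_step(prefix):
    candidates = model_a_propose(prefix) + model_b_propose(prefix)
    # String-level scoring means the models never need to agree on tokens.
    return max(set(candidates),
               key=lambda c: model_a_score(c) + model_b_score(c))

print(joint_step("hello"))  # "hello world"
```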
arXiv Detail & Related papers (2022-05-19T01:27:53Z) - Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models [86.02610674750345]
Adversarial GLUE (AdvGLUE) is a new multi-task benchmark to explore and evaluate the vulnerabilities of modern large-scale language models under various types of adversarial attacks.
We apply 14 adversarial attack methods to GLUE tasks to construct AdvGLUE, which is further validated by humans for reliable annotations.
All the language models and robust training methods we tested perform poorly on AdvGLUE, with scores lagging far behind the benign accuracy.
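The evaluation recipe amounts to comparing accuracy on benign inputs with accuracy on perturbed versions of the same inputs. The toy example below uses a single word-substitution perturbation and a deliberately brittle keyword classifier; AdvGLUE itself applies 14 attack methods with human validation.

```python
# Toy robustness evaluation: measure accuracy before and after a simple
# synonym-substitution attack. Classifier and synonym table are invented.

SYNONYMS = {"good": "decent", "bad": "lousy"}

def classifier(text):
    """Toy sentiment model keyed on exact words, hence easy to fool."""
    return "pos" if "good" in text.split() else "neg"

def perturb(text):
    return " ".join(SYNONYMS.get(w, w) for w in text.split())

data = [("a good movie", "pos"), ("a bad movie", "neg")]

benign = sum(classifier(x) == y for x, y in data) / len(data)
adversarial = sum(classifier(perturb(x)) == y for x, y in data) / len(data)
print(f"benign={benign:.2f} adversarial={adversarial:.2f}")
```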
arXiv Detail & Related papers (2021-11-04T12:59:55Z) - Benchmarking Robustness of Machine Reading Comprehension Models [29.659586787812106]
We construct AdvRACE, a new model-agnostic benchmark for evaluating the robustness of MRC models under four different types of adversarial attacks.
We show that state-of-the-art (SOTA) models are vulnerable to all of these attacks.
We conclude that there is substantial room for building more robust MRC models and our benchmark can help motivate and measure progress in this area.
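One attack family such benchmarks draw on is distractor insertion, sketched below: append a misleading sentence to the passage and check whether a naive extractive reader changes its answer. The toy reader is an invented heuristic, not a state-of-the-art MRC model, and the example only illustrates the benchmark's general approach.

```python
# Toy distractor-insertion attack on an extractive reader.

def toy_reader(passage):
    """Extract the token after the last 'is': a deliberately brittle reader."""
    words = passage.split()
    idx = max(i for i, w in enumerate(words) if w == "is")
    return words[idx + 1]

benign = "the capital of france is paris"
distractor = benign + " . some say the answer is lyon"
print(toy_reader(benign), toy_reader(distractor))  # paris lyon
```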
arXiv Detail & Related papers (2020-04-29T08:05:32Z) - AvgOut: A Simple Output-Probability Measure to Eliminate Dull Responses [97.50616524350123]
We build dialogue models that are dynamically aware of which utterances or tokens are dull, without any feature engineering.
The first model, MinAvgOut, directly maximizes the diversity score through the output distributions of each batch.
The second model, Label Fine-Tuning (LFT), prepends to the source sequence a label continuously scaled by the diversity score to control the diversity level.
The third model, RL, adopts Reinforcement Learning and treats the diversity score as a reward signal.
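The shared ingredient of the three models is an AvgOut-style diversity score. The sketch below computes one plausible variant: average the per-step output distributions across a batch to locate dull tokens, then score a response higher when it uses tokens the average assigns low mass to. The vocabulary, distributions, and exact normalization are illustrative assumptions, not the paper's definitions.

```python
# Toy AvgOut-style diversity score: dull tokens are those with high mass
# in the batch-averaged output distribution.

VOCAB = ["i", "dont", "know", "sunset", "violin"]

# Hypothetical per-step output distributions from a dialogue model.
batch_distributions = [
    [0.4, 0.3, 0.2, 0.05, 0.05],
    [0.5, 0.2, 0.2, 0.05, 0.05],
]

avg = [sum(col) / len(batch_distributions) for col in zip(*batch_distributions)]

def diversity_score(tokens):
    """Mean (1 - average prob) over the sequence: dull tokens pull it down."""
    probs = [avg[VOCAB.index(t)] for t in tokens]
    return sum(1 - p for p in probs) / len(probs)

print(diversity_score(["i", "dont", "know"]))  # low: dull reply
print(diversity_score(["sunset", "violin"]))   # high: diverse reply
```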
arXiv Detail & Related papers (2020-01-15T18:32:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.