BeCAPTCHA-Type: Biometric Keystroke Data Generation for Improved Bot
Detection
- URL: http://arxiv.org/abs/2207.13394v3
- Date: Tue, 11 Apr 2023 15:36:42 GMT
- Title: BeCAPTCHA-Type: Biometric Keystroke Data Generation for Improved Bot
Detection
- Authors: Daniel DeAlcala and Aythami Morales and Ruben Tolosana and Alejandro
Acien and Julian Fierrez and Santiago Hernandez and Miguel A. Ferrer and
Moises Diaz
- Abstract summary: This work proposes a data driven learning model for the synthesis of keystroke biometric data.
The proposed method is compared with two statistical approaches based on Universal and User-dependent models.
Our experimental framework considers a dataset with 136 million keystroke events from 168 thousand subjects.
- Score: 63.447493500066045
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This work proposes a data driven learning model for the synthesis of
keystroke biometric data. The proposed method is compared with two statistical
approaches based on Universal and User-dependent models. These approaches are
validated on the bot detection task, using the keystroke synthetic data to
improve the training process of keystroke-based bot detection systems. Our
experimental framework considers a dataset with 136 million keystroke events
from 168 thousand subjects. We have analyzed the performance of the three
synthesis approaches through qualitative and quantitative experiments.
Different bot detectors are considered based on several supervised classifiers
(Support Vector Machine, Random Forest, Gaussian Naive Bayes and a Long
Short-Term Memory network) and a learning framework including human and
synthetic samples. The experiments demonstrate the realism of the synthetic
samples. The classification results suggest that in scenarios with large
labeled data, these synthetic samples can be detected with high accuracy.
However, in few-shot learning scenarios, detecting them represents an important challenge.
Furthermore, these results show the great potential of the presented models.
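The abstract contrasts two statistical synthesis baselines, a Universal model and a User-dependent model, without giving their details here. The sketch below is only one plausible reading, not the authors' implementation: it assumes each keystroke is described by a hold time and a flight time in milliseconds, that both are modeled as independent Gaussians, and that the Universal model pools all subjects while the User-dependent model fits a single subject. The toy data, the function names, and the 1 ms lower bound on hold times are all hypothetical.

```python
import random
import statistics

# Hypothetical toy data: per-user keystroke timings as (hold_ms, flight_ms) pairs.
HUMAN = {
    "user_a": [(95, 140), (102, 155), (88, 130), (110, 160)],
    "user_b": [(70, 90), (65, 85), (75, 100), (72, 95)],
}


def fit(pairs):
    """Fit independent Gaussians; return ((hold_mean, hold_std), (flight_mean, flight_std))."""
    holds = [h for h, _ in pairs]
    flights = [f for _, f in pairs]
    return (
        (statistics.mean(holds), statistics.pstdev(holds)),
        (statistics.mean(flights), statistics.pstdev(flights)),
    )


def universal_params(data):
    """Assumed 'Universal' model: one distribution fitted on all users pooled together."""
    pooled = [pair for pairs in data.values() for pair in pairs]
    return fit(pooled)


def user_dependent_params(data, user):
    """Assumed 'User-dependent' model: one distribution fitted on a single user."""
    return fit(data[user])


def synthesize(params, n_keys, rng):
    """Sample n_keys synthetic (hold_ms, flight_ms) pairs from the fitted Gaussians."""
    (hold_mu, hold_sd), (flight_mu, flight_sd) = params
    sample = []
    for _ in range(n_keys):
        # Hold times are clipped at 1 ms (an assumption); flight times may be
        # negative when consecutive key presses overlap, so they are not clipped.
        hold = max(1.0, rng.gauss(hold_mu, hold_sd))
        flight = rng.gauss(flight_mu, flight_sd)
        sample.append((hold, flight))
    return sample


rng = random.Random(0)  # fixed seed so the sketch is reproducible
universal_sample = synthesize(universal_params(HUMAN), 10, rng)
user_sample = synthesize(user_dependent_params(HUMAN, "user_a"), 10, rng)
```

Synthetic sequences produced this way could be labeled as bot samples and mixed with human samples to train a detector, which is the role the abstract assigns to the synthetic data.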
Related papers
- Maximizing the Potential of Synthetic Data: Insights from Random Matrix Theory [8.713796223707398]
We use random matrix theory to derive the performance of a binary classifier trained on a mix of real and synthetic data.
Our findings identify conditions where synthetic data could improve performance, focusing on the quality of the generative model and verification strategy.
(2024-10-11T16:09:27Z)
- Image change detection with only a few samples [7.5780621370948635]
A major impediment to the image change detection task is the lack of large annotated datasets covering a wide variety of scenes.
We propose using simple image processing methods for generating synthetic but informative datasets.
We then design an early fusion network based on object detection which could outperform the siamese neural network.
(2023-11-07T07:01:35Z)
- Learning Defect Prediction from Unrealistic Data [57.53586547895278]
Pretrained models of code have become popular choices for code understanding and generation tasks.
Such models tend to be large and require commensurate volumes of training data.
It has become popular to train models with far larger but less realistic datasets, such as functions with artificially injected bugs.
Models trained on such data tend to only perform well on similar data, while underperforming on real world programs.
(2023-11-02T01:51:43Z)
- Domain Adaptive Synapse Detection with Weak Point Annotations [63.97144211520869]
We present AdaSyn, a framework for domain adaptive synapse detection with weak point annotations.
In the WASPSYN challenge at ISBI 2023, our method ranks first.
(2023-08-31T05:05:53Z)
- Value function estimation using conditional diffusion models for control [62.27184818047923]
We propose a simple algorithm called Diffused Value Function (DVF).
It learns a joint multi-step model of the environment-robot interaction dynamics using a diffusion model.
We show how DVF can be used to efficiently capture the state visitation measure for multiple controllers.
(2023-06-09T18:40:55Z)
- Revisiting the Evaluation of Image Synthesis with GANs [55.72247435112475]
This study presents an empirical investigation into the evaluation of synthesis performance, with generative adversarial networks (GANs) as a representative of generative models.
In particular, we make in-depth analyses of various factors, including how to represent a data point in the representation space, how to calculate a fair distance using selected samples, and how many instances to use from each set.
(2023-04-04T17:54:32Z)
- Domain Generalization via Ensemble Stacking for Face Presentation Attack Detection [4.61143637299349]
Face Presentation Attack Detection (PAD) plays a pivotal role in securing face recognition systems against spoofing attacks.
This work proposes a comprehensive solution that combines synthetic data generation and deep ensemble learning.
Experimental results on four datasets demonstrate low half total error rates (HTERs) on three benchmark datasets.
(2023-01-05T16:44:36Z)
- Synt++: Utilizing Imperfect Synthetic Data to Improve Speech Recognition [18.924716098922683]
Machine learning with synthetic data is not trivial due to the gap between the synthetic and the real data distributions.
We propose two novel techniques during training to mitigate the problems due to the distribution gap.
We show that these methods significantly improve the training of speech recognition models using synthetic data.
(2021-10-21T21:11:42Z)
- Towards an Automatic Analysis of CHO-K1 Suspension Growth in Microfluidic Single-cell Cultivation [63.94623495501023]
We propose a novel Machine Learning architecture, which allows us to infuse a neural deep network with human-powered abstraction on the level of data.
Specifically, we train a generative model simultaneously on natural and synthetic data, so that it learns a shared representation, from which a target variable, such as the cell count, can be reliably estimated.
(2020-10-20T08:36:51Z)
- Evaluation of synthetic and experimental training data in supervised machine learning applied to charge state detection of quantum dots [0.0]
We evaluate the prediction accuracy of a range of machine learning models trained on simulated and experimental data.
We find that classifiers perform best on either purely experimental or a combination of synthetic and experimental training data.
(2020-05-16T23:41:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.