Related papers: Benchmarking of Different YOLO Models for CAPTCHAs Detection and Classification

Benchmarking of Different YOLO Models for CAPTCHAs Detection and Classification

URL: http://arxiv.org/abs/2502.13740v1
Date: Wed, 19 Feb 2025 14:05:50 GMT
Title: Benchmarking of Different YOLO Models for CAPTCHAs Detection and Classification
Authors: Mikołaj Wysocki, Henryk Gierszal, Piotr Tyczka, Sophia Karagiorgou, George Pantelis,
Abstract summary: This paper provides an analysis and comparison of the YOLOv5, YOLOv8 and YOLOv10 models for webpage CAPTCHAs detection.<n>The study examines the nano (n), small (s), and medium (m) variants of YOLO architectures and use metrics such as Precision, Recall, F1 score, mAP@50 and inference speed to determine the real-life utility.
Score: 0.19972837513980318
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: This paper provides an analysis and comparison of the YOLOv5, YOLOv8 and YOLOv10 models for webpage CAPTCHAs detection using the datasets collected from the web and darknet as well as synthetized data of webpages. The study examines the nano (n), small (s), and medium (m) variants of YOLO architectures and use metrics such as Precision, Recall, F1 score, mAP@50 and inference speed to determine the real-life utility. Additionally, the possibility of tuning the trained model to detect new CAPTCHA patterns efficiently was examined as it is a crucial part of real-life applications. The image slicing method was proposed as a way to improve the metrics of detection on oversized input images which can be a common scenario in webpages analysis. Models in version nano achieved the best results in terms of speed, while more complexed architectures scored better in terms of other metrics.

Related papers

YOLO for Knowledge Extraction from Vehicle Images: A Baseline Study [0.20482269513546458]
This study evaluates the effectiveness of three state-of-the-art deep learning approaches YOLO-v11, YOLO-World, and YOLO- Classification.<n>This dataset was collected under challenging and unconstrained conditions by NSW Police Highway Patrol Vehicles.<n>It was concluded that there is a need to use MVI to get usable models within such complex real-world datasets.
arXiv Detail & Related papers (2025-07-25T05:31:21Z)
Small Object Detection with YOLO: A Performance Analysis Across Model Versions and Hardware [2.07180164747172]
This paper investigates speed and detection accuracy on Intel and CPUs using popular libraries such as ONNX and OpenVINO. We analyze the sensitivity of these YOLO models to object size within the image, examining performance when detecting objects that occupy 1%, 2.5%, and 5% of the total area of the image.
arXiv Detail & Related papers (2025-04-14T05:49:31Z)
Recursive Learning of Asymptotic Variational Objectives [49.69399307452126]
General state-space models (SSMs) are widely used in statistical machine learning and are among the most classical generative models for sequential time-series data. Online sequential IWAE (OSIWAE) allows for online learning of both model parameters and a Markovian recognition model for inferring latent states. This approach is more theoretically well-founded than recently proposed online variational SMC methods.
arXiv Detail & Related papers (2024-11-04T16:12:37Z)
YOLOv11: An Overview of the Key Architectural Enhancements [0.5639904484784127]
The paper explores YOLOv11's expanded capabilities across various computer vision tasks, including object detection, instance segmentation, pose estimation, and oriented object detection (OBB) We review the model's performance improvements in terms of mean Average Precision (mAP) and computational efficiency compared to its predecessors, with a focus on the trade-off between parameter count and accuracy. Our research provides insights into YOLOv11's position within the broader landscape of object detection and its potential impact on real-time computer vision applications.
arXiv Detail & Related papers (2024-10-23T09:55:22Z)
Replication Study and Benchmarking of Real-Time Object Detection Models [0.0]
We compare a variety of object detection models' accuracy and inference speed on multiple graphics cards. We propose a unified training and evaluation pipeline, based on MMDetection's features, to better compare models. Results exhibit a strong trade-off between accuracy and speed, prevailed by anchor-free models.
arXiv Detail & Related papers (2024-05-11T04:47:50Z)
YOLO-MS: Rethinking Multi-Scale Representation Learning for Real-time Object Detection [80.11152626362109]
We provide an efficient and performant object detector, termed YOLO-MS. We train our YOLO-MS on the MS COCO dataset from scratch without relying on any other large-scale datasets. Our work can also be used as a plug-and-play module for other YOLO models.
arXiv Detail & Related papers (2023-08-10T10:12:27Z)
YOLOBench: Benchmarking Efficient Object Detectors on Embedded Systems [0.0873811641236639]
We present YOLOBench, a benchmark comprised of 550+ YOLO-based object detection models on 4 different datasets and 4 different embedded hardware platforms. We collect accuracy and latency numbers for a variety of YOLO-based one-stage detectors at different model scales by performing a fair, controlled comparison of these detectors with a fixed training environment. We evaluate training-free accuracy estimators used in neural architecture search on YOLOBench and demonstrate that, while most state-of-the-art zero-cost accuracy estimators are outperformed by a simple baseline like MAC count, some of them can be effectively used to
arXiv Detail & Related papers (2023-07-26T01:51:10Z)
Incremental Online Learning Algorithms Comparison for Gesture and Visual Smart Sensors [68.8204255655161]
This paper compares four state-of-the-art algorithms in two real applications: gesture recognition based on accelerometer data and image classification. Our results confirm these systems' reliability and the feasibility of deploying them in tiny-memory MCUs.
arXiv Detail & Related papers (2022-09-01T17:05:20Z)
Hybridization of Capsule and LSTM Networks for unsupervised anomaly detection on multivariate data [0.0]
This paper introduces a novel NN architecture which hybridises the Long-Short-Term-Memory (LSTM) and Capsule Networks into a single network. The proposed method uses an unsupervised learning technique to overcome the issues with finding large volumes of labelled training data.
arXiv Detail & Related papers (2022-02-11T10:33:53Z)
Efficient Nearest Neighbor Language Models [114.40866461741795]
Non-parametric neural language models (NLMs) learn predictive distributions of text utilizing an external datastore. We show how to achieve up to a 6x speed-up in inference speed while retaining comparable performance.
arXiv Detail & Related papers (2021-09-09T12:32:28Z)
STAR: Sparse Transformer-based Action Recognition [61.490243467748314]
This work proposes a novel skeleton-based human action recognition model with sparse attention on the spatial dimension and segmented linear attention on the temporal dimension of data. Experiments show that our model can achieve comparable performance while utilizing much less trainable parameters and achieve high speed in training and inference.
arXiv Detail & Related papers (2021-07-15T02:53:11Z)
ALT-MAS: A Data-Efficient Framework for Active Testing of Machine Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data. The idea is to estimate the metrics of interest for a model-under-test using Bayesian neural network (BNN)
arXiv Detail & Related papers (2021-04-11T12:14:04Z)
Firearm Detection via Convolutional Neural Networks: Comparing a Semantic Segmentation Model Against End-to-End Solutions [68.8204255655161]
Threat detection of weapons and aggressive behavior from live video can be used for rapid detection and prevention of potentially deadly incidents. One way for achieving this is through the use of artificial intelligence and, in particular, machine learning for image analysis. We compare a traditional monolithic end-to-end deep learning model and a previously proposed model based on an ensemble of simpler neural networks detecting fire-weapons via semantic segmentation.
arXiv Detail & Related papers (2020-12-17T15:19:29Z)

This list is automatically generated from the titles and abstracts of the papers in this site.