Related papers: Detecting and recognizing characters in Greek papyri with YOLOv8, DeiT and SimCLR

URL: http://arxiv.org/abs/2401.12513v2
Date: Wed, 14 Feb 2024 01:40:52 GMT
Title: Detecting and recognizing characters in Greek papyri with YOLOv8, DeiT and SimCLR
Authors: Robert Turnbull and Evelyn Mannix
Abstract summary: This paper discusses our submission to the 'ICDAR 2023 Competition on Detection and Recognition of Greek Letters on Papyri'. We used an ensemble of YOLOv8 models to detect and classify individual characters and employed two different approaches for refining the character predictions. Our submission won the recognition challenge with a mean average precision (mAP) of 42.2%, and was runner-up in the detection challenge with a mAP of 51.4%.
Score: 9.7902367664742
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Purpose: The capacity to isolate and recognize individual characters from facsimile images of papyrus manuscripts yields rich opportunities for digital analysis. For this reason the 'ICDAR 2023 Competition on Detection and Recognition of Greek Letters on Papyri' was held as part of the 17th International Conference on Document Analysis and Recognition. This paper discusses our submission to the competition. Methods: We used an ensemble of YOLOv8 models to detect and classify individual characters and employed two different approaches for refining the character predictions, including a transformer-based DeiT approach and a ResNet-50 model trained on a large corpus of unlabelled data using SimCLR, a self-supervised learning method. Results: Our submission won the recognition challenge with a mean average precision (mAP) of 42.2%, and was runner-up in the detection challenge with a mAP of 51.4%. At the more relaxed intersection over union threshold of 0.5, we achieved the highest mean average precision and mean average recall results for both detection and classification. Conclusion: The results demonstrate the potential of these techniques for automated character recognition on historical manuscripts. We ran the prediction pipeline on more than 4,500 images from the Oxyrhynchus Papyri to illustrate the utility of our approach, and we release the results publicly in multiple formats.
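
Illustration: the abstract reports results at an intersection over union (IoU) threshold of 0.5. As a minimal sketch of what that threshold means, the Python below computes IoU for two axis-aligned boxes and checks whether a predicted box would count as a match. The (x1, y1, x2, y2) box format and the function names iou and is_match are assumptions made for this example; the competition's actual evaluation code is not described on this page.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2).

    The corner-coordinate format is an assumption for this sketch.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped at zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_match(pred, truth, threshold=0.5):
    """Hypothetical helper: a prediction matches if its IoU meets the threshold."""
    return iou(pred, truth) >= threshold


truth = (0, 0, 10, 10)
pred = (2, 0, 12, 10)  # shifted 2 units right: overlap 80, union 120
print(round(iou(pred, truth), 3))
print(is_match(pred, truth, threshold=0.5), is_match(pred, truth, threshold=0.75))
```

In this example the shifted box has an IoU of about 0.667, so it counts as a match at the relaxed 0.5 threshold but not at a stricter 0.75 threshold, which is why scores at IoU 0.5 tend to be higher than scores averaged over stricter thresholds.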

Related papers

Human Identification at a Distance: Challenges, Methods and Results on the Competition HID 2025 [70.29305328364755]
The International Competition on Human Identification at a Distance (HID) has been organized annually since 2020. The best-performing method reached 94.2% accuracy, setting a new benchmark on this dataset. We analyze key technical trends and outline potential directions for future research in gait recognition.
arXiv Detail & Related papers (2026-02-07T14:22:17Z)
PRISM: Phase-enhanced Radial-based Image Signature Mapping framework for fingerprinting AI-generated images [2.119461028150219]
We introduce PRISM, a scalable framework for fingerprinting AI-generated images. We construct PRISM-36K, a novel dataset of 36,000 images generated by six text-to-image GAN- and diffusion-based models. PRISM achieves an attribution accuracy of 92.04% on this dataset.
arXiv Detail & Related papers (2025-09-18T10:57:26Z)
SLRTP2025 Sign Language Production Challenge: Methodology, Results, and Future Work [87.9341538630949]
The first Sign Language Production Challenge was held as part of the third SLRTP Workshop at CVPR 2025. The competition's aims are to evaluate architectures that translate from spoken language sentences to a sequence of skeleton poses. This paper presents the challenge design and the winning methodologies.
arXiv Detail & Related papers (2025-08-09T11:57:33Z)
MultiHuman-Testbench: Benchmarking Image Generation for Multiple Humans [60.6183017400517]
We introduce MultiHuman-Testbench, a novel benchmark for rigorously evaluating generative models for multi-human generation. The benchmark comprises 1800 samples, including carefully curated text prompts, describing a range of simple to complex human actions. We propose a multi-faceted evaluation suite employing four key metrics to quantify face count, ID similarity, prompt alignment, and action detection.
arXiv Detail & Related papers (2025-06-25T23:00:57Z)
NTIRE 2025 challenge on Text to Image Generation Model Quality Assessment [146.76913448156176]
This paper reports on the NTIRE 2025 challenge on Text to Image (T2I) generation model quality assessment. The aim of this challenge is to address the fine-grained quality assessment of text-to-image generation models.
arXiv Detail & Related papers (2025-05-22T07:12:36Z)
Person Recognition at Altitude and Range: Fusion of Face, Body Shape and Gait [70.00430652562012]
FarSight is an end-to-end system for person recognition that integrates biometric cues across face, gait, and body shape modalities. FarSight incorporates novel algorithms across four core modules: multi-subject detection and tracking, recognition-aware video restoration, modality-specific biometric feature encoding, and quality-guided multi-modal fusion.
arXiv Detail & Related papers (2025-05-07T17:58:25Z)
FLIP Reasoning Challenge [20.706469085872516]
This paper introduces the FLIP dataset, a benchmark for evaluating AI reasoning capabilities based on human verification tasks. FLIP challenges present users with two orderings of 4 images, requiring them to identify the coherent one. Our experiments evaluate state-of-the-art models, leveraging both vision-language models (VLMs) and large language models (LLMs).
arXiv Detail & Related papers (2025-04-16T17:07:16Z)
Solution for OOD-CV Workshop SSB Challenge 2024 (Open-Set Recognition Track) [6.998958192483059]
The challenge required identifying whether a test sample belonged to the semantic classes of a classifier's training set. We proposed a hybrid approach, experimenting with the fusion of various post-hoc OOD detection techniques and different Test-Time Augmentation strategies. Our best-performing method combined Test-Time Augmentation with the post-hoc OOD techniques, achieving a strong balance between AUROC and FPR95 scores.
arXiv Detail & Related papers (2024-09-30T13:28:14Z)
Self-Training with Pseudo-Label Scorer for Aspect Sentiment Quad Prediction [54.23208041792073]
Aspect Sentiment Quad Prediction (ASQP) aims to predict all quads (aspect term, aspect category, opinion term, sentiment polarity) for a given review. A key challenge in the ASQP task is the scarcity of labeled data, which limits the performance of existing methods. We propose a self-training framework with a pseudo-label scorer, wherein a scorer assesses the match between reviews and their pseudo-labels.
arXiv Detail & Related papers (2024-06-26T05:30:21Z)
Whole-body Detection, Recognition and Identification at Altitude and Range [57.445372305202405]
We propose an end-to-end system evaluated on diverse datasets. Our approach involves pre-training the detector on common image datasets and fine-tuning it on BRIAR's complex videos and images. We conduct thorough evaluations under various conditions, such as different ranges and angles in indoor, outdoor, and aerial scenarios.
arXiv Detail & Related papers (2023-11-09T20:20:23Z)
Handwritten Stenography Recognition and the LION Dataset [0.0]
Stenographic domain knowledge is integrated by applying four different encoding methods. Test error rates are reduced significantly by combining stenography-specific target sequence encodings with pre-training and fine-tuning.
arXiv Detail & Related papers (2023-08-15T14:25:53Z)
EFaR 2023: Efficient Face Recognition Competition [51.77649060180531]
The paper presents the summary of the Efficient Face Recognition Competition (EFaR) held at the 2023 International Joint Conference on Biometrics (IJCB 2023). The competition received 17 submissions from 6 different teams. The submitted solutions are ranked based on a weighted score of the achieved verification accuracies on a diverse set of benchmarks, as well as the deployability given by the number of floating-point operations and model size.
arXiv Detail & Related papers (2023-08-08T09:58:22Z)
Improving CNN-based Person Re-identification using score Normalization [2.462953128215087]
This paper proposes a novel approach for PRe-ID, which combines a CNN-based feature extraction method with Cross-view Quadratic Discriminant Analysis (XQDA) for metric learning. The proposed approach is tested on four challenging datasets: VIPeR, GRID, CUHK01, and PRID450S.
arXiv Detail & Related papers (2023-07-01T18:12:27Z)
PART: Pre-trained Authorship Representation Transformer [52.623051272843426]
Authors writing documents imprint identifying information within their texts. Previous works use hand-crafted features or classification tasks to train their authorship models. We propose a contrastively trained model fit to learn authorship embeddings instead of semantics.
arXiv Detail & Related papers (2022-09-30T11:08:39Z)
mc-BEiT: Multi-choice Discretization for Image BERT Pre-training [52.04866462439979]
Image BERT pre-training with masked image modeling (MIM) is a popular practice to cope with self-supervised representation learning. We introduce an improved BERT-style image pre-training method, namely mc-BEiT, which performs MIM proxy tasks towards eased and refined multi-choice training objectives.
arXiv Detail & Related papers (2022-03-29T09:08:18Z)
Twitter-COMMs: Detecting Climate, COVID, and Military Multimodal Misinformation [83.2079454464572]
This paper describes our approach to the Image-Text Inconsistency Detection challenge of the DARPA Semantic Forensics (SemaFor) Program. We collect Twitter-COMMs, a large-scale multimodal dataset with 884k tweets relevant to the topics of Climate Change, COVID-19, and Military Vehicles. We train our approach, based on the state-of-the-art CLIP model, leveraging automatically generated random and hard negatives.
arXiv Detail & Related papers (2021-12-16T03:37:20Z)
Rotation Invariance and Extensive Data Augmentation: a strategy for the Mitosis Domain Generalization (MIDOG) Challenge [1.52292571922932]
We present the strategy we applied to participate in the MIDOG 2021 competition. The purpose of the competition was to evaluate the generalization of solutions to images acquired with unseen target scanners. We propose a solution based on a combination of state-of-the-art deep learning methods.
arXiv Detail & Related papers (2021-09-02T10:09:02Z)
FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence [93.91751021370638]
Semi-supervised learning (SSL) provides an effective means of leveraging unlabeled data to improve a model's performance. In this paper, we demonstrate the power of a simple combination of two common SSL methods: consistency regularization and pseudo-labeling. Our algorithm, FixMatch, first generates pseudo-labels using the model's predictions on weakly-augmented unlabeled images.
arXiv Detail & Related papers (2020-01-21T18:32:27Z)

This list is automatically generated from the titles and abstracts of the papers on this site.