Detecting and recognizing characters in Greek papyri with YOLOv8, DeiT
and SimCLR
- URL: http://arxiv.org/abs/2401.12513v2
- Date: Wed, 14 Feb 2024 01:40:52 GMT
- Title: Detecting and recognizing characters in Greek papyri with YOLOv8, DeiT
and SimCLR
- Authors: Robert Turnbull and Evelyn Mannix
- Abstract summary: This paper discusses our submission to the `ICDAR 2023 Competition on Detection and Recognition of Greek Letters on Papyri'.
We used an ensemble of YOLOv8 models to detect and classify individual characters and employed two different approaches for refining the character predictions.
Our submission won the recognition challenge with a mean average precision (mAP) of 42.2%, and was runner-up in the detection challenge with a mAP of 51.4%.
- Score: 9.7902367664742
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Purpose: The capacity to isolate and recognize individual characters from
facsimile images of papyrus manuscripts yields rich opportunities for digital
analysis. For this reason the `ICDAR 2023 Competition on Detection and
Recognition of Greek Letters on Papyri' was held as part of the 17th
International Conference on Document Analysis and Recognition. This paper
discusses our submission to the competition.
Methods: We used an ensemble of YOLOv8 models to detect and classify
individual characters and employed two different approaches for refining the
character predictions, including a transformer-based DeiT approach and a
ResNet-50 model trained on a large corpus of unlabelled data using SimCLR, a
self-supervised learning method.
Results: Our submission won the recognition challenge with a mean average
precision (mAP) of 42.2%, and was runner-up in the detection challenge with a
mAP of 51.4%. At the more relaxed intersection over union threshold of 0.5,
we achieved the highest mean average precision and mean average recall results
for both detection and classification.
Conclusion: The results demonstrate the potential of these techniques for
automated character recognition on historical manuscripts. We ran the
prediction pipeline on more than 4,500 images from the Oxyrhynchus Papyri to
illustrate the utility of our approach, and we release the results publicly in
multiple formats.
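The intersection over union (IoU) threshold mentioned in the results can be illustrated with a minimal sketch. It assumes boxes are given as (x1, y1, x2, y2) corner coordinates, which is a convention chosen here for illustration; the function names `iou` and `is_match` are hypothetical, and the sketch does not reproduce the competition's actual evaluation protocol.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes in (x1, y1, x2, y2) form (assumed format)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    # Union area = sum of both areas minus the overlap counted twice.
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_match(pred_box, true_box, threshold=0.5):
    """Treat a predicted box as matching a ground-truth box if IoU >= threshold."""
    return iou(pred_box, true_box) >= threshold
```

A lower threshold such as 0.5 accepts predictions whose boxes overlap the ground truth less tightly, which is why it is described as more relaxed.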
Related papers
- Solution for OOD-CV Workshop SSB Challenge 2024 (Open-Set Recognition Track) [6.998958192483059]
The challenge required identifying whether a test sample belonged to the semantic classes of a classifier's training set.
We proposed a hybrid approach, experimenting with the fusion of various post-hoc OOD detection techniques and different Test-Time Augmentation strategies.
Our best-performing method combined Test-Time Augmentation with the post-hoc OOD techniques, achieving a strong balance between AUROC and FPR95 scores.
arXiv Detail & Related papers (2024-09-30T13:28:14Z)
- Self-Training with Pseudo-Label Scorer for Aspect Sentiment Quad Prediction [54.23208041792073]
Aspect Sentiment Quad Prediction (ASQP) aims to predict all quads (aspect term, aspect category, opinion term, sentiment polarity) for a given review.
A key challenge in the ASQP task is the scarcity of labeled data, which limits the performance of existing methods.
We propose a self-training framework with a pseudo-label scorer, wherein a scorer assesses the match between reviews and their pseudo-labels.
arXiv Detail & Related papers (2024-06-26T05:30:21Z)
- Whole-body Detection, Recognition and Identification at Altitude and Range [57.445372305202405]
We propose an end-to-end system evaluated on diverse datasets.
Our approach involves pre-training the detector on common image datasets and fine-tuning it on BRIAR's complex videos and images.
We conduct thorough evaluations under various conditions, such as different ranges and angles in indoor, outdoor, and aerial scenarios.
arXiv Detail & Related papers (2023-11-09T20:20:23Z)
- Handwritten Stenography Recognition and the LION Dataset [0.0]
Stenographic domain knowledge is integrated by applying four different encoding methods.
Test error rates are reduced significantly by combining stenography-specific target sequence encodings with pre-training and fine-tuning.
arXiv Detail & Related papers (2023-08-15T14:25:53Z)
- EFaR 2023: Efficient Face Recognition Competition [51.77649060180531]
The paper presents the summary of the Efficient Face Recognition Competition (EFaR) held at the 2023 International Joint Conference on Biometrics (IJCB 2023).
The competition received 17 submissions from 6 different teams.
The submitted solutions are ranked based on a weighted score of the achieved verification accuracies on a diverse set of benchmarks, as well as the deployability given by the number of floating-point operations and model size.
arXiv Detail & Related papers (2023-08-08T09:58:22Z)
- Improving CNN-based Person Re-identification using score Normalization [2.462953128215087]
This paper proposes a novel approach for PRe-ID, which combines a CNN-based feature extraction method with Cross-view Quadratic Discriminant Analysis (XQDA) for metric learning.
The proposed approach is tested on four challenging datasets: VIPeR, GRID, CUHK01 and PRID450S.
arXiv Detail & Related papers (2023-07-01T18:12:27Z)
- mc-BEiT: Multi-choice Discretization for Image BERT Pre-training [52.04866462439979]
Image BERT pre-training with masked image modeling (MIM) is a popular approach to self-supervised representation learning.
We introduce an improved BERT-style image pre-training method, namely mc-BEiT, which performs MIM proxy tasks towards eased and refined multi-choice training objectives.
arXiv Detail & Related papers (2022-03-29T09:08:18Z)
- Twitter-COMMs: Detecting Climate, COVID, and Military Multimodal Misinformation [83.2079454464572]
This paper describes our approach to the Image-Text Inconsistency Detection challenge of the DARPA Semantic Forensics (SemaFor) Program.
We collect Twitter-COMMs, a large-scale multimodal dataset with 884k tweets relevant to the topics of Climate Change, COVID-19, and Military Vehicles.
We train our approach, based on the state-of-the-art CLIP model, leveraging automatically generated random and hard negatives.
arXiv Detail & Related papers (2021-12-16T03:37:20Z)
- Rotation Invariance and Extensive Data Augmentation: a strategy for the Mitosis Domain Generalization (MIDOG) Challenge [1.52292571922932]
We present the strategy we applied to participate in the MIDOG 2021 competition.
The purpose of the competition was to evaluate the generalization of solutions to images acquired with unseen target scanners.
We propose a solution based on a combination of state-of-the-art deep learning methods.
arXiv Detail & Related papers (2021-09-02T10:09:02Z)
- FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence [93.91751021370638]
Semi-supervised learning (SSL) provides an effective means of leveraging unlabeled data to improve a model's performance.
In this paper, we demonstrate the power of a simple combination of two common SSL methods: consistency regularization and pseudo-labeling.
Our algorithm, FixMatch, first generates pseudo-labels using the model's predictions on weakly-augmented unlabeled images.
arXiv Detail & Related papers (2020-01-21T18:32:27Z)