Cross-codex Learning for Reliable Scribe Identification in Medieval
Manuscripts
- URL: http://arxiv.org/abs/2312.04296v1
- Date: Thu, 7 Dec 2023 13:40:20 GMT
- Title: Cross-codex Learning for Reliable Scribe Identification in Medieval
Manuscripts
- Authors: Julius Weißmann, Markus Seidl, Anya Dietrich, Martin Haltrich
- Abstract summary: We demonstrate the importance of cross-codex training data for CNN based text-independent off-line scribe identification.
We trained different neural networks on our complex data, validating time and accuracy differences in order to define the most reliable network architecture.
We present the results on our large-scale open-source dataset, the Codex Claustroneoburgensis database (CCl-DB).
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Historic scribe identification is a substantial task for obtaining
information about the past. Uniform script styles, such as the Carolingian
minuscule, make it a difficult task for classification to focus on meaningful
features. Therefore, we demonstrate in this paper the importance of cross-codex
training data for CNN based text-independent off-line scribe identification, to
overcome codex dependent overfitting. We report three main findings: First, we
found that preprocessing with masked grayscale images instead of RGB images
clearly increased the F1-score of the classification results. Second, we
trained different neural networks on our complex data, validating time and
accuracy differences in order to define the most reliable network architecture.
With AlexNet, the network with the best trade-off between F1-score and time, we
achieved for individual classes F1-scores of up to 0.96 on line level and up to
1.0 on page level in classification. Third, we could replicate the finding that
the CNN output can be further improved by implementing a reject option, giving
more stable results. We present the results on our large scale open source
dataset -- the Codex Claustroneoburgensis database (CCl-DB) -- containing a
significant number of writings from different scribes in several codices. We
demonstrate for the first time on a dataset with such a variety of codices that
paleographic decisions can be reproduced automatically and precisely with CNNs.
This gives manifold new and fast possibilities for paleographers to gain
insights into unlabeled material, but also to develop further hypotheses.
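
The abstract mentions line-level and page-level results and a reject option but does not spell out either mechanism. The following is a minimal sketch of one common way to combine them, not the authors' implementation. It assumes the reject rule is a threshold on the highest softmax probability of each line and that the page label is a majority vote over accepted lines. The function names, the threshold value of 0.8, and the sample probabilities are all hypothetical.

```python
from collections import Counter


def classify_with_reject(probs, threshold=0.8):
    """Return the index of the most probable scribe class,
    or None if that probability is below the (assumed) threshold."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else None


def page_label(line_probs, threshold=0.8):
    """Aggregate line-level predictions to one page-level label by
    majority vote over accepted lines; None if every line is rejected."""
    votes = [classify_with_reject(p, threshold) for p in line_probs]
    accepted = [v for v in votes if v is not None]
    if not accepted:
        return None
    return Counter(accepted).most_common(1)[0][0]


# Hypothetical softmax outputs for four text lines of one page,
# three scribe classes each.
lines = [
    [0.90, 0.05, 0.05],  # accepted, class 0
    [0.40, 0.35, 0.25],  # rejected: no class reaches the threshold
    [0.10, 0.85, 0.05],  # accepted, class 1
    [0.92, 0.04, 0.04],  # accepted, class 0
]
print(page_label(lines))  # prints 0
```

Under these assumptions, uncertain lines are dropped before voting, which is one plausible reason page-level scores can be more stable than line-level scores.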
Related papers
- Fuzzy Convolution Neural Networks for Tabular Data Classification [0.0]
Convolutional neural networks (CNNs) have attracted a great deal of attention due to their remarkable performance in various domains.
In this paper, we propose a novel framework fuzzy convolution neural network (FCNN) tailored specifically for tabular data.
arXiv Detail & Related papers (2024-06-04T20:33:35Z)
- Analyzing Vietnamese Legal Questions Using Deep Neural Networks with Biaffine Classifiers [3.116035935327534]
We propose using deep neural networks to extract important information from Vietnamese legal questions.
Given a legal question in natural language, the goal is to extract all the segments that contain the needed information to answer the question.
arXiv Detail & Related papers (2023-04-27T18:19:24Z)
- A semantic hierarchical graph neural network for text classification [1.439766998338892]
We propose a new hierarchical graph neural network (HieGNN) which extracts corresponding information from word-level, sentence-level and document-level respectively.
Experimental results on several benchmark datasets achieve better or similar results compared to several baseline methods.
arXiv Detail & Related papers (2022-09-15T03:59:31Z)
- Improving Model Training via Self-learned Label Representations [5.969349640156469]
We show that more sophisticated label representations are better for classification than the usual one-hot encoding.
We propose Learning with Adaptive Labels (LwAL) algorithm, which simultaneously learns the label representation while training for the classification task.
Our algorithm introduces negligible additional parameters and has a minimal computational overhead.
arXiv Detail & Related papers (2022-09-09T21:10:43Z)
- Avoiding Overfitting: A Survey on Regularization Methods for Convolutional Neural Networks [0.0]
Several image processing tasks have been significantly improved using Convolutional Neural Networks (CNN).
A critical factor in training concerns the network's regularization, which prevents the structure from overfitting.
This work analyzes several regularization methods developed in the last few years, showing significant improvements for different CNN models.
arXiv Detail & Related papers (2022-01-10T11:54:06Z)
- CvS: Classification via Segmentation For Small Datasets [52.821178654631254]
This paper presents CvS, a cost-effective classifier for small datasets that derives the classification labels from predicting the segmentation maps.
We evaluate the effectiveness of our framework on diverse problems showing that CvS is able to achieve much higher classification results compared to previous methods when given only a handful of examples.
arXiv Detail & Related papers (2021-10-29T18:41:15Z)
- Calibrating Class Activation Maps for Long-Tailed Visual Recognition [60.77124328049557]
We present two effective modifications of CNNs to improve network learning from long-tailed distribution.
First, we present a Class Activation Map Calibration (CAMC) module to improve the learning and prediction of network classifiers.
Second, we investigate the use of normalized classifiers for representation learning in long-tailed problems.
arXiv Detail & Related papers (2021-08-29T05:45:03Z)
- HAT: Hierarchical Aggregation Transformers for Person Re-identification [87.02828084991062]
We take advantage of both CNNs and Transformers for image-based person Re-ID with high performance.
This work is the first to take advantage of both CNNs and Transformers for image-based person Re-ID.
arXiv Detail & Related papers (2021-07-13T09:34:54Z)
- Fusion of CNNs and statistical indicators to improve image classification [65.51757376525798]
Convolutional Networks have dominated the field of computer vision for the last ten years.
The main strategy to prolong this trend relies on further upscaling networks in size.
We hypothesise that adding heterogeneous sources of information may be more cost-effective to a CNN than building a bigger network.
arXiv Detail & Related papers (2020-12-20T23:24:31Z)
- A Systematic Evaluation: Fine-Grained CNN vs. Traditional CNN Classifiers [54.996358399108566]
We investigate the performance of the landmark general CNN classifiers, which presented top-notch results on large scale classification datasets.
We compare it against state-of-the-art fine-grained classifiers.
We show an extensive evaluation on six datasets to determine whether the fine-grained classifier is able to elevate the baseline in their experiments.
arXiv Detail & Related papers (2020-03-24T23:49:14Z)
- 3D medical image segmentation with labeled and unlabeled data using autoencoders at the example of liver segmentation in CT images [58.720142291102135]
This work investigates the potential of autoencoder-extracted features to improve segmentation with a convolutional neural network.
A convolutional autoencoder was used to extract features from unlabeled data and a multi-scale, fully convolutional CNN was used to perform the target task of 3D liver segmentation in CT images.
arXiv Detail & Related papers (2020-03-17T20:20:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.