Contrastive Learning for Character Detection in Ancient Greek Papyri
- URL: http://arxiv.org/abs/2409.10156v1
- Date: Mon, 16 Sep 2024 10:41:29 GMT
- Title: Contrastive Learning for Character Detection in Ancient Greek Papyri
- Authors: Vedasri Nakka, Andreas Fischer, Rolf Ingold, Lars Vogtlin
- Abstract summary: This thesis investigates the effectiveness of SimCLR, a contrastive learning technique, in Greek letter recognition.
Pretraining of SimCLR is conducted on the Alpub dataset, followed by fine-tuning on the ICDAR dataset.
Our experiments show that SimCLR does not outperform the baselines in letter recognition tasks.
- Score: 0.6361669177741777
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: This thesis investigates the effectiveness of SimCLR, a contrastive learning technique, in Greek letter recognition, focusing on the impact of various augmentation techniques. We pretrain the SimCLR backbone using the Alpub dataset (pretraining dataset) and fine-tune it on a smaller ICDAR dataset (fine-tuning dataset) to compare SimCLR's performance against traditional baseline models, which use cross-entropy and triplet loss functions. Additionally, we explore the role of different data augmentation strategies, essential for the SimCLR training process. Methodologically, we examine three primary approaches: (1) a baseline model using cross-entropy loss, (2) a triplet embedding model with a classification layer, and (3) a SimCLR pretrained model with a classification layer. Initially, we train the baseline, triplet, and SimCLR models using 93 augmentations on ResNet-18 and ResNet-50 networks with the ICDAR dataset. From these, the top four augmentations are selected using a statistical t-test. Pretraining of SimCLR is conducted on the Alpub dataset, followed by fine-tuning on the ICDAR dataset. The triplet loss model undergoes a similar process, being pretrained with the top four augmentations before fine-tuning on ICDAR. Our experiments show that SimCLR does not outperform the baselines in letter recognition tasks. The baseline model with cross-entropy loss demonstrates better performance than both SimCLR and the triplet loss model. This study provides a detailed evaluation of contrastive learning for letter recognition, highlighting SimCLR's limitations while emphasizing the strengths of traditional supervised learning models in this task. We believe SimCLR's cropping strategies may cause a semantic shift in the input image, reducing training effectiveness despite the large pretraining dataset. Our code is available at https://github.com/DIVA-DIA/MT_augmentation_and_contrastive_learning/.
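To make the contrastive objective concrete, the following is a minimal sketch of SimCLR's NT-Xent loss, the objective pretrained here on Alpub before fine-tuning on ICDAR. The function name, batch size, projection dimension, and temperature are illustrative assumptions, not taken from the authors' repository.

```python
# Minimal sketch of SimCLR's NT-Xent (normalized temperature-scaled
# cross-entropy) loss. Batch size, projection dimension, and temperature
# below are illustrative assumptions, not the authors' settings.
import torch
import torch.nn.functional as F

def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor,
                 temperature: float = 0.5) -> torch.Tensor:
    """z1, z2: (N, D) projections of two augmented views of the same N images."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2N, D), unit-norm
    sim = z @ z.t() / temperature                       # pairwise cosine similarities
    sim.fill_diagonal_(float("-inf"))                   # exclude self-similarity
    # Row i's positive is the other augmented view of the same image.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

# Toy usage: two views of a batch of 8 images with 128-d projections.
z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
print(nt_xent_loss(z1, z2).item())
```

The abstract's hypothesis about cropping applies at the augmentation stage that produces the two views: for tight letter crops, aggressive random cropping can remove or change the character itself, making the "positive pair" semantically inconsistent.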
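Similarly, the top-four augmentation selection via t-test could be implemented along the lines below. The per-run accuracy arrays are random placeholders, and the five-run setup, Welch's test, and 0.05 threshold are assumptions for illustration, not the authors' protocol.

```python
# Hedged sketch: select augmentations whose validation accuracy
# significantly exceeds a no-augmentation baseline, then keep the top four.
# The accuracy data here are fabricated placeholders for illustration.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
baseline_acc = rng.normal(0.80, 0.01, size=5)             # 5 runs, no augmentation
aug_acc = {f"aug_{i}": rng.normal(0.80 + 0.002 * (i % 7), 0.01, size=5)
           for i in range(93)}                            # 93 candidate augmentations

scored = []
for name, acc in aug_acc.items():
    t, p = ttest_ind(acc, baseline_acc, equal_var=False)  # Welch's t-test
    if t > 0 and p < 0.05:                                # significantly better only
        scored.append((acc.mean(), name))
top4 = [name for _, name in sorted(scored, reverse=True)[:4]]
print(top4)
```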
Related papers
- Understanding the Benefits of SimCLR Pre-Training in Two-Layer Convolutional Neural Networks [10.55004012983524]
SimCLR is one of the most popular contrastive learning methods for vision tasks.
We consider training a two-layer convolutional neural network (CNN) to learn a toy image data model.
We show that, under certain conditions on the number of labeled data, SimCLR pre-training combined with supervised fine-tuning achieves almost optimal test loss.
arXiv Detail & Related papers (2024-09-27T12:19:41Z)
- Bridging the Sim-to-Real Gap with Bayesian Inference [53.61496586090384]
We present SIM-FSVGD for learning robot dynamics from data.
We use low-fidelity physical priors to regularize the training of neural network models.
We demonstrate the effectiveness of SIM-FSVGD in bridging the sim-to-real gap on a high-performance RC racecar system.
arXiv Detail & Related papers (2024-03-25T11:29:32Z)
- Re-Simulation-based Self-Supervised Learning for Pre-Training Foundation Models [1.230412738960606]
Self-Supervised Learning (SSL) is at the core of training modern large machine learning models.
We propose RS3L, a novel simulation-based SSL strategy that employs a method of re-simulation to drive data augmentation.
In addition to our results, we make the RS3L dataset publicly available for further studies on how to improve SSL strategies.
arXiv Detail & Related papers (2024-03-11T18:00:47Z)
- Data Augmentation for Traffic Classification [54.92823760790628]
Data Augmentation (DA) is a technique widely adopted in Computer Vision (CV) and Natural Language Processing (NLP) tasks.
However, DA has struggled to gain traction in networking contexts, particularly in Traffic Classification (TC) tasks.
arXiv Detail & Related papers (2024-01-19T15:25:09Z)
- The effect of data augmentation and 3D-CNN depth on Alzheimer's Disease detection [51.697248252191265]
This work summarizes and strictly observes best practices regarding data handling, experimental design, and model evaluation.
We focus on Alzheimer's Disease (AD) detection, which serves as a paradigmatic example of a challenging problem in healthcare.
Within this framework, we train 15 predictive models, considering three different data augmentation strategies and five distinct 3D CNN architectures.
arXiv Detail & Related papers (2023-09-13T10:40:41Z)
- Learning Deep Representations via Contrastive Learning for Instance Retrieval [11.736450745549792]
This paper makes the first attempt to tackle the problem using instance-discrimination-based contrastive learning (CL).
In this work, we approach this problem by exploring the capability of deriving discriminative representations from pre-trained and fine-tuned CL models.
arXiv Detail & Related papers (2022-09-28T04:36:34Z)
- Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition [88.34182299496074]
Action labels are available only on the source dataset, but not on the target dataset during training.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z)
- Texture Aware Autoencoder Pre-training And Pairwise Learning Refinement For Improved Iris Recognition [16.383084641568693]
This paper presents an end-to-end trainable iris recognition system for datasets with limited training data.
We build upon our previous stagewise learning framework with certain key optimization and architectural innovations.
We validate our model across three publicly available iris datasets and the proposed model consistently outperforms both traditional and deep learning baselines.
arXiv Detail & Related papers (2022-02-15T15:12:31Z)
- Consistency and Monotonicity Regularization for Neural Knowledge Tracing [50.92661409499299]
Knowledge Tracing (KT), which tracks a human's knowledge acquisition, is a central component in online learning and AI in Education.
We propose three types of novel data augmentation, coined replacement, insertion, and deletion, along with corresponding regularization losses.
Extensive experiments on various KT benchmarks show that our regularization scheme consistently improves the model performances.
arXiv Detail & Related papers (2021-05-03T02:36:29Z)
- PredRNN: A Recurrent Neural Network for Spatiotemporal Predictive Learning [109.84770951839289]
We present PredRNN, a new recurrent network for learning visual dynamics from historical context.
We show that our approach obtains highly competitive results on three standard datasets.
arXiv Detail & Related papers (2021-03-17T08:28:30Z)