Self-Supervised Learning Based Handwriting Verification
- URL: http://arxiv.org/abs/2405.18320v2
- Date: Thu, 1 Aug 2024 17:43:19 GMT
- Title: Self-Supervised Learning Based Handwriting Verification
- Authors: Mihir Chauhan, Mohammad Abuzar Hashemi, Abhishek Satbhai, Mir Basheer Ali, Bina Ramamurthy, Mingchen Gao, Siwei Lyu, Sargur Srihari
- Abstract summary: We show that a ResNet-based Variational Auto-Encoder (VAE) outperforms other generative approaches, achieving 76.3% accuracy.
Using a pre-trained VAE and VICReg for the downstream task of writer verification, we observed relative improvements in accuracy of 6.7% and 9% over a ResNet-18 supervised baseline with 10% of writer labels.
- Score: 23.983430206133793
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present SSL-HV: Self-Supervised Learning approaches applied to the task of Handwriting Verification. This task involves determining whether a given pair of handwritten images originates from the same or a different writer distribution. We compare the performance of multiple generative and contrastive SSL approaches against handcrafted feature extractors and supervised learning on the CEDAR "AND" dataset. We show that a ResNet-based Variational Auto-Encoder (VAE) outperforms other generative approaches, achieving 76.3% accuracy, while ResNet-18 fine-tuned using Variance-Invariance-Covariance Regularization (VICReg) outperforms other contrastive approaches, achieving 78% accuracy. Using a pre-trained VAE and VICReg for the downstream task of writer verification, we observed relative improvements in accuracy of 6.7% and 9% over a ResNet-18 supervised baseline with 10% of writer labels.
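The VICReg objective used for fine-tuning above can be sketched as follows. This is a minimal NumPy illustration, not the authors' code; the loss weights (25, 25, 1) are the defaults reported in the original VICReg paper.

```python
import numpy as np

def vicreg_loss(z1, z2, sim_w=25.0, var_w=25.0, cov_w=1.0, eps=1e-4):
    """VICReg loss over two batches of embeddings z1, z2 of shape (batch, dim),
    each produced from a different augmented view of the same inputs."""
    n, d = z1.shape

    # Invariance: mean-squared error between the two views' embeddings.
    sim = np.mean((z1 - z2) ** 2)

    # Variance: hinge loss keeping each embedding dimension's std above 1,
    # which discourages collapse to a constant representation.
    def variance(z):
        std = np.sqrt(z.var(axis=0) + eps)
        return np.mean(np.maximum(0.0, 1.0 - std))

    # Covariance: penalize off-diagonal covariance entries,
    # which decorrelates the embedding dimensions.
    def covariance(z):
        zc = z - z.mean(axis=0)
        cov = (zc.T @ zc) / (n - 1)
        off_diag = cov - np.diag(np.diag(cov))
        return np.sum(off_diag ** 2) / d

    return (sim_w * sim
            + var_w * (variance(z1) + variance(z2))
            + cov_w * (covariance(z1) + covariance(z2)))
```

In training, z1 and z2 would come from an encoder (here, ResNet-18) applied to two augmentations of each handwriting image, and this loss would be minimized before the downstream verification head is attached.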
Related papers
- Vision-Language Model Based Handwriting Verification [23.983430206133793]
This paper explores using Vision Language Models (VLMs), such as OpenAI's GPT-4o and Google's PaliGemma, to address the challenges of handwriting verification.
Our goal is to provide clear, human-understandable explanations for model decisions.
arXiv Detail & Related papers (2024-07-31T17:57:32Z)
- Benchmarking and Improving Generator-Validator Consistency of Language Models [82.73914625520686]
Inconsistency between generating and validating an answer is prevalent in language models (LMs).
Even GPT-4, a state-of-the-art LM, is GV-consistent only 76% of the time.
We find that this approach improves GV-consistency of Alpaca-30B from 60% to 93%.
arXiv Detail & Related papers (2023-10-03T07:23:22Z)
- ESimCSE Unsupervised Contrastive Learning Jointly with UDA Semi-Supervised Learning for Large Label System Text Classification Mode [4.708633772366381]
The ESimCSE model efficiently learns text vector representations using unlabeled data to achieve better classification results.
UDA is trained on unlabeled data through semi-supervised learning to improve the models' prediction performance and stability.
Adversarial training techniques FGM and PGD are used during model training to improve the robustness and reliability of the model.
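The FGM step mentioned above can be sketched as follows. This is a minimal NumPy illustration of the L2-normalized gradient perturbation commonly applied to word embeddings in NLP adversarial training, not this paper's implementation.

```python
import numpy as np

def fgm_perturbation(grad, epsilon=1.0):
    """Fast Gradient Method: perturb an embedding along the gradient of the
    loss with respect to it, rescaled to L2 norm epsilon. The model is then
    also trained on the perturbed embedding, improving robustness."""
    norm = np.linalg.norm(grad)
    if norm == 0.0:
        return np.zeros_like(grad)  # no gradient signal, no perturbation
    return epsilon * grad / norm
```

PGD follows the same idea but applies several smaller steps, projecting the accumulated perturbation back into an epsilon-ball after each one.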
arXiv Detail & Related papers (2023-04-19T03:44:23Z)
- Transferring Pre-trained Multimodal Representations with Cross-modal Similarity Matching [49.730741713652435]
In this paper, we propose a method that can effectively transfer the representations of a large pre-trained multimodal model into a small target model.
For unsupervised transfer, we introduce cross-modal similarity matching (CSM) that enables a student model to learn the representations of a teacher model.
To better encode the text prompts, we design context-based prompt augmentation (CPA) that can alleviate the lexical ambiguity of input text prompts.
arXiv Detail & Related papers (2023-01-07T17:24:11Z)
- Co-supervised learning paradigm with conditional generative adversarial networks for sample-efficient classification [8.27719348049333]
This paper introduces a sample-efficient co-supervised learning paradigm (SEC-CGAN)
SEC-CGAN is trained alongside the classifier and supplements semantics-conditioned, confidence-aware synthesized examples to the annotated data during the training process.
Experiments demonstrate that the proposed SEC-CGAN outperforms the external classifier GAN and a baseline ResNet-18 classifier.
arXiv Detail & Related papers (2022-12-27T19:24:31Z)
- Understanding and Improving Visual Prompting: A Label-Mapping Perspective [63.89295305670113]
We revisit and advance visual prompting (VP), an input prompting technique for vision tasks.
We propose a new VP framework, termed ILM-VP, which automatically re-maps the source labels to the target labels.
Our proposal significantly outperforms state-of-the-art VP methods.
arXiv Detail & Related papers (2022-11-21T16:49:47Z)
- Distilling Facial Knowledge With Teacher-Tasks: Semantic-Segmentation-Features For Pose-Invariant Face-Recognition [1.1811442086145123]
The proposed Seg-Distilled-ID network jointly learns identification and semantic-segmentation tasks, where the segmentation task is then "distilled" into the recognition features.
Performance is benchmarked against three state-of-the-art encoders on a publicly available data-set.
Experimental evaluations show that the Seg-Distilled-ID network achieves notable gains: 99.9% test accuracy versus 81.6% for ResNet-101, 96.1% for VGG-19, and 96.3% for InceptionV3.
arXiv Detail & Related papers (2022-09-02T15:24:22Z)
- Measuring Self-Supervised Representation Quality for Downstream Classification using Discriminative Features [56.89813105411331]
We study the representation space of state-of-the-art self-supervised models including SimCLR, SwAV, MoCo, BYOL, DINO, SimSiam, VICReg and Barlow Twins.
We propose the Self-Supervised Representation Quality Score (Q-Score), an unsupervised score that can reliably predict whether a given sample is likely to be misclassified.
Fine-tuning with Q-Score regularization can boost the linear probing accuracy of SSL models by up to 5.8% on ImageNet-100 and 3.7% on ImageNet-1K.
arXiv Detail & Related papers (2022-03-03T17:48:23Z)
- To be Critical: Self-Calibrated Weakly Supervised Learning for Salient Object Detection [95.21700830273221]
Weakly-supervised salient object detection (WSOD) aims to develop saliency models using image-level annotations.
We propose a self-calibrated training strategy by explicitly establishing a mutual calibration loop between pseudo labels and network predictions.
We prove that even a much smaller dataset with well-matched annotations can facilitate models to achieve better performance as well as generalizability.
arXiv Detail & Related papers (2021-09-04T02:45:22Z)
- With a Little Help from My Friends: Nearest-Neighbor Contrastive Learning of Visual Representations [87.72779294717267]
Using the nearest-neighbor as positive in contrastive losses improves performance significantly on ImageNet classification.
We demonstrate empirically that our method is less reliant on complex data augmentations.
arXiv Detail & Related papers (2021-04-29T17:56:08Z)
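The nearest-neighbor positive lookup described in the last entry can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation; in the full method the support set is a queue of embeddings from previous batches.

```python
import numpy as np

def nearest_neighbor_positive(z, support):
    """For each embedding in z (batch, dim), return its nearest neighbor
    from the support set (queue, dim) by cosine similarity. The returned
    vectors replace the usual augmented view as the contrastive positive."""
    z_n = z / np.linalg.norm(z, axis=1, keepdims=True)
    s_n = support / np.linalg.norm(support, axis=1, keepdims=True)
    sims = z_n @ s_n.T            # (batch, queue) cosine similarities
    idx = sims.argmax(axis=1)     # index of the most similar support item
    return support[idx]
```

Using a neighbor rather than another augmentation of the same image exposes the contrastive loss to natural intra-class variation, which is why the method is less reliant on complex data augmentations.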
This list is automatically generated from the titles and abstracts of the papers in this site.