Visual-textual Dermatoglyphic Animal Biometrics: A First Case Study on Panthera tigris
- URL: http://arxiv.org/abs/2512.14878v1
- Date: Tue, 16 Dec 2025 19:47:02 GMT
- Title: Visual-textual Dermatoglyphic Animal Biometrics: A First Case Study on Panthera tigris
- Authors: Wenshuo Li, Majid Mirmehdi, Tilo Burghardt
- Abstract summary: We extend Re-ID methodologies by incorporating precise dermatoglyphic textual descriptors. We show that these specialist semantics abstract and encode animal coat topology using human-interpretable language tags. We conclude that dermatoglyphic language-guided biometrics can overcome vision-only limitations.
- Score: 11.07566750390282
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Biologists have long combined visuals with textual field notes to re-identify (Re-ID) animals. Contemporary AI tools automate this for species with distinctive morphological features but remain largely image-based. Here, we extend Re-ID methodologies by incorporating precise dermatoglyphic textual descriptors, an approach used in forensics but new to ecology. We demonstrate that these specialist semantics abstract and encode animal coat topology using human-interpretable language tags. Drawing on 84,264 manually labelled minutiae across 3,355 images of 185 tigers (Panthera tigris), we evaluate this visual-textual methodology, revealing novel capabilities for cross-modal identity retrieval. To optimise performance, we developed a text-image co-synthesis pipeline to generate 'virtual individuals', each comprising dozens of life-like visuals paired with dermatoglyphic text. Benchmarking against real-world scenarios shows this augmentation significantly boosts AI accuracy in cross-modal retrieval while alleviating data scarcity. We conclude that dermatoglyphic language-guided biometrics can overcome vision-only limitations, enabling textual-to-visual identity recovery underpinned by human-verifiable matchings. This represents a significant advance towards explainability in Re-ID and a language-driven unification of descriptive modalities in ecological monitoring.
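To make the cross-modal retrieval setting concrete, the sketch below shows one minimal way to rank gallery images of known individuals against a free-text dermatoglyphic description using an off-the-shelf CLIP dual encoder. This is an illustrative assumption, not the paper's actual pipeline: the checkpoint name, the example stripe-topology tags, and the gallery file paths are all hypothetical.

```python
# Minimal sketch of text-to-image identity retrieval with a generic CLIP-style
# dual encoder. NOT the authors' model: checkpoint, example dermatoglyphic
# tags, and gallery paths below are illustrative assumptions only.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

# Hypothetical dermatoglyphic descriptors: human-interpretable tags
# describing stripe/minutia topology on a flank region.
query_text = ("left flank: stripe bifurcation above a short spur, "
              "two closed loops near the hip")

# Hypothetical gallery of known individuals (one image per identity here).
gallery = {"tiger_017": "gallery/tiger_017.jpg",
           "tiger_042": "gallery/tiger_042.jpg"}

with torch.no_grad():
    # Embed the textual query once and L2-normalise it.
    text_inputs = processor(text=[query_text], return_tensors="pt", padding=True)
    text_emb = model.get_text_features(**text_inputs)
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)

    # Embed every gallery image and score it against the query.
    scores = {}
    for identity, path in gallery.items():
        image = Image.open(path).convert("RGB")
        image_inputs = processor(images=image, return_tensors="pt")
        img_emb = model.get_image_features(**image_inputs)
        img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
        scores[identity] = (text_emb @ img_emb.T).item()  # cosine similarity

# Rank candidate identities by similarity to the textual description.
for identity, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{identity}: {score:.3f}")
```

In the paper's setting the text side would use the specialist minutiae vocabulary and the gallery would hold many images per individual, but the ranking step stays the same; synthesized 'virtual individuals' of the kind described above could be appended to such a gallery as additional training or augmentation data.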
Related papers
- Cytoarchitecture in Words: Weakly Supervised Vision-Language Modeling for Human Brain Microscopy [1.7429354559347476]
We propose a label-mediated method that generates meaningful captions from images by linking images and text only through a label. Across 57 brain areas, the resulting method produces plausible area-level descriptions and supports open-set use through explicit rejection of unseen areas.
arXiv Detail & Related papers (2026-02-26T15:10:39Z)
- BioCAP: Exploiting Synthetic Captions Beyond Labels in Biological Foundation Models [40.106880795877466]
Images and captions can be viewed as complementary samples from the latent morphospace of a species. We generate synthetic captions with Wikipedia-derived visual information and taxon-tailored format examples. These domain-specific contexts help reduce hallucination and yield accurate, instance-based captions.
arXiv Detail & Related papers (2025-10-23T00:34:21Z)
- An Individual Identity-Driven Framework for Animal Re-Identification [15.381573249551181]
IndivAID is a framework specifically designed for Animal ReID.
It generates image-specific and individual-specific textual descriptions that fully capture the diverse visual concepts of each individual across animal images.
Evaluation against state-of-the-art methods across eight benchmark datasets and a real-world Stoat dataset demonstrates IndivAID's effectiveness and applicability.
arXiv Detail & Related papers (2024-10-30T11:34:55Z)
- Vision-Language Synthetic Data Enhances Echocardiography Downstream Tasks [4.1942958779358674]
This paper utilizes recent vision-language models to produce diverse and realistic synthetic echocardiography image data.
We show that the rich contextual information present in the synthesized data potentially enhances the accuracy and interpretability of downstream tasks.
arXiv Detail & Related papers (2024-03-28T23:26:45Z)
- Hierarchical Text-to-Vision Self Supervised Alignment for Improved Histopathology Representation Learning [64.1316997189396]
We present a novel language-tied self-supervised learning framework, Hierarchical Language-tied Self-Supervision (HLSS) for histopathology images.
Our resulting model achieves state-of-the-art performance on two medical imaging benchmarks, OpenSRH and TCGA datasets.
arXiv Detail & Related papers (2024-03-21T17:58:56Z)
- Visual Analytics for Efficient Image Exploration and User-Guided Image Captioning [35.47078178526536]
Recent advancements in pre-trained large-scale language-image models have ushered in a new era of visual comprehension.
This paper tackles two well-known issues within the realm of visual analytics: (1) the efficient exploration of large-scale image datasets and identification of potential data biases within them; (2) the evaluation of image captions and steering of their generation process.
arXiv Detail & Related papers (2023-11-02T06:21:35Z)
- OCRBench: On the Hidden Mystery of OCR in Large Multimodal Models [122.27878464009181]
We conducted a comprehensive evaluation of Large Multimodal Models, such as GPT4V and Gemini, in various text-related visual tasks.
OCRBench contains 29 datasets, making it the most comprehensive OCR evaluation benchmark available.
arXiv Detail & Related papers (2023-05-13T11:28:37Z)
- Universal Multimodal Representation for Language Understanding [110.98786673598015]
This work presents new methods to employ visual information as assistant signals to general NLP tasks.
For each sentence, we first retrieve a flexible number of images from a light topic-image lookup table extracted over the existing sentence-image pairs.
Then, the text and images are encoded by a Transformer encoder and convolutional neural network, respectively.
arXiv Detail & Related papers (2023-01-09T13:54:11Z)
- Multi-Modal Masked Autoencoders for Medical Vision-and-Language Pre-Training [62.215025958347105]
We propose a self-supervised learning paradigm with multi-modal masked autoencoders.
We learn cross-modal domain knowledge by reconstructing missing pixels and tokens from randomly masked images and texts.
arXiv Detail & Related papers (2022-09-15T07:26:43Z)
- Vision-Language Pre-Training for Boosting Scene Text Detectors [57.08046351495244]
We specifically adapt vision-language joint learning for scene text detection.
We propose to learn contextualized, joint representations through vision-language pre-training.
The pre-trained model is able to produce more informative representations with richer semantics.
arXiv Detail & Related papers (2022-04-29T03:53:54Z)
- From Show to Tell: A Survey on Image Captioning [48.98681267347662]
Connecting Vision and Language plays an essential role in Generative Intelligence.
Research in image captioning has not reached a conclusive answer yet.
This work aims at providing a comprehensive overview and categorization of image captioning approaches.
arXiv Detail & Related papers (2021-07-14T18:00:54Z)
- TextCaps: a Dataset for Image Captioning with Reading Comprehension [56.89608505010651]
Text is omnipresent in human environments and frequently critical to understand our surroundings.
To study how to comprehend text in the context of an image, we collect a novel dataset, TextCaps, with 145k captions for 28k images.
Our dataset challenges a model to recognize text, relate it to its visual context, and decide what part of the text to copy or paraphrase.
arXiv Detail & Related papers (2020-03-24T02:38:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.