Fine-Grained Zero-Shot Learning with DNA as Side Information
- URL: http://arxiv.org/abs/2109.14133v1
- Date: Wed, 29 Sep 2021 01:45:22 GMT
- Title: Fine-Grained Zero-Shot Learning with DNA as Side Information
- Authors: Sarkhan Badirli, Zeynep Akata, George Mohler, Christine Picard, Murat Dundar
- Abstract summary: We use DNA as side information for fine-grained zero-shot classification of species.
We implement a simple hierarchical Bayesian model that uses DNA information to establish the hierarchy in the image space.
We show that DNA can be an equally promising yet generally more accessible alternative to word vectors.
- Score: 31.82132159867097
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The fine-grained zero-shot learning task requires some form of side information
to transfer discriminative information from seen to unseen classes. As manually
annotated visual attributes are extremely costly and often impractical to
obtain for a large number of classes, in this study we use DNA as side
information for the first time for fine-grained zero-shot classification of
species. Mitochondrial DNA plays an important role as a genetic marker in
evolutionary biology and has been used to achieve near-perfect accuracy in the
species classification of living organisms. We implement a simple hierarchical
Bayesian model that uses DNA information to establish the hierarchy in the
image space and employs local priors to define surrogate classes for unseen
ones. On the benchmark CUB dataset, we show that DNA can be an equally promising
yet generally more accessible alternative to word vectors as side
information. This is especially important as obtaining robust word
representations for fine-grained species names is not a practicable goal when
information about these species in free-form text is limited. On a newly
compiled fine-grained insect dataset that uses DNA information from over a
thousand species, we show that the Bayesian approach outperforms
the state of the art by a wide margin.
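The idea of defining surrogate classes for unseen species from DNA side information can be illustrated with a minimal sketch. This is not the authors' hierarchical Bayesian model: it swaps in a plain nearest-mean classifier, and every name, the toy data, and the choice of `k` below are assumptions made purely for illustration. Each unseen class borrows its image-space mean from the `k` seen classes whose DNA embeddings are closest to its own.

```python
# Simplified stand-in for DNA-guided surrogate classes (assumption: nearest-mean
# classifier instead of the hierarchical Bayesian model described in the abstract).

def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def surrogate_means(seen_image_feats, seen_dna, unseen_dna, k=2):
    """Build an image-space mean for each unseen class from its k nearest
    seen classes, where nearness is measured in DNA embedding space."""
    seen_means = {c: mean_vector(f) for c, f in seen_image_feats.items()}
    surrogates = {}
    for u, dna_u in unseen_dna.items():
        nearest = sorted(seen_dna, key=lambda c: sq_dist(seen_dna[c], dna_u))[:k]
        surrogates[u] = mean_vector([seen_means[c] for c in nearest])
    return seen_means, surrogates

def classify(x, class_means):
    """Assign x to the class whose mean is closest in image space."""
    return min(class_means, key=lambda c: sq_dist(x, class_means[c]))

# Hypothetical toy data: 2-D image features and 2-D DNA embeddings.
seen_image_feats = {
    "A": [[0.0, 0.0], [0.0, 2.0]],
    "B": [[2.0, 0.0], [2.0, 2.0]],
    "C": [[10.0, 10.0], [10.0, 12.0]],
}
seen_dna = {"A": [0.0, 0.0], "B": [1.0, 0.0], "C": [9.0, 9.0]}
unseen_dna = {"U": [0.5, 0.1]}  # genetically close to A and B

seen_means, surrogates = surrogate_means(seen_image_feats, seen_dna, unseen_dna, k=2)
all_means = {**seen_means, **surrogates}
prediction = classify([1.1, 0.9], all_means)
print(prediction)
```

In this toy setup the unseen class `U` has no images, yet a query image lying between the means of `A` and `B` is assigned to it, because the DNA embeddings place `U` between those two seen classes.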
Related papers
- VQDNA: Unleashing the Power of Vector Quantization for Multi-Species Genomic Sequence Modeling [60.91599380893732]
VQDNA is a general-purpose framework that renovates genome tokenization from the perspective of genome vocabulary learning.
By leveraging vector-quantized codebooks as learnable vocabulary, VQDNA can adaptively tokenize genomes into pattern-aware embeddings.
arXiv Detail & Related papers (2024-05-13T20:15:03Z)
- DNABERT-S: Pioneering Species Differentiation with Species-Aware DNA Embeddings [7.822348354050447]
We introduce DNABERT-S, a tailored genome model that develops species-aware embeddings to naturally cluster and segregate DNA sequences of different species.
Results on 23 diverse datasets show DNABERT-S's effectiveness, especially in realistic label-scarce scenarios.
arXiv Detail & Related papers (2024-02-13T20:21:29Z)
- TEPI: Taxonomy-aware Embedding and Pseudo-Imaging for Scarcely-labeled Zero-shot Genome Classification [0.0]
A species' genetic code or genome encodes valuable evolutionary, biological, and phylogenetic information.
Traditional bioinformatics tools have made notable progress but lack scalability and are computationally expensive.
We propose addressing this problem through zero-shot learning using TEPI (Taxonomy-aware Embedding and Pseudo-Imaging).
arXiv Detail & Related papers (2024-01-24T04:16:28Z)
- BEND: Benchmarking DNA Language Models on biologically meaningful tasks [7.005668635562045]
We introduce BEND, a Benchmark for DNA language models, featuring a collection of realistic and biologically meaningful downstream tasks.
We find that embeddings from current DNA LMs can approach performance of expert methods on some tasks, but only capture limited information about long-range features.
arXiv Detail & Related papers (2023-11-21T12:34:00Z)
- HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution [76.97231739317259]
We present HyenaDNA, a genomic foundation model pretrained on the human reference genome with context lengths of up to 1 million tokens at the single nucleotide-level.
On fine-tuned benchmarks from the Nucleotide Transformer, HyenaDNA reaches state-of-the-art (SotA) on 12 of 18 datasets using a model with orders of magnitude fewer parameters and less pretraining data.
arXiv Detail & Related papers (2023-06-27T20:46:34Z)
- Deep Visual-Genetic Biometrics for Taxonomic Classification of Rare Species [1.9819034119774483]
We propose aligned visual-genetic inference spaces with the aim to implicitly encode cross-domain associations for improved performance.
We experimentally demonstrate the efficacy of the concept via application to microscopic imagery of 30k+ planktic foraminifer shells.
Visual-genetic alignment can significantly benefit visual-only recognition of the rarest species.
arXiv Detail & Related papers (2023-05-11T10:04:27Z)
- Semi-Supervised Domain Adaptation for Cross-Survey Galaxy Morphology Classification and Anomaly Detection [57.85347204640585]
We develop a Universal Domain Adaptation method DeepAstroUDA.
It can be applied to datasets with different types of class overlap.
For the first time, we demonstrate the successful use of domain adaptation on two very different observational datasets.
arXiv Detail & Related papers (2022-11-01T18:07:21Z)
- Few-Shot Meta Learning for Recognizing Facial Phenotypes of Genetic Disorders [55.41644538483948]
Automated classification and similarity retrieval aid physicians in decision-making to diagnose possible genetic conditions as early as possible.
Previous work has addressed the problem as a classification problem and used deep learning methods.
In this study, we used a facial recognition model trained on a large corpus of healthy individuals as a pre-task and transferred it to facial phenotype recognition.
arXiv Detail & Related papers (2022-10-23T11:52:57Z)
- Taxonomy and evolution predicting using deep learning in images [9.98733710208427]
This study creates a novel recognition framework by systematically studying the mushroom image recognition problem.
We present the first method to map images to DNA: an encoder maps images to genetic distances, and the DNA is then decoded by a pre-trained decoder.
arXiv Detail & Related papers (2022-06-28T13:54:14Z)
- Transferring Dense Pose to Proximal Animal Classes [83.84439508978126]
We show that it is possible to transfer the knowledge existing in dense pose recognition for humans, as well as in more general object detectors and segmenters, to the problem of dense pose recognition in other classes.
We do this by establishing a DensePose model for the new animal which is also geometrically aligned to humans.
We also introduce two benchmark datasets labelled in the manner of DensePose for the class chimpanzee and use them to evaluate our approach.
arXiv Detail & Related papers (2020-02-28T21:43:53Z)