Transfer Learning with Self-Supervised Vision Transformers for Snake Identification
- URL: http://arxiv.org/abs/2407.06178v1
- Date: Mon, 8 Jul 2024 17:52:23 GMT
- Title: Transfer Learning with Self-Supervised Vision Transformers for Snake Identification
- Authors: Anthony Miyaguchi, Murilo Gustineli, Austin Fischer, Ryan Lundqvist
- Abstract summary: We present our approach for the SnakeCLEF 2024 competition to predict snake species from images.
We use Meta's DINOv2 vision transformer model for feature extraction to tackle species' high variability and visual similarity in a dataset of 182,261 images.
Despite achieving a score of 39.69, our results show promise for DINOv2 embeddings in snake identification.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present our approach for the SnakeCLEF 2024 competition to predict snake species from images. We explore and use Meta's DINOv2 vision transformer model for feature extraction to tackle species' high variability and visual similarity in a dataset of 182,261 images. We perform exploratory analysis on embeddings to understand their structure, and train a linear classifier on the embeddings to predict species. Despite achieving a score of 39.69, our results show promise for DINOv2 embeddings in snake identification. All code for this project is available at https://github.com/dsgt-kaggle-clef/snakeclef-2024.
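For context, a minimal sketch of the pipeline the abstract describes: frozen DINOv2 embeddings followed by a linear classifier. The checkpoint name, preprocessing, and the training arrays are assumptions for illustration, not code from the linked repository.

```python
# Minimal sketch: frozen DINOv2 embeddings + linear classifier (assumed setup,
# not the competition code from the linked repository).
import torch
from torchvision import transforms
from PIL import Image
from sklearn.linear_model import LogisticRegression

device = "cuda" if torch.cuda.is_available() else "cpu"
# dinov2_vits14 is one published checkpoint; the paper may use a larger variant.
backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14").to(device).eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),          # 224 is divisible by the 14-pixel patch size
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

@torch.no_grad()
def embed(paths):
    batch = torch.stack([preprocess(Image.open(p).convert("RGB")) for p in paths])
    return backbone(batch.to(device)).cpu().numpy()   # CLS embeddings, one row per image

# train_paths / train_labels / test_paths are placeholders for the SnakeCLEF splits.
X_train = embed(train_paths)
clf = LogisticRegression(max_iter=1000).fit(X_train, train_labels)
pred_species = clf.predict(embed(test_paths))
```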
Related papers
- Multi-Label Plant Species Classification with Self-Supervised Vision Transformers [0.0]
We present a transfer learning approach using a self-supervised Vision Transformer (DINOv2) for the PlantCLEF 2024 competition.
To address the computational challenges of the large-scale dataset, we employ Spark for distributed data processing.
Our results demonstrate the efficacy of combining transfer learning with advanced data processing techniques for multi-label image classification tasks.
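As a loose illustration of the distributed processing mentioned above, the sketch below reads raw image files with Spark and computes features through a pandas UDF; the paths, schema, and the stand-in embedding function are assumptions, not the PlantCLEF pipeline.

```python
# Sketch of Spark-based distributed feature extraction (assumed layout; not the
# actual PlantCLEF 2024 pipeline).
import io
import numpy as np
import pandas as pd
from PIL import Image
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf

spark = SparkSession.builder.appName("embed-images").getOrCreate()

# binaryFile gives one row per image with the raw bytes in the `content` column.
images = spark.read.format("binaryFile").load("s3://bucket/plantclef/images/*.jpg")

@pandas_udf("array<float>")
def embed(content: pd.Series) -> pd.Series:
    # A real pipeline would run a DINOv2 forward pass here; this stand-in just
    # resizes and flattens a thumbnail so the sketch stays self-contained.
    def one(b):
        img = Image.open(io.BytesIO(b)).convert("RGB").resize((8, 8))
        return (np.asarray(img, dtype=np.float32) / 255.0).ravel().tolist()
    return content.apply(one)

embeddings = images.select("path", embed("content").alias("embedding"))
embeddings.write.mode("overwrite").parquet("s3://bucket/plantclef/embeddings/")
```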
arXiv Detail & Related papers (2024-07-08T18:07:33Z) - CNN Based Flank Predictor for Quadruped Animal Species [1.502956022927019]
We train a flank predictor that predicts the visible flank of quadruped mammalian species in images.
The developed models were evaluated in different scenarios involving unknown quadruped species in known and unknown environments.
The best model, trained on an EfficientNetV2 backbone, achieved an accuracy of 88.70% for the unknown species lynx in a complex habitat.
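For orientation only, a minimal sketch of a flank classifier built on a torchvision EfficientNetV2-S backbone; the label set and head are assumptions rather than the paper's setup.

```python
# Sketch: EfficientNetV2-S backbone with a small head predicting the visible flank.
# The flank classes below are illustrative, not the paper's label set.
import torch
import torch.nn as nn
from torchvision.models import efficientnet_v2_s, EfficientNet_V2_S_Weights

FLANK_CLASSES = ["left", "right", "front", "back"]   # assumed labels

weights = EfficientNet_V2_S_Weights.DEFAULT
model = efficientnet_v2_s(weights=weights)
# Replace the ImageNet classifier with a flank-prediction head.
in_features = model.classifier[1].in_features
model.classifier[1] = nn.Linear(in_features, len(FLANK_CLASSES))

preprocess = weights.transforms()   # resize/normalize exactly as the backbone expects

@torch.no_grad()
def predict_flank(pil_image):
    logits = model(preprocess(pil_image).unsqueeze(0))
    return FLANK_CLASSES[logits.argmax(dim=1).item()]
```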
arXiv Detail & Related papers (2024-06-19T14:24:26Z) - Poisson Variational Autoencoder [0.0]
Variational autoencoders (VAE) employ Bayesian inference to interpret sensory inputs.
Here, we develop a novel architecture that combines principles of predictive coding with a VAE that encodes inputs into discrete spike counts.
Our work provides an interpretable computational framework to study brain-like sensory processing.
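A simplified sketch of encoding inputs into discrete spike counts: an encoder emits Poisson rates, counts are sampled, and a Poisson KL term is evaluated against a fixed prior rate. Sizes are assumed, and the paper's differentiable sampling scheme is deliberately omitted.

```python
# Sketch of a Poisson-latent autoencoder forward pass (assumed sizes, simplified).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyPoissonVAE(nn.Module):
    def __init__(self, in_dim=784, latent_dim=64, prior_rate=1.0):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, in_dim))
        self.prior_rate = prior_rate

    def forward(self, x):
        rate = F.softplus(self.encoder(x))          # per-unit Poisson rate, > 0
        spikes = torch.poisson(rate)                # discrete spike counts
        # NOTE: torch.poisson is not differentiable; the paper uses a proper
        # reparameterization, which this sketch deliberately omits.
        recon = self.decoder(spikes)
        # KL(Poisson(rate) || Poisson(prior)) = prior - rate + rate * log(rate / prior)
        kl = (self.prior_rate - rate + rate * torch.log(rate / self.prior_rate)).sum(-1)
        recon_loss = F.mse_loss(recon, x, reduction="none").sum(-1)
        return (recon_loss + kl).mean()

loss = TinyPoissonVAE()(torch.rand(16, 784))
```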
arXiv Detail & Related papers (2024-05-23T12:02:54Z) - Watch out Venomous Snake Species: A Solution to SnakeCLEF2023 [27.7177597421459]
The SnakeCLEF2023 competition aims at the development of advanced algorithms for snake species identification.
This paper presents a method leveraging both images and metadata.
Our method achieves a score of 91.31% on the final metric, which combines F1 and other metrics, on the private leaderboard.
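One common way to leverage both images and metadata is late fusion by concatenation; the sketch below shows that pattern with assumed feature sizes and a placeholder species count, not the paper's architecture.

```python
# Sketch: late fusion of image embeddings with encoded metadata (assumed dims/fields).
import torch
import torch.nn as nn

class ImageMetaClassifier(nn.Module):
    def __init__(self, img_dim=768, meta_dim=32, n_species=1000):   # placeholder species count
        super().__init__()
        # meta_dim could hold e.g. a one-hot region code plus other tabular fields.
        self.head = nn.Sequential(
            nn.Linear(img_dim + meta_dim, 512), nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(512, n_species),
        )

    def forward(self, img_feat, meta_feat):
        return self.head(torch.cat([img_feat, meta_feat], dim=-1))

logits = ImageMetaClassifier()(torch.randn(4, 768), torch.randn(4, 32))
```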
arXiv Detail & Related papers (2023-07-19T04:59:58Z) - SVFormer: Semi-supervised Video Transformer for Action Recognition [88.52042032347173]
We introduce SVFormer, which adopts a steady pseudo-labeling framework to cope with unlabeled video samples.
In addition, we propose a temporal warping augmentation to cover the complex temporal variation in videos.
In particular, SVFormer outperforms the state-of-the-art by 31.5% with fewer training epochs under the 1% labeling rate of Kinetics-400.
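To make the pseudo-labeling idea concrete, a hedged sketch of a confidence-thresholded teacher/student step on unlabeled clips follows; the threshold and loss weighting are assumptions, not SVFormer's exact recipe.

```python
# Sketch: confidence-thresholded pseudo-labels from a teacher model (assumed details).
import torch
import torch.nn.functional as F

def semi_supervised_step(student, teacher, labeled, labels, unlabeled, thresh=0.9):
    sup_loss = F.cross_entropy(student(labeled), labels)
    with torch.no_grad():
        probs = teacher(unlabeled).softmax(dim=1)
        conf, pseudo = probs.max(dim=1)
        keep = conf >= thresh                      # only trust confident predictions
    if keep.any():
        unsup_loss = F.cross_entropy(student(unlabeled[keep]), pseudo[keep])
    else:
        unsup_loss = labeled.new_zeros(())
    return sup_loss + unsup_loss                   # weighting between terms is assumed
```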
arXiv Detail & Related papers (2022-11-23T18:58:42Z) - Solutions for Fine-grained and Long-tailed Snake Species Recognition in
SnakeCLEF 2022 [30.8004334312293]
We introduce our solution in SnakeCLEF 2022 for fine-grained snake species recognition on a heavy long-tailed class distribution.
With an ensemble of several different models, a private score of 82.65% is achieved on the final leaderboard, ranking 3rd.
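A small sketch of the probability-averaging style of ensembling the summary refers to; the member models are placeholders.

```python
# Sketch: average softmax probabilities across an ensemble of classifiers (placeholder models).
import torch

@torch.no_grad()
def ensemble_predict(models, images):
    probs = torch.stack([m(images).softmax(dim=1) for m in models])  # (n_models, B, C)
    return probs.mean(dim=0).argmax(dim=1)                           # class with highest mean probability
```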
arXiv Detail & Related papers (2022-07-04T05:55:58Z) - ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond [76.35955924137986]
We propose a Vision Transformer Advanced by Exploring intrinsic inductive bias (IB) from convolutions, i.e., ViTAE.
ViTAE has several spatial pyramid reduction modules to downsample and embed the input image into tokens with rich multi-scale context.
We obtain state-of-the-art classification performance, i.e., 88.5% Top-1 accuracy on the ImageNet validation set and the best 91.2% Top-1 accuracy on the ImageNet Real validation set.
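A rough, simplified sketch of what a spatial pyramid reduction cell could look like: parallel dilated convolutions that downsample the image and flatten it into multi-scale tokens. This illustrates the general idea only and is not the actual ViTAE module.

```python
# Sketch: multi-dilation reduction cell that downsamples an image into tokens
# with multi-scale context (simplified; not the real ViTAE implementation).
import torch
import torch.nn as nn

class PyramidReduction(nn.Module):
    def __init__(self, in_ch=3, dim=64, dilations=(1, 2, 3), stride=4):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, dim, kernel_size=3, stride=stride, padding=d, dilation=d)
            for d in dilations
        ])
        self.proj = nn.Linear(dim * len(dilations), dim)

    def forward(self, x):                                           # x: (B, 3, H, W)
        feats = torch.cat([b(x) for b in self.branches], dim=1)     # (B, dim*3, H/4, W/4)
        tokens = feats.flatten(2).transpose(1, 2)                   # (B, N, dim*3)
        return self.proj(tokens)                                    # (B, N, dim)

tokens = PyramidReduction()(torch.randn(1, 3, 224, 224))
```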
arXiv Detail & Related papers (2022-02-21T10:40:05Z) - Learning Tracking Representations via Dual-Branch Fully Transformer Networks [82.21771581817937]
We present a Siamese-like dual-branch network based solely on Transformers for tracking.
We extract a feature vector for each patch based on its matching results with others within an attention window.
The method achieves results that are better than or comparable to those of the best-performing methods.
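As a rough sketch of matching search-region patches against template patches, the snippet cross-attends the two token sets with nn.MultiheadAttention; the shapes and the lack of an explicit attention window are simplifications, not the paper's network.

```python
# Sketch: cross-attention between template-branch and search-branch patch tokens
# (simplified; the paper restricts matching to local attention windows).
import torch
import torch.nn as nn

dim, heads = 256, 8
cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

template_tokens = torch.randn(1, 64, dim)    # tokens from the template image patches
search_tokens = torch.randn(1, 256, dim)     # tokens from the search-region patches

# Each search token aggregates template features according to its matching scores.
matched, attn_weights = cross_attn(query=search_tokens, key=template_tokens, value=template_tokens)
```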
arXiv Detail & Related papers (2021-12-05T13:44:33Z) - Token Labeling: Training a 85.4% Top-1 Accuracy Vision Transformer with 56M Parameters on ImageNet [86.95679590801494]
We explore the potential of vision transformers in ImageNet classification by developing a bag of training techniques.
We show that by slightly tuning the structure of vision transformers and introducing token labeling, our models are able to achieve better results than the CNN counterparts.
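A hedged sketch of a token-labeling objective: the usual class-token loss plus an auxiliary per-patch loss against soft token targets; the weighting and the source of the token targets are assumptions.

```python
# Sketch: class-token loss plus an auxiliary per-patch token-labeling loss (assumed weighting).
import torch
import torch.nn.functional as F

def token_labeling_loss(cls_logits, token_logits, image_label, token_targets, aux_weight=0.5):
    """
    cls_logits:    (B, C)     prediction from the class token
    token_logits:  (B, N, C)  per-patch predictions
    token_targets: (B, N, C)  soft per-patch targets (e.g. from a pretrained annotator)
    """
    cls_loss = F.cross_entropy(cls_logits, image_label)
    token_log_probs = F.log_softmax(token_logits, dim=-1)
    token_loss = -(token_targets * token_log_probs).sum(-1).mean()
    return cls_loss + aux_weight * token_loss
```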
arXiv Detail & Related papers (2021-04-22T04:43:06Z) - ViViT: A Video Vision Transformer [75.74690759089529]
We present pure-transformer based models for video classification.
Our model extracts spatio-temporal tokens from the input video, which are then encoded by a series of transformer layers.
We show how we can effectively regularise the model during training and leverage pretrained image models to be able to train on comparatively small datasets.
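To illustrate extracting spatio-temporal tokens from video, a minimal tubelet-embedding sketch using a 3D convolution follows; the tubelet size and embedding dimension are assumptions, and this is not the full ViViT model.

```python
# Sketch: tubelet embedding that turns a video into spatio-temporal tokens (assumed sizes).
import torch
import torch.nn as nn

class TubeletEmbedding(nn.Module):
    def __init__(self, dim=768, tubelet=(2, 16, 16)):
        super().__init__()
        # Non-overlapping 3D patches: kernel size == stride.
        self.proj = nn.Conv3d(3, dim, kernel_size=tubelet, stride=tubelet)

    def forward(self, video):                    # video: (B, 3, T, H, W)
        x = self.proj(video)                     # (B, dim, T/2, H/16, W/16)
        return x.flatten(2).transpose(1, 2)      # (B, num_tokens, dim)

tokens = TubeletEmbedding()(torch.randn(1, 3, 16, 224, 224))   # -> (1, 8*14*14, 768)
```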
arXiv Detail & Related papers (2021-03-29T15:27:17Z) - Vision Transformers for Dense Prediction [77.34726150561087]
We introduce dense prediction transformers, an architecture that leverages vision transformers in place of convolutional networks as a backbone for dense prediction tasks.
Our experiments show that this architecture yields substantial improvements on dense prediction tasks.
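As a rough sketch of using a vision transformer as a dense-prediction backbone, the snippet reshapes patch tokens back into a 2D feature map and decodes it with a small convolutional head; the head and sizes are assumptions, not the paper's architecture.

```python
# Sketch: reshape ViT patch tokens into a feature map and decode densely
# (simplified; not the paper's full dense-prediction architecture).
import torch
import torch.nn as nn

class SimpleDenseHead(nn.Module):
    def __init__(self, dim=768, n_classes=21, patch=16):
        super().__init__()
        self.patch = patch
        self.decode = nn.Sequential(
            nn.Conv2d(dim, 256, 3, padding=1), nn.ReLU(),
            nn.Conv2d(256, n_classes, 1),
        )

    def forward(self, patch_tokens, img_hw):
        H, W = img_hw
        h, w = H // self.patch, W // self.patch
        fmap = patch_tokens.transpose(1, 2).reshape(-1, patch_tokens.shape[-1], h, w)
        logits = self.decode(fmap)                              # (B, n_classes, h, w)
        return nn.functional.interpolate(logits, size=(H, W), mode="bilinear", align_corners=False)

out = SimpleDenseHead()(torch.randn(1, 14 * 14, 768), (224, 224))   # -> (1, 21, 224, 224)
```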
arXiv Detail & Related papers (2021-03-24T18:01:17Z)