BeetleVerse: A study on taxonomic classification of ground beetles
- URL: http://arxiv.org/abs/2504.13393v1
- Date: Fri, 18 Apr 2025 01:06:37 GMT
- Title: BeetleVerse: A study on taxonomic classification of ground beetles
- Authors: S M Rayeed, Alyson East, Samuel Stevens, Sydne Record, Charles V Stewart
- Abstract summary: Ground beetles are a highly sensitive and speciose biological indicator, making them vital for monitoring biodiversity. In this paper, we evaluate 12 vision models on taxonomic classification across four diverse, long-tailed datasets.
- Score: 0.310688583550805
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Ground beetles are a highly sensitive and speciose biological indicator, making them vital for monitoring biodiversity. However, they are currently an underutilized resource due to the manual effort required by taxonomic experts to perform challenging species differentiations based on subtle morphological differences, precluding widespread applications. In this paper, we evaluate 12 vision models on taxonomic classification across four diverse, long-tailed datasets spanning over 230 genera and 1769 species, with images ranging from controlled laboratory settings to challenging field-collected (in-situ) photographs. We further explore taxonomic classification in two important real-world contexts: sample efficiency and domain adaptation. Our results show that the Vision and Language Transformer combined with an MLP head is the best performing model, with 97% accuracy at genus and 94% at species level. Sample efficiency analysis shows that we can reduce train data requirements by up to 50% with minimal compromise in performance. The domain adaptation experiments reveal significant challenges when transferring models from lab to in-situ images, highlighting a critical domain gap. Overall, our study lays a foundation for large-scale automated taxonomic classification of beetles, and beyond that, advances sample-efficient learning and cross-domain adaptation for diverse long-tailed ecological datasets.
Related papers
- DivShift: Exploring Domain-Specific Distribution Shifts in Large-Scale, Volunteer-Collected Biodiversity Datasets [0.0]
Large-scale, volunteer-collected datasets of community-identified natural world imagery like iNaturalist have enabled marked performance gains for fine-grained visual classification of species using machine learning methods. Here we introduce Diversity Shift, a framework for quantifying the effects of domain-specific distribution shifts on machine learning model performance. To diagnose the performance effects of biases specific to volunteer-collected biodiversity data, we also introduce DivShift - North American West Coast (DivShift-NAWC), a curated dataset of almost 7.5 million iNaturalist images across the western coast of North America partitioned across five types of expert-verified bias.
arXiv Detail & Related papers (2024-10-17T23:56:30Z)
- Comparison between transformers and convolutional models for fine-grained classification of insects [7.107353918348911]
We consider the taxonomical class of Insecta.
The identification of insects is essential in biodiversity monitoring as they are one of the inhabitants at the base of many ecosystems.
We have billions of images that need to be automatically classified and deep neural network algorithms are one of the main techniques explored for fine-grained tasks.
arXiv Detail & Related papers (2023-07-20T10:00:04Z)
- A Step Towards Worldwide Biodiversity Assessment: The BIOSCAN-1M Insect Dataset [18.211840156134784]
This paper presents a curated million-image dataset, primarily to train computer-vision models capable of providing image-based taxonomic assessment.
The dataset also presents compelling characteristics, the study of which would be of interest to the broader machine learning community.
arXiv Detail & Related papers (2023-07-19T20:54:08Z)
- Spatial Implicit Neural Representations for Global-Scale Species Mapping [72.92028508757281]
Given a set of locations where a species has been observed, the goal is to build a model to predict whether the species is present or absent at any location.
Traditional methods struggle to take advantage of emerging large-scale crowdsourced datasets.
We use Spatial Implicit Neural Representations (SINRs) to jointly estimate the geographical range of 47k species simultaneously.
arXiv Detail & Related papers (2023-06-05T03:36:01Z)
- Ensembles of Vision Transformers as a New Paradigm for Automated Classification in Ecology [0.0]
We show that ensembles of Data-efficient image Transformers (DeiTs) significantly outperform the previous state of the art (SOTA).
On all the data sets we test, we achieve a new SOTA, with a reduction of the error with respect to the previous SOTA ranging from 18.48% to 87.50%.
arXiv Detail & Related papers (2022-03-03T14:16:22Z)
- Relational Subsets Knowledge Distillation for Long-tailed Retinal Diseases Recognition [65.77962788209103]
We propose class subset learning by dividing the long-tailed data into multiple class subsets according to prior knowledge.
It enforces the model to focus on learning the subset-specific knowledge.
The proposed framework proved to be effective for the long-tailed retinal diseases recognition task.
arXiv Detail & Related papers (2021-04-22T13:39:33Z)
- Dynamic $\beta$-VAEs for quantifying biodiversity by clustering optically recorded insect signals [0.6091702876917281]
We propose an adaptive variant of the variational autoencoder (VAE) capable of clustering data by phylogenetic groups.
We demonstrate the usefulness of the dynamic $\beta$-VAE on optically recorded insect signals from regions of southern Scandinavia.
arXiv Detail & Related papers (2021-02-10T16:14:13Z)
- Deep Low-Shot Learning for Biological Image Classification and Visualization from Limited Training Samples [52.549928980694695]
In situ hybridization (ISH) gene expression pattern images from the same developmental stage are compared.
Labeling training data with precise stages is very time-consuming even for biologists.
We propose a deep two-step low-shot learning framework to accurately classify ISH images using limited training images.
arXiv Detail & Related papers (2020-10-20T06:06:06Z)
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, that is a simple yet effective meta learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)
- Two-View Fine-grained Classification of Plant Species [66.75915278733197]
We propose a novel method based on a two-view leaf image representation and a hierarchical classification strategy for fine-grained recognition of plant species.
A deep metric based on Siamese convolutional neural networks is used to reduce the dependence on a large number of training samples and make the method scalable to new plant species.
arXiv Detail & Related papers (2020-05-18T21:57:47Z)
- Automatic image-based identification and biomass estimation of invertebrates [70.08255822611812]
Time-consuming sorting and identification of taxa pose strong limitations on how many insect samples can be processed.
We propose to replace the standard manual approach of human expert-based sorting and identification with an automatic image-based technology.
We use state-of-the-art ResNet-50 and InceptionV3 CNNs for the classification task.
arXiv Detail & Related papers (2020-02-05T21:38:57Z)