AquaMonitor: A multimodal multi-view image sequence dataset for real-life aquatic invertebrate biodiversity monitoring
- URL: http://arxiv.org/abs/2505.22065v1
- Date: Wed, 28 May 2025 07:45:20 GMT
- Authors: Mikko Impiö, Philipp M. Rehsen, Tiina Laamanen, Arne J. Beermann, Florian Leese, Jenni Raitoharju
- Abstract summary: AquaMonitor is the first large computer vision dataset of aquatic invertebrates collected during routine environmental monitoring. The dataset has 2.7M images from 43,189 specimens, DNA sequences for 1358 specimens, and dry mass and size measurements for 1494 specimens.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents the AquaMonitor dataset, the first large computer vision dataset of aquatic invertebrates collected during routine environmental monitoring. While several large species identification datasets exist, they are rarely collected using standardized collection protocols, and none focus on aquatic invertebrates, which are particularly laborious to collect. For AquaMonitor, we imaged all specimens from two years of monitoring whenever imaging was possible given practical limitations. The dataset enables the evaluation of automated identification methods for real-life monitoring purposes using a realistically challenging and unbiased setup. The dataset has 2.7M images from 43,189 specimens, DNA sequences for 1358 specimens, and dry mass and size measurements for 1494 specimens, also making it one of the largest biological multi-view and multimodal datasets to date. We define three benchmark tasks and provide strong baselines for these: 1) Monitoring benchmark, reflecting real-life deployment challenges such as open-set recognition, distribution shift, and extreme class imbalance, 2) Classification benchmark, which follows a standard fine-grained visual categorization setup, and 3) Few-shot benchmark, which targets classes with only a few training examples from very fine-grained categories. Advancements on the Monitoring benchmark can translate directly into improvements in aquatic biodiversity monitoring, which is an important component of regular legislative water quality assessment in many countries.
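The open-set requirement of the Monitoring benchmark can be illustrated with a minimal sketch: a common baseline rejects predictions whose confidence falls below a threshold and flags the specimen as an unknown taxon instead of forcing it into a known class. The taxa names and threshold below are illustrative, not taken from the dataset.

```python
# Minimal open-set recognition baseline via confidence thresholding.
# Specimens from taxa unseen during training should get spread-out
# probabilities, so a low maximum probability is treated as "unknown".

def predict_open_set(probs, classes, threshold=0.5):
    """Return the most likely class, or 'unknown' if confidence is too low.

    probs   -- list of per-class probabilities (sums to ~1)
    classes -- class label for each probability
    """
    best = max(range(len(probs)), key=lambda i: probs[i])
    if probs[best] < threshold:
        return "unknown"  # likely an out-of-distribution taxon
    return classes[best]

taxa = ["Baetis", "Gammarus", "Hydropsyche"]
print(predict_open_set([0.85, 0.10, 0.05], taxa))  # confident -> "Baetis"
print(predict_open_set([0.40, 0.35, 0.25], taxa))  # uncertain -> "unknown"
```

In practice the threshold would be tuned on a validation split containing held-out taxa; this sketch only shows the decision rule.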
Related papers
- Automated Detection of Antarctic Benthic Organisms in High-Resolution In Situ Imagery to Aid Biodiversity Monitoring
We present a tailored object detection framework for Antarctic benthic organisms in high-resolution towed camera imagery. We show strong performance in detecting medium and large organisms across 25 fine-grained morphotypes. Our framework provides a scalable foundation for future machine-assisted in situ benthic biodiversity monitoring research.
arXiv Detail & Related papers (2025-07-29T10:22:29Z) - FishDet-M: A Unified Large-Scale Benchmark for Robust Fish Detection and CLIP-Guided Model Selection in Diverse Aquatic Visual Domains
FishDet-M is the largest unified benchmark for fish detection, comprising 13 publicly available datasets spanning diverse aquatic environments. All data are harmonized using COCO-style annotations with both bounding boxes and segmentation masks. FishDet-M establishes a standardized and reproducible platform for evaluating object detection in complex aquatic scenes.
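The COCO-style harmonization described above can be sketched as follows; the field names follow the standard COCO annotation format, while the IDs and box values are made-up examples.

```python
# Converting a plain bounding-box label into one COCO-format annotation
# dict, the kind of harmonization applied across heterogeneous datasets.

def to_coco_annotation(ann_id, image_id, category_id, x, y, w, h):
    """Build one COCO-format annotation from a plain bounding box."""
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        "bbox": [x, y, w, h],  # COCO boxes are [x, y, width, height]
        "area": w * h,         # pixel area, used for size-based evaluation
        "iscrowd": 0,
    }

ann = to_coco_annotation(1, 42, 3, 10.0, 20.0, 50.0, 30.0)
print(ann["area"])  # 1500.0
```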
arXiv Detail & Related papers (2025-07-23T18:32:01Z) - BioCLIP 2: Emergent Properties from Scaling Hierarchical Contrastive Learning
We find emergent behaviors in biological vision models via large-scale contrastive vision-language training. We train BioCLIP 2 on TreeOfLife-200M to distinguish different species. We identify emergent properties in the learned embedding space of BioCLIP 2.
arXiv Detail & Related papers (2025-05-29T17:48:20Z) - Image-Based Relocalization and Alignment for Long-Term Monitoring of Dynamic Underwater Environments
We propose an integrated pipeline that combines Visual Place Recognition (VPR), feature matching, and image segmentation on video-derived images. This method enables robust identification of revisited areas, estimation of rigid transformations, and downstream analysis of ecosystem changes.
arXiv Detail & Related papers (2025-03-06T05:13:19Z) - MPT: A Large-scale Multi-Phytoplankton Tracking Benchmark
We propose a benchmark dataset, Multiple Phytoplankton Tracking (MPT), which covers diverse background information and variations in motion during observation. The dataset includes 27 species of phytoplankton and zooplankton, 14 different backgrounds to simulate diverse and complex underwater environments, and a total of 140 videos. We introduce an additional feature extractor to predict the residuals of the standard feature extractor's output, and compute multi-scale frame-to-frame similarity based on features from different layers of the extractor.
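A minimal sketch of the multi-scale frame-to-frame similarity idea mentioned above, assuming it reduces to averaging per-layer cosine similarities; the toy feature vectors below stand in for actual extractor outputs.

```python
# Multi-scale similarity between consecutive frames: compare feature
# vectors taken from several network layers and average the per-layer
# cosine similarities.
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def multi_scale_similarity(feats_t, feats_t1):
    """Average cosine similarity across per-layer feature pairs."""
    sims = [cosine(a, b) for a, b in zip(feats_t, feats_t1)]
    return sum(sims) / len(sims)

frame_t  = [[1.0, 0.0], [0.5, 0.5, 0.0]]  # features from two layers
frame_t1 = [[1.0, 0.0], [0.5, 0.5, 0.0]]
print(multi_scale_similarity(frame_t, frame_t1))  # identical frames -> ~1.0
```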
arXiv Detail & Related papers (2024-10-22T04:57:28Z) - Underwater Camouflaged Object Tracking Meets Vision-Language SAM2
We propose the first large-scale multi-modal underwater camouflaged object tracking dataset, namely UW-COT220. Based on the proposed dataset, this work first evaluates current advanced visual object tracking methods, including SAM- and SAM2-based trackers, in challenging underwater environments. Our findings highlight the improvements of SAM2 over SAM, demonstrating its enhanced ability to handle the complexities of underwater camouflaged objects.
arXiv Detail & Related papers (2024-09-25T13:10:03Z) - Wild Face Anti-Spoofing Challenge 2023: Benchmark and Results
Face anti-spoofing (FAS) is an essential mechanism for safeguarding the integrity of automated face recognition systems.
This limitation can be attributed to the scarcity and lack of diversity in publicly available FAS datasets.
We introduce the Wild Face Anti-Spoofing dataset, a large-scale, diverse FAS dataset collected in unconstrained settings.
arXiv Detail & Related papers (2023-04-12T10:29:42Z) - Vision meets algae: A novel way for microalgae recognization and health monitor
This dataset includes images of different genera of algae and of the same genus in different states.
We trained, validated and tested the TOOD, YOLOv5, YOLOv8 and variants of RCNN algorithms on this dataset.
The results showed that both one-stage and two-stage object detection models can achieve high mean average precision.
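Mean average precision is the mean, over classes, of per-class average precision computed on a ranked list of detections. A minimal sketch of the per-class AP computation follows; the ranked hit list is illustrative.

```python
# Average precision over a ranked detection list: precision is recorded
# at each true-positive rank and then averaged. mAP is the mean of this
# value across classes.

def average_precision(ranked_hits):
    """AP for one class. ranked_hits: 1 = true positive, 0 = false positive,
    ordered by descending detection confidence."""
    hits, precisions = 0, []
    for rank, hit in enumerate(ranked_hits, start=1):
        if hit:
            hits += 1
            precisions.append(hits / rank)  # precision at this recall point
    return sum(precisions) / hits if hits else 0.0

print(average_precision([1, 0, 1]))  # (1/1 + 2/3) / 2 ~ 0.83
```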
arXiv Detail & Related papers (2022-11-14T17:11:15Z) - Active Gaze Control for Foveal Scene Exploration
We propose a methodology to emulate how humans and robots with foveal cameras would explore a scene.
The proposed method achieves an increase in detection F1-score of 2-3 percentage points for the same number of gaze shifts.
arXiv Detail & Related papers (2022-08-24T14:59:28Z) - Ensembles of Vision Transformers as a New Paradigm for Automated Classification in Ecology
We show that ensembles of Data-efficient image Transformers (DeiTs) significantly outperform the previous state of the art (SOTA).
On all the data sets we test, we achieve a new SOTA, with a reduction of the error with respect to the previous SOTA ranging from 18.48% to 87.50%.
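The basic mechanism behind such ensembles is averaging per-class scores across models and predicting the class with the highest averaged score; a minimal sketch with made-up model outputs:

```python
# Ensemble prediction by averaging per-class probabilities over models.
# The three "model" output vectors below are illustrative numbers only.

def ensemble_predict(prob_lists):
    """Average per-class probabilities over models; return argmax index."""
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    avg = [sum(p[c] for p in prob_lists) / n_models for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: avg[c])

outputs = [
    [0.6, 0.3, 0.1],  # model 1
    [0.2, 0.5, 0.3],  # model 2
    [0.5, 0.4, 0.1],  # model 3
]
print(ensemble_predict(outputs))  # class 0 wins on the averaged scores
```

Averaging smooths out individual models' mistakes, which is why ensembles often beat their best single member.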
arXiv Detail & Related papers (2022-03-03T14:16:22Z) - A Realistic Fish-Habitat Dataset to Evaluate Algorithms for Underwater Visual Analysis
We present DeepFish as a benchmark suite with a large-scale dataset to train and test methods for several computer vision tasks.
The dataset consists of approximately 40 thousand images collected underwater from 20 habitats in the marine environments of tropical Australia.
Our experiments provide an in-depth analysis of the dataset characteristics, and the performance evaluation of several state-of-the-art approaches.
arXiv Detail & Related papers (2020-08-28T12:20:59Z) - Automatic image-based identification and biomass estimation of invertebrates
Time-consuming sorting and identification of taxa pose strong limitations on how many insect samples can be processed.
We propose to replace the standard manual approach of human expert-based sorting and identification with an automatic image-based technology.
We use state-of-the-art ResNet-50 and InceptionV3 CNNs for the classification task.
arXiv Detail & Related papers (2020-02-05T21:38:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.