MATANet: A Multi-context Attention and Taxonomy-Aware Network for Fine-Grained Underwater Recognition of Marine Species
- URL: http://arxiv.org/abs/2601.03729v1
- Date: Wed, 07 Jan 2026 09:21:45 GMT
- Title: MATANet: A Multi-context Attention and Taxonomy-Aware Network for Fine-Grained Underwater Recognition of Marine Species
- Authors: Donghwan Lee, Byeongjin Kim, Geunhee Kim, Hyukjin Kwon, Nahyeon Maeng, Wooju Kim
- Abstract summary: MATANet is a novel model designed for fine-grained marine species classification. It mimics expert strategies by using taxonomy and environmental context to interpret ambiguous features of underwater animals. Experiments on the FathomNet2025, FAIR1M, and LifeCLEF2015-Fish datasets demonstrate state-of-the-art performance.
- Score: 6.870403086472032
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fine-grained classification of marine animals supports ecology, biodiversity and habitat conservation, and evidence-based policy-making. However, existing methods often overlook contextual interactions from the surrounding environment and insufficiently incorporate the hierarchical structure of marine biological taxonomy. To address these challenges, we propose MATANet (Multi-context Attention and Taxonomy-Aware Network), a novel model designed for fine-grained marine species classification. MATANet mimics expert strategies by using taxonomy and environmental context to interpret ambiguous features of underwater animals. It consists of two key components: a Multi-Context Environmental Attention Module (MCEAM), which learns relationships between regions of interest (ROIs) and their surrounding environments, and a Hierarchical Separation-Induced Learning Module (HSLM), which encodes taxonomic hierarchy into the feature space. MATANet combines instance and environmental features with taxonomic structure to enhance fine-grained classification. Experiments on the FathomNet2025, FAIR1M, and LifeCLEF2015-Fish datasets demonstrate state-of-the-art performance. The source code is available at: https://github.com/dhlee-work/fathomnet-cvpr2025-ssl
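The abstract describes MCEAM and HSLM only at a high level. The sketch below is a minimal, hypothetical NumPy illustration of the two ideas, not the MATANet implementation (the authors' code is in the linked repository). `context_attention` assumes the ROI and K surrounding context crops have already been encoded into d-dimensional vectors, and lets the ROI attend over the context vectors with scaled dot-product attention. `hierarchy_margin_loss` assumes each label is a taxonomic path (for example phylum, class, species) and applies a contrastive margin that grows as two samples diverge higher in the taxonomy. All function names, the margin schedule, and the placeholder labels such as `"P1"` are assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def context_attention(roi_feat, ctx_feats):
    """Hypothetical MCEAM-style step (assumption, not the paper's module).

    roi_feat: (d,) embedding of the region of interest.
    ctx_feats: (K, d) embeddings of K surrounding-environment crops.
    Returns the ROI feature concatenated with an attention-weighted
    context summary, plus the attention weights.
    """
    d = roi_feat.shape[-1]
    scores = ctx_feats @ roi_feat / np.sqrt(d)  # (K,) scaled dot products
    weights = softmax(scores)                   # (K,) non-negative, sums to 1
    context = weights @ ctx_feats               # (d,) weighted context summary
    return np.concatenate([roi_feat, context]), weights


def shared_depth(path_a, path_b):
    """Number of leading taxonomic ranks two label paths have in common."""
    depth = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        depth += 1
    return depth


def hierarchy_margin_loss(embs, paths, base_margin=0.5):
    """Hypothetical hierarchy-aware contrastive loss (assumption).

    Same-species pairs are pulled together; other pairs are pushed apart
    by a margin that is larger the fewer taxonomic ranks they share.
    """
    n_ranks = len(paths[0])
    loss, pairs = 0.0, 0
    for i in range(len(paths)):
        for j in range(i + 1, len(paths)):
            dist = np.linalg.norm(embs[i] - embs[j])
            depth = shared_depth(paths[i], paths[j])
            if depth == n_ranks:
                loss += dist ** 2  # same leaf label: pull together
            else:
                margin = base_margin * (n_ranks - depth)
                loss += max(0.0, margin - dist) ** 2  # rank-scaled push apart
            pairs += 1
    return loss / max(pairs, 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fused, w = context_attention(rng.normal(size=8), rng.normal(size=(3, 8)))
    print(fused.shape, w.round(3))
    # Placeholder (phylum, class, species) paths, purely illustrative.
    paths = [("P1", "C1", "S1"), ("P1", "C1", "S2"), ("P2", "C3", "S4")]
    print(hierarchy_margin_loss(rng.normal(size=(3, 4)), paths))
```

In this sketch, two samples that differ only at the species rank must be at least `base_margin` apart, while samples from different phyla must be three times as far apart, which is one simple way to make distances in the feature space reflect the taxonomy.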
Related papers
- An Ecologically-Informed Deep Learning Framework for Interpretable and Validatable Habitat Mapping [1.4672361353012924]
ECOSAIC is an artificial intelligence framework for automatic classification of benthic habitats. ECOSAIC compresses an n-dimensional feature space by optimizing specialization and orthogonality between domain-informed features. We applied the model to the Colombian Pacific Ocean, and the results revealed 16 benthic habitats, ranging from mangroves to deep rocky areas down to 1000 m depth.
(arXiv, 2025-11-18T23:38:29Z)
- UWBench: A Comprehensive Vision-Language Benchmark for Underwater Understanding [54.16709436340606]
Large vision-language models (VLMs) have achieved remarkable success in natural scene understanding. Underwater imagery presents unique challenges, including severe light attenuation, color distortion, and suspended particle scattering. We introduce UWBench, a benchmark specifically designed for underwater vision-language understanding.
(arXiv, 2025-10-21T03:32:15Z)
- MARIS: Marine Open-Vocabulary Instance Segmentation with Geometric Enhancement and Semantic Alignment [56.88334234553316]
We introduce MARIS (Marine Open-Vocabulary Instance Segmentation), the first large-scale fine-grained benchmark for underwater Open-Vocabulary (OV) segmentation. Our framework consistently outperforms existing OV baselines in both in-domain and cross-domain settings.
(arXiv, 2025-10-17T07:50:58Z)
- Hyperbolic Multimodal Representation Learning for Biological Taxonomies [23.639218053531962]
Taxonomic classification in biodiversity research involves organizing biological specimens into structured hierarchies based on evidence. We investigate whether hyperbolic networks can provide a better embedding space for such hierarchical models. Our method embeds multimodal inputs into a shared hyperbolic space using a contrastive objective and a novel stacked entailment-based objective.
(arXiv, 2025-08-22T18:52:50Z)
- BioCLIP 2: Emergent Properties from Scaling Hierarchical Contrastive Learning [60.80381372245902]
We find emergent behaviors in biological vision models via large-scale contrastive vision-language training. We train BioCLIP 2 on TreeOfLife-200M to distinguish different species. We identify emergent properties in the learned embedding space of BioCLIP 2.
(arXiv, 2025-05-29T17:48:20Z)
- SuoiAI: Building a Dataset for Aquatic Invertebrates in Vietnam [4.338234621260792]
This paper proposes SuoiAI, an end-to-end pipeline for building a dataset of aquatic invertebrates in Vietnam. We outline the methods for data collection, annotation, and model training, focusing on reducing annotation effort through semi-supervised learning. Our approach aims to overcome challenges such as data scarcity, fine-grained classification, and deployment in diverse environmental conditions.
(arXiv, 2025-04-21T17:33:02Z)
- Image-Based Relocalization and Alignment for Long-Term Monitoring of Dynamic Underwater Environments [57.59857784298534]
We propose an integrated pipeline that combines Visual Place Recognition (VPR), feature matching, and image segmentation on video-derived images. This method enables robust identification of revisited areas, estimation of rigid transformations, and downstream analysis of ecosystem changes.
(arXiv, 2025-03-06T05:13:19Z)
- Hierarchical Classification for Automated Image Annotation of Coral Reef Benthic Structures [5.407146435972322]
Automated benthic image annotation is crucial to efficiently monitor and protect coral reefs against climate change. Current machine learning approaches fail to capture the hierarchical nature of benthic organisms. We propose to annotate benthic images using hierarchical classification.
(arXiv, 2024-12-11T09:28:30Z)
- Combining Observational Data and Language for Species Range Estimation [63.65684199946094]
We propose a novel approach combining millions of citizen science species observations with textual descriptions from Wikipedia. Our framework maps locations, species, and text descriptions into a common space, enabling zero-shot range estimation from textual descriptions. Our approach also acts as a strong prior when combined with observational data, resulting in more accurate range estimation with less data.
(arXiv, 2024-10-14T17:22:55Z)
- WhaleNet: a Novel Deep Learning Architecture for Marine Mammals Vocalizations on Watkins Marine Mammal Sound Database [49.1574468325115]
We introduce WhaleNet (Wavelet Highly Adaptive Learning Ensemble Network), a sophisticated deep ensemble architecture for the classification of marine mammal vocalizations.
We improve classification accuracy by 8-10% over existing architectures, reaching a classification accuracy of 97.61%.
(arXiv, 2024-02-20T11:36:23Z)
- BenthIQ: a Transformer-Based Benthic Classification Model for Coral Restoration [4.931399476945033]
Coral reefs are vital for marine biodiversity, coastal protection, and supporting human livelihoods globally.
Current methods for creating benthic composition maps often compromise between spatial coverage and resolution.
We introduce BenthIQ, a multi-label semantic segmentation network designed for high-precision classification of underwater substrates.
(arXiv, 2023-11-22T19:25:31Z)