FishAI 2.0: Marine Fish Image Classification with Multi-modal Few-shot Learning
- URL: http://arxiv.org/abs/2509.22930v1
- Date: Fri, 26 Sep 2025 20:54:35 GMT
- Title: FishAI 2.0: Marine Fish Image Classification with Multi-modal Few-shot Learning
- Authors: Chenghan Yang, Peng Zhou, Dong-Sheng Zhang, Yueyun Wang, Hong-Bin Shen, Xiaoyong Pan
- Abstract summary: FishAI 2.0 integrates multimodal few-shot deep learning techniques with image generation for data augmentation. FishAI 2.0 achieves a Top-1 accuracy of 91.67 percent and Top-5 accuracy of 97.97 percent at the family level.
- Score: 4.649981516403062
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Traditional marine biological image recognition faces challenges of incomplete datasets and unsatisfactory model accuracy, particularly under few-shot conditions for rare species, where data scarcity significantly hampers performance. To address these issues, this study proposes an intelligent marine fish recognition framework, FishAI 2.0, which integrates multimodal few-shot deep learning with image generation for data augmentation. First, a hierarchical marine fish benchmark dataset, which provides a comprehensive data foundation for subsequent model training, is used to train the FishAI 2.0 model. To address the data scarcity of rare classes, the large language model DeepSeek is employed to generate high-quality textual descriptions, which are fed into Stable Diffusion 2 for image augmentation through a hierarchical diffusion strategy that extracts latent encodings to construct a multimodal feature space. The enhanced visual-textual datasets are then fed into a Contrastive Language-Image Pre-Training (CLIP) based model, enabling robust few-shot image recognition. Experimental results demonstrate that FishAI 2.0 achieves a Top-1 accuracy of 91.67 percent and a Top-5 accuracy of 97.97 percent at the family level, outperforming baseline CLIP and ViT models by a substantial margin on minority classes with fewer than 10 training samples. At the genus and species levels, FishAI 2.0 achieves Top-1 accuracies of 87.58 percent and 85.42 percent, respectively, demonstrating practical utility in real-world scenarios. In summary, FishAI 2.0 improves the efficiency and accuracy of marine fish identification and provides a scalable technical solution for marine ecological monitoring and conservation, highlighting its scientific value and practical applicability.
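The CLIP-based recognition step described in the abstract can be sketched schematically: each class name is embedded as text, the image embedding is compared against every class embedding by cosine similarity, and the best-scoring classes give the Top-1/Top-5 predictions. A minimal NumPy sketch with synthetic placeholder embeddings (the real system uses CLIP's image and text encoders; the random vectors and class count here are illustrative assumptions):

```python
import numpy as np

def rank_classes(image_emb, class_embs):
    # CLIP-style scoring: cosine similarity between the image embedding
    # and one text embedding per class, ranked highest-first.
    img = image_emb / np.linalg.norm(image_emb)
    txt = class_embs / np.linalg.norm(class_embs, axis=1, keepdims=True)
    return np.argsort(-(txt @ img))  # order[0] is the Top-1 prediction

rng = np.random.default_rng(0)
class_embs = rng.normal(size=(5, 8))                   # 5 classes, 8-d embeddings
image_emb = class_embs[2] + 0.05 * rng.normal(size=8)  # image close to class 2
order = rank_classes(image_emb, class_embs)
```

Top-1 accuracy then asks whether `order[0]` matches the ground-truth label; Top-5 asks whether the label appears anywhere in `order[:5]`.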
Related papers
- Evaluation of deep learning architectures for wildlife object detection: A comparative study of ResNet and Inception [0.0]
This study investigates the effectiveness of two deep learning architectures, ResNet-101 and Inception v3, for wildlife object detection. The models were trained and evaluated on a wildlife image dataset using a standardized preprocessing approach. The ResNet-101 model achieved a classification accuracy of 94% and a mean Average Precision (mAP) of 0.91, showing strong performance in extracting deep hierarchical features.
arXiv Detail & Related papers (2025-12-17T14:30:47Z) - FishDetector-R1: Unified MLLM-Based Framework with Reinforcement Fine-Tuning for Weakly Supervised Fish Detection, Segmentation, and Counting [12.040327353059945]
We introduce FishDetector-R1, a unified MLLM-based framework for fish detection, segmentation, and counting under weak supervision. On the DeepFish dataset, our framework achieves substantial gains over baselines, improving AP by 20% and mIoU by 10%, while reducing MAE by 30% and GAME by 35%.
arXiv Detail & Related papers (2025-12-01T06:23:56Z) - A Generative Data Framework with Authentic Supervision for Underwater Image Restoration and Enhancement [51.382274157144714]
We develop a generative data framework based on unpaired image-to-image translation. The framework constructs synthetic datasets with precise ground-truth labels. Experiments show that models trained on our synthetic data achieve color restoration and generalization performance comparable or superior to models trained on existing benchmarks.
arXiv Detail & Related papers (2025-11-18T14:20:17Z) - Real-Time Fish Detection in Indonesian Marine Ecosystems Using Lightweight YOLOv10-nano Architecture [0.0]
This study explores the implementation of YOLOv10-nano, a state-of-the-art deep learning model, for real-time marine fish detection in Indonesian waters. YOLOv10's architecture, featuring improvements such as the CSPNet backbone, PAN for feature fusion, and a Pyramid Spatial Attention Block, enables efficient and accurate object detection. Results show that YOLOv10-nano achieves high detection accuracy with an mAP50 of 0.966 and an mAP50:95 of 0.606 while maintaining low computational demand.
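The two reported metrics differ only in how strictly a detection must overlap the ground truth: mAP50 counts a detection as correct when its intersection-over-union (IoU) with the ground-truth box is at least 0.5, while mAP50:95 averages AP over IoU thresholds from 0.50 to 0.95 in steps of 0.05. A small illustrative IoU implementation for axis-aligned boxes (the box format is an assumption; detection frameworks vary):

```python
def iou(a, b):
    # a, b: axis-aligned boxes as (x1, y1, x2, y2)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

# mAP50 uses the single threshold 0.50; mAP50:95 averages AP over these:
thresholds = [0.50 + 0.05 * k for k in range(10)]
```

Two unit boxes overlapping in a single unit square, for example, have an intersection of 1 and a union of 7, giving an IoU of 1/7 and thus no match at any of the thresholds above.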
arXiv Detail & Related papers (2025-09-22T07:02:48Z) - IMASHRIMP: Automatic White Shrimp (Penaeus vannamei) Biometrical Analysis from Laboratory Images Using Computer Vision and Deep Learning [0.0]
IMASHRIMP is an adapted system for the automated morphological analysis of white shrimp (Penaeus vannamei). Existing deep learning and computer vision techniques were modified to address the specific challenges of shrimp morphology analysis from RGBD images. IMASHRIMP incorporates two discrimination modules, based on a modified ResNet-50 architecture, to classify images by point of view and determine rostrum integrity.
arXiv Detail & Related papers (2025-07-03T10:32:49Z) - Efficient Self-Supervised Learning for Earth Observation via Dynamic Dataset Curation [67.23953699167274]
Self-supervised learning (SSL) has enabled the development of vision foundation models for Earth Observation (EO). In EO, the data curation challenge is amplified by the redundancy and heavy-tailed distributions common in satellite imagery. We propose a dynamic dataset pruning strategy designed to improve SSL pre-training by maximizing dataset diversity and balance.
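The paper's dynamic pruning strategy is not reproduced here, but the general idea of keeping a diverse subset of redundant imagery can be illustrated with a generic greedy farthest-point heuristic over feature vectors (a common diversity baseline, not the authors' method; the feature vectors are synthetic):

```python
import numpy as np

def farthest_point_prune(X, k):
    # Greedily keep k samples that are maximally spread out in feature space:
    # start from sample 0, then repeatedly add the sample farthest from the
    # current kept set.
    keep = [0]
    d = np.linalg.norm(X - X[0], axis=1)        # distance to nearest kept sample
    while len(keep) < k:
        i = int(np.argmax(d))
        keep.append(i)
        d = np.minimum(d, np.linalg.norm(X - X[i], axis=1))
    return keep

# Two near-duplicate samples and one distant one: pruning to k=2 keeps
# one duplicate and the outlier, discarding the redundancy.
X = np.array([[0.0, 0.0], [0.1, 0.0], [10.0, 10.0]])
kept = farthest_point_prune(X, 2)
```

On real satellite embeddings this kind of selection discards near-duplicate tiles while retaining rare, heavy-tailed content.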
arXiv Detail & Related papers (2025-04-09T15:13:26Z) - Contrastive Visual Data Augmentation [119.51630737874855]
Large multimodal models (LMMs) often struggle to recognize novel concepts, as they rely on pre-trained knowledge and have limited ability to capture subtle visual details. We propose a Contrastive visual Data Augmentation (CoDA) strategy to help LMMs better align nuanced visual features with language. CoDA extracts key contrastive textual and visual features of target concepts against the known concepts they are misrecognized as, and then uses multimodal generative models to produce targeted synthetic data.
arXiv Detail & Related papers (2025-02-24T23:05:31Z) - Opinion-Unaware Blind Image Quality Assessment using Multi-Scale Deep Feature Statistics [54.08757792080732]
We propose integrating deep features from pre-trained visual models with a statistical analysis model to achieve opinion-unaware BIQA (OU-BIQA).
Our proposed model exhibits superior consistency with human visual perception compared to state-of-the-art BIQA models.
arXiv Detail & Related papers (2024-05-29T06:09:34Z) - Scalable Pre-training of Large Autoregressive Image Models [65.824197847617]
This paper introduces AIM, a collection of vision models pre-trained with an autoregressive objective.
We highlight two key findings: (1) the performance of the visual features scales with both the model capacity and the quantity of data, and (2) the value of the objective function correlates with the performance of the model on downstream tasks.
arXiv Detail & Related papers (2024-01-16T18:03:37Z) - MuLA-GAN: Multi-Level Attention GAN for Enhanced Underwater Visibility [1.9272863690919875]
We introduce MuLA-GAN, a novel approach that leverages the synergistic power of Generative Adversarial Networks (GANs) and Multi-Level Attention mechanisms for comprehensive underwater image enhancement.
Our model excels in capturing and preserving intricate details in underwater imagery, essential for various applications.
This work not only addresses a significant research gap in underwater image enhancement but also underscores the pivotal role of Multi-Level Attention in enhancing GANs.
arXiv Detail & Related papers (2023-12-25T07:33:47Z) - A deep neural network for multi-species fish detection using multiple acoustic cameras [0.0]
We present a novel approach that takes advantage of both CNN (Convolutional Neural Network) and classical CV (Computer Vision) techniques.
The pipeline pre-treats the acoustic images to extract two features, in order to localise the signals and improve detection performance.
The YOLOv3-based model was trained with data of fish from multiple species recorded by the two common acoustic cameras.
arXiv Detail & Related papers (2021-09-22T11:47:24Z) - Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z) - Temperate Fish Detection and Classification: a Deep Learning based Approach [6.282069822653608]
We propose a two-step deep learning approach for the detection and classification of temperate fishes without pre-filtering.
The first step is to detect each single fish in an image, independent of species and sex.
In the second step, we adopt a Convolutional Neural Network (CNN) with the Squeeze-and-Excitation (SE) architecture for classifying each fish in the image without pre-filtering.
arXiv Detail & Related papers (2020-05-14T12:40:57Z)
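The Squeeze-and-Excitation idea used in the classification step above can be sketched in a few lines: global-average-pool each channel ("squeeze"), pass the result through a small bottleneck MLP ending in a sigmoid ("excitation"), and rescale the channels by the resulting gates. A NumPy sketch with hypothetical weight shapes (in a real SE block, w1 and w2 are learned inside the CNN):

```python
import numpy as np

def se_block(x, w1, w2):
    """x: (C, H, W) feature map; w1: (C//r, C), w2: (C, C//r) for reduction r."""
    z = x.mean(axis=(1, 2))               # squeeze: per-channel global average pool
    s = np.maximum(w1 @ z, 0.0)           # excitation: bottleneck + ReLU
    s = 1.0 / (1.0 + np.exp(-(w2 @ s)))   # sigmoid gates, each in (0, 1)
    return x * s[:, None, None]           # channel-wise reweighting

rng = np.random.default_rng(1)
x = rng.normal(size=(4, 3, 3))   # toy feature map, C=4
w1 = rng.normal(size=(2, 4))     # reduction ratio r = 2
w2 = rng.normal(size=(4, 2))
y = se_block(x, w1, w2)
```

Because every gate lies strictly between 0 and 1, the block can only attenuate channels, letting the network emphasize informative ones relative to the rest.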
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of the information and is not responsible for any consequences of its use.