Towards Visual Re-Identification of Fish using Fine-Grained Classification for Electronic Monitoring in Fisheries
- URL: http://arxiv.org/abs/2512.08400v2
- Date: Thu, 11 Dec 2025 11:39:09 GMT
- Title: Towards Visual Re-Identification of Fish using Fine-Grained Classification for Electronic Monitoring in Fisheries
- Authors: Samitha Nuwan Thilakarathna, Ercan Avsar, Martin Mathias Nielsen, Malte Pedersen,
- Abstract summary: We develop an optimized deep learning pipeline for automated fish re-identification using the novel AutoFish dataset. We demonstrate that the Vision Transformer-based Swin-T architecture consistently outperforms the Convolutional Neural Network-based ResNet-50. An in-depth analysis reveals that the primary challenge is distinguishing visually similar individuals of the same species.
- Score: 4.007351600492542
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Accurate fisheries data are crucial for effective and sustainable marine resource management. With the recent adoption of Electronic Monitoring (EM) systems, more video data is now being collected than can be feasibly reviewed manually. This paper addresses this challenge by developing an optimized deep learning pipeline for automated fish re-identification (Re-ID) using the novel AutoFish dataset, which simulates conveyor-belt EM systems with six visually similar fish species. We demonstrate that key Re-ID metrics (R1 and mAP@k) are substantially improved by using hard triplet mining in conjunction with a custom image transformation pipeline that includes dataset-specific normalization. By employing these strategies, we demonstrate that the Vision Transformer-based Swin-T architecture consistently outperforms the Convolutional Neural Network-based ResNet-50, achieving peak performance of 41.65% mAP@k and 90.43% Rank-1 accuracy. An in-depth analysis reveals that the primary challenge is distinguishing visually similar individuals of the same species (Intra-species errors), where viewpoint inconsistency proves significantly more detrimental than partial occlusion. The source code and documentation are available at: https://github.com/msamdk/Fish_Re_Identification.git
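The authors' full pipeline is in the linked repository; as a non-authoritative PyTorch sketch of the two ingredients named in the abstract, the snippet below combines the standard batch-hard triplet formulation (hardest positive and hardest negative per anchor) with a transform pipeline whose normalization statistics are placeholders, not the published AutoFish values:

```python
import torch
import torchvision.transforms as T

# Dataset-specific normalization: these channel statistics are placeholders,
# NOT the published AutoFish values; compute them over the training split.
AUTOFISH_MEAN = [0.5, 0.5, 0.5]
AUTOFISH_STD = [0.25, 0.25, 0.25]

train_transform = T.Compose([
    T.Resize((224, 224)),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    T.Normalize(mean=AUTOFISH_MEAN, std=AUTOFISH_STD),
])

def batch_hard_triplet_loss(embeddings: torch.Tensor,
                            labels: torch.Tensor,
                            margin: float = 0.3) -> torch.Tensor:
    """Batch-hard triplet mining: for each anchor, pick its farthest
    positive and nearest negative within the mini-batch."""
    dist = torch.cdist(embeddings, embeddings, p=2)              # (B, B) pairwise L2
    same = labels.unsqueeze(0) == labels.unsqueeze(1)            # identity mask
    eye = torch.eye(len(labels), dtype=torch.bool, device=labels.device)
    hardest_pos = (dist * (same & ~eye)).max(dim=1).values       # farthest positive
    big = torch.finfo(dist.dtype).max
    hardest_neg = dist.masked_fill(same, big).min(dim=1).values  # nearest negative
    return torch.relu(hardest_pos - hardest_neg + margin).mean()

# Toy usage: a PK-style batch of 8 identities with 4 images each,
# so every anchor has in-batch positives to mine.
emb = torch.nn.functional.normalize(torch.randn(32, 128), dim=1)
ids = torch.arange(8).repeat_interleave(4)
print(batch_hard_triplet_loss(emb, ids))
```

A PK sampler (P identities, K images each) is the usual way to guarantee that every anchor has positives to mine; the toy batch above mimics that layout.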
Related papers
- FishDetector-R1: Unified MLLM-Based Framework with Reinforcement Fine-Tuning for Weakly Supervised Fish Detection, Segmentation, and Counting [12.040327353059945]
We introduce FishDetector-R1, a unified MLLM-based framework for fish detection, segmentation, and counting under weak supervision.
On the DeepFish dataset, our framework achieves substantial gains over baselines, improving AP by 20% and mIoU by 10%, while reducing MAE by 30% and GAME by 35%.
arXiv Detail & Related papers (2025-12-01T06:23:56Z)
- FishDet-M: A Unified Large-Scale Benchmark for Robust Fish Detection and CLIP-Guided Model Selection in Diverse Aquatic Visual Domains [1.3791394805787949]
FishDet-M is the largest unified benchmark for fish detection, comprising 13 publicly available datasets spanning diverse aquatic environments.
All data are harmonized using COCO-style annotations with both bounding boxes and segmentation masks.
FishDet-M establishes a standardized and reproducible platform for evaluating object detection in complex aquatic scenes.
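For reference, a minimal illustrative record in the COCO style that FishDet-M harmonizes to is shown below; the field names follow the standard COCO layout, while the file name, coordinates, and category are invented:

```python
# Illustrative COCO-style record: one image, one fish instance with both a
# bounding box and a polygon segmentation mask. All values are made up.
coco_record = {
    "images": [
        {"id": 1, "file_name": "trawl_0001.jpg", "width": 1920, "height": 1080}
    ],
    "annotations": [
        {
            "id": 10,
            "image_id": 1,
            "category_id": 1,
            "bbox": [412.0, 230.5, 180.0, 64.0],            # [x, y, width, height]
            "segmentation": [[412.0, 230.5, 592.0, 230.5,   # polygon (x1, y1, x2, y2, ...)
                              592.0, 294.5, 412.0, 294.5]],
            "area": 11520.0,
            "iscrowd": 0,
        }
    ],
    "categories": [{"id": 1, "name": "fish"}],
}
```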
arXiv Detail & Related papers (2025-07-23T18:32:01Z)
- Causal Transformer for Fusion and Pose Estimation in Deep Visual Inertial Odometry [1.2289361708127877]
We propose a causal visual-inertial fusion transformer (VIFT) for pose estimation in deep visual-inertial odometry.
The proposed method is end-to-end trainable and requires only a monocular camera and IMU during inference.
arXiv Detail & Related papers (2024-09-13T12:21:25Z)
- FMRFT: Fusion Mamba and DETR for Query Time Sequence Intersection Fish Tracking [3.599033310931609]
This paper establishes a complex multi-scenario sturgeon tracking dataset.
It introduces the FMRFT model, a real-time end-to-end fish tracking solution.
The model incorporates the Mamba-in-Mamba architecture, which keeps video memory consumption low.
arXiv Detail & Related papers (2024-09-02T10:33:45Z)
- Open-Set Deepfake Detection: A Parameter-Efficient Adaptation Method with Forgery Style Mixture [81.93945602120453]
We introduce an approach that is both general and parameter-efficient for face forgery detection.
We design a forgery-style mixture formulation that augments the diversity of forgery source domains.
We show that the designed model achieves state-of-the-art generalizability with significantly reduced trainable parameters.
arXiv Detail & Related papers (2024-08-23T01:53:36Z)
- CMFDFormer: Transformer-based Copy-Move Forgery Detection with Continual Learning [52.72888626663642]
Copy-move forgery detection aims at detecting duplicated regions in a suspected forged image.
Deep learning-based copy-move forgery detection methods are on the rise.
We propose a Transformer-style copy-move forgery network named CMFDFormer.
We also provide a novel PCSD continual learning framework to help CMFDFormer handle new tasks.
arXiv Detail & Related papers (2023-11-22T09:27:46Z)
- Convolutional Monge Mapping Normalization for learning on sleep data [63.22081662149488]
We propose a new method called Convolutional Monge Mapping Normalization (CMMN).
CMMN consists of filtering the signals to adapt their power spectral density (PSD) to a Wasserstein barycenter estimated on the training data.
Numerical experiments on sleep EEG data show that CMMN leads to significant and consistent performance gains independent of the neural network architecture.
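A minimal NumPy/SciPy sketch of that idea, assuming a Welch PSD estimate and a zero-phase FIR filter built from the magnitude response sqrt(barycenter / PSD); function names and defaults are illustrative, not the authors' reference implementation:

```python
import numpy as np
from scipy.signal import welch, fftconvolve

def psd(x, nperseg=256):
    """One-sided Welch estimate of the power spectral density."""
    _, p = welch(x, nperseg=nperseg)
    return p

def cmmn_fit(train_signals, nperseg=256):
    """Wasserstein barycenter of centered Gaussian spectra:
    the square of the mean of the square-root PSDs."""
    sqrt_psds = np.stack([np.sqrt(psd(x, nperseg)) for x in train_signals])
    return sqrt_psds.mean(axis=0) ** 2

def cmmn_transform(x, barycenter, nperseg=256):
    """Filter one signal so its PSD moves onto the barycenter; the filter's
    magnitude response is sqrt(barycenter / PSD_x)."""
    h_freq = np.sqrt(barycenter / (psd(x, nperseg) + 1e-12))
    h = np.fft.fftshift(np.fft.irfft(h_freq))  # centered FIR impulse response
    return fftconvolve(x, h, mode="same")

# Toy usage on synthetic "recordings".
rng = np.random.default_rng(0)
signals = [rng.standard_normal(3000) for _ in range(5)]
bary = cmmn_fit(signals)
adapted = cmmn_transform(signals[0], bary)
```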
arXiv Detail & Related papers (2023-05-30T08:24:01Z)
- Weakly Supervised Change Detection Using Guided Anisotropic Diffusion [97.43170678509478]
We propose original ideas that help us to leverage such datasets in the context of change detection.
First, we propose the guided anisotropic diffusion (GAD) algorithm, which improves semantic segmentation results.
We then show its potential in two weakly-supervised learning strategies tailored for change detection.
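The GAD algorithm itself is not detailed in this blurb; as a hedged sketch of the general idea, the following Perona-Malik-style loop diffuses a prediction map while an edge-stopping weight computed from a separate guide image halts smoothing at image edges. The function name, parameters, and border handling are illustrative assumptions, and the paper's exact GAD formulation may differ:

```python
import numpy as np

def guided_anisotropic_diffusion(pred, guide, n_iter=50, kappa=0.1, step=0.2):
    """Smooth `pred` (H, W) while halting diffusion at edges of `guide` (H, W).
    Assumes `guide` is scaled to [0, 1]; np.roll wraps at image borders,
    which is a sketch-level simplification."""
    p = pred.astype(np.float64).copy()
    g = guide.astype(np.float64)
    for _ in range(n_iter):
        for axis, shift in ((0, 1), (0, -1), (1, 1), (1, -1)):
            dg = np.roll(g, shift, axis=axis) - g   # guide-image gradient
            dp = np.roll(p, shift, axis=axis) - p   # prediction-map gradient
            c = np.exp(-(dg / kappa) ** 2)          # Perona-Malik edge-stopping weight
            p += (step / 4.0) * c * dp              # explicit diffusion update
    return p
```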
arXiv Detail & Related papers (2021-12-31T10:03:47Z)
- Self-Supervised Pre-Training for Transformer-Based Person Re-Identification [54.55281692768765]
Transformer-based supervised pre-training achieves great performance in person re-identification (ReID).
Due to the domain gap between ImageNet and ReID datasets, a larger pre-training dataset is usually needed to boost performance.
This work aims to mitigate the gap between the pre-training and ReID datasets from the perspective of data and model structure.
arXiv Detail & Related papers (2021-11-23T18:59:08Z)
- Vision Transformers are Robust Learners [65.91359312429147]
We study the robustness of the Vision Transformer (ViT) against common corruptions and perturbations, distribution shifts, and natural adversarial examples.
We present analyses that provide both quantitative and qualitative indications to explain why ViTs are indeed more robust learners.
arXiv Detail & Related papers (2021-05-17T02:39:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.