Related papers: Deep Learning for Accurate Vision-based Catch Composition in Tropical Tuna Purse Seiners

URL: http://arxiv.org/abs/2511.15468v1
Date: Wed, 19 Nov 2025 14:26:04 GMT
Title: Deep Learning for Accurate Vision-based Catch Composition in Tropical Tuna Purse Seiners
Authors: Xabier Lekunberri, Ahmad Kamal, Izaro Goienetxea, Jon Ruiz, Iñaki Quincoces, Jaime Valls Miro, Ignacio Arganda-Carreras, Jose A. Fernandes-Salvador
Abstract summary: We quantify the difficulty experts face in distinguishing bigeye tuna from yellowfin tuna using images captured by electronic monitoring systems. We present a multi-stage pipeline to estimate the species composition of the catches using a reliable ground-truth dataset. We found that the integration of YOLOv9 with SAM2 performs best, with a validation mean average precision of 0.66 $\pm$ 0.03 and a recall of 0.88 $\pm$ 0.03.
Score: 1.9503589459693256
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Purse seiners play a crucial role in tuna fishing, as approximately 69% of the world's tropical tuna is caught using this gear. All tuna Regional Fisheries Management Organizations have established minimum standards to use electronic monitoring (EM) in fisheries in addition to traditional observers. The EM systems produce a massive amount of video data that human analysts must process. Integrating artificial intelligence (AI) into their workflow can decrease that workload and improve the accuracy of the reports. However, species identification still poses significant challenges for AI, as achieving balanced performance across all species requires appropriate training data. Here, we quantify the difficulty experts face in distinguishing bigeye tuna (BET, Thunnus obesus) from yellowfin tuna (YFT, Thunnus albacares) using images captured by EM systems. We found inter-expert agreements of 42.9% $\pm$ 35.6% for BET and 57.1% $\pm$ 35.6% for YFT. We then present a multi-stage pipeline to estimate the species composition of the catches using a reliable ground-truth dataset based on identifications made by observers on board. Three segmentation approaches are compared: Mask R-CNN, a combination of DINOv2 with SAM2, and an integration of YOLOv9 with SAM2. We found that the last of these performs best, with a validation mean average precision of 0.66 $\pm$ 0.03 and a recall of 0.88 $\pm$ 0.03. Segmented individuals are tracked using ByteTrack. For classification, we evaluate a standard multiclass classification model and a hierarchical approach, finding that the hierarchical approach generalizes better. All our models were cross-validated during training and tested on fishing operations with fully known catch composition. Combining YOLOv9-SAM2 with the hierarchical classification produced the best estimations, with 84.8% of the individuals being segmented and classified with a mean average error of 4.5%.
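
The final step of the pipeline described in the abstract, turning per-individual classifications into a catch composition and comparing it with a known composition, can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that each tracked individual receives one label by majority vote over its per-frame predictions, and that the reported error is the mean absolute difference, in percentage points, between estimated and reference species shares. The function names, the "OTHER" class, and all numbers are hypothetical.

```python
from collections import Counter


def track_label(frame_predictions):
    """Return the most frequent per-frame label for one tracked individual (assumed majority vote)."""
    return Counter(frame_predictions).most_common(1)[0][0]


def composition(labels):
    """Return the percentage share of each species among the given labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {species: 100.0 * n / total for species, n in counts.items()}


def mean_absolute_error(estimated, reference):
    """Average absolute difference in percentage points over all species seen in either input."""
    species = set(estimated) | set(reference)
    diffs = [abs(estimated.get(s, 0.0) - reference.get(s, 0.0)) for s in species]
    return sum(diffs) / len(species)


# Hypothetical per-frame predictions for four tracked fish (illustrative, not data from the paper).
tracks = {
    1: ["YFT", "YFT", "BET"],
    2: ["BET", "BET", "BET"],
    3: ["OTHER", "OTHER"],
    4: ["OTHER", "YFT", "OTHER"],
}

labels = [track_label(preds) for preds in tracks.values()]
estimated = composition(labels)

# Hypothetical reference composition, standing in for a fully known catch.
reference = {"YFT": 30.0, "BET": 20.0, "OTHER": 50.0}
error = mean_absolute_error(estimated, reference)

print(estimated)
print(f"mean absolute error: {error:.2f} percentage points")
```

Voting at the track level rather than per frame means a single misclassified frame does not change the count for that individual. Whether the paper aggregates predictions this way is not stated in the abstract.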

Related papers

Multimodal RewardBench 2: Evaluating Omni Reward Models for Interleaved Text and Image [58.14192385042352]
We introduce Multimodal RewardBench 2 (MMRB2), the first benchmark for reward models on multimodal understanding and (interleaved) generation. MMRB2 spans four tasks: text-to-image, image editing, interleaved generation, and multimodal reasoning. It provides 1,000 expert-annotated preference pairs per task from 23 models and agents across 21 source tasks.
arXiv Detail & Related papers (2025-12-18T18:56:04Z)
Towards Visual Re-Identification of Fish using Fine-Grained Classification for Electronic Monitoring in Fisheries [4.007351600492542]
We develop an optimized deep learning pipeline for automated fish re-identification using the novel AutoFish dataset. We demonstrate that the Vision Transformer-based Swin-T architecture consistently outperforms the Convolutional Neural Network-based ResNet-50. An in-depth analysis reveals that the primary challenge is distinguishing visually similar individuals of the same species.
arXiv Detail & Related papers (2025-12-09T09:33:53Z)
Multispectral airborne laser scanning for tree species classification: a benchmark of machine learning and deep learning algorithms [3.9167717582896793]
Multispectral airborne laser scanning (ALS) has shown promise in automated point cloud processing and tree segmentation. This study addresses these gaps by conducting a benchmark of machine learning and deep learning methods for tree species classification.
arXiv Detail & Related papers (2025-04-19T16:03:49Z)
One-Shot Learning for Periocular Recognition: Exploring the Effect of Domain Adaptation and Data Bias on Deep Representations [59.17685450892182]
We investigate the behavior of deep representations in widely used CNN models under extreme data scarcity for One-Shot periocular recognition. We improved state-of-the-art results that made use of networks trained with biometric datasets with millions of images. Traditional algorithms like SIFT can outperform CNNs in situations with limited data.
arXiv Detail & Related papers (2023-07-11T09:10:16Z)
DeepSeaNet: Improving Underwater Object Detection using EfficientDet [0.0]
This project involves implementing and evaluating various object detection models on an annotated underwater dataset. The dataset comprises annotated image sequences of fish, crabs, starfish, and other aquatic animals captured in Limfjorden water with limited visibility. I compare the results of YOLOv3 (31.10% mean Average Precision (mAP)), YOLOv4 (83.72% mAP), YOLOv5 (97.6%), YOLOv8 (98.20%), EfficientDet (98.56% mAP) and Detectron2 (95.20% mAP) on the same dataset.
arXiv Detail & Related papers (2023-05-26T13:41:35Z)
Pruning by Active Attention Manipulation [49.61707925611295]
Filter pruning of a CNN is typically achieved by applying discrete masks on the CNN's filter weights or activation maps, post-training. Here, we present a new filter-importance-scoring concept named pruning by active attention manipulation (PAAM). PAAM learns analog filter scores from the filter weights by optimizing a cost function regularized by an additive term in the scores.
arXiv Detail & Related papers (2022-10-20T09:17:02Z)
Towards Semi-Supervised Deep Facial Expression Recognition with An Adaptive Confidence Margin [92.76372026435858]
We learn an Adaptive Confidence Margin (Ada-CM) to fully leverage all unlabeled data for semi-supervised deep facial expression recognition. All unlabeled samples are partitioned into two subsets by comparing their confidence scores with the adaptively learned confidence margin. Our method achieves state-of-the-art performance, especially surpassing fully-supervised baselines in a semi-supervised manner.
arXiv Detail & Related papers (2022-03-23T11:43:29Z)
Robust Segmentation Models using an Uncertainty Slice Sampling Based Annotation Workflow [5.051373749267151]
We propose an uncertainty slice sampling (USS) strategy for semantic segmentation of 3D medical volumes. We demonstrate the efficiency of USS on a liver segmentation task using multi-site data.
arXiv Detail & Related papers (2021-09-30T06:56:11Z)
NemaNet: A convolutional neural network model for identification of nematodes soybean crop in brazil [0.43968605222413054]
Phytoparasitic nematodes (or phytonematodes) are causing severe damage to crops and generating large-scale economic losses worldwide. This work presents a new public data set called NemaDataset containing 3,063 microscopic images from five nematode species with the most significant damage relevance for the soybean crop.
arXiv Detail & Related papers (2021-03-05T14:47:00Z)
Automatic sleep stage classification with deep residual networks in a mixed-cohort setting [63.52264764099532]
We developed a novel deep neural network model to assess the generalizability of several large-scale cohorts. Overall classification accuracy improved with increasing fractions of training data.
arXiv Detail & Related papers (2020-08-21T10:48:35Z)
CovidDeep: SARS-CoV-2/COVID-19 Test Based on Wearable Medical Sensors and Efficient Neural Networks [51.589769497681175]
The novel coronavirus (SARS-CoV-2) has led to a pandemic. The current testing regime based on Reverse Transcription-Polymerase Chain Reaction for SARS-CoV-2 has been unable to keep up with testing demands. We propose a framework called CovidDeep that combines efficient DNNs with commercially available WMSs for pervasive testing of the virus.
arXiv Detail & Related papers (2020-07-20T21:47:28Z)

This list is automatically generated from the titles and abstracts of the papers in this site.