FishDetector-R1: Unified MLLM-Based Framework with Reinforcement Fine-Tuning for Weakly Supervised Fish Detection, Segmentation, and Counting
- URL: http://arxiv.org/abs/2512.05996v1
- Date: Mon, 01 Dec 2025 06:23:56 GMT
- Title: FishDetector-R1: Unified MLLM-Based Framework with Reinforcement Fine-Tuning for Weakly Supervised Fish Detection, Segmentation, and Counting
- Authors: Yi Liu, Jingyu Song, Vedanth Kallakuri, Katherine A. Skinner
- Abstract summary: We introduce FishDetector-R1, a unified MLLM-based framework for fish detection, segmentation, and counting under weak supervision. On the DeepFish dataset, our framework achieves substantial gains over baselines, improving AP by 20% and mIoU by 10%, while reducing MAE by 30% and GAME by 35%.
- Score: 12.040327353059945
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Analyzing underwater fish imagery is critical for ecological monitoring but remains difficult due to visual degradation and costly annotations. We introduce FishDetector-R1, a unified MLLM-based framework for fish detection, segmentation, and counting under weak supervision. On the DeepFish dataset, our framework achieves substantial gains over baselines, improving AP by 20% and mIoU by 10%, while reducing MAE by 30% and GAME by 35%. These improvements stem from two key components: a novel detect-to-count prompt that enforces spatially consistent detections and counts, and Reinforcement Learning from Verifiable Reward (RLVR) with a complementary scalable paradigm leveraging sparse point labels. Ablation studies further validate the effectiveness of this reward design. Moreover, the improvement generalizes well to other underwater datasets, confirming strong cross-domain robustness. Overall, FishDetector-R1 provides a reliable and scalable solution for accurate marine visual understanding via weak supervision. The project page for FishDetector-R1 is https://umfieldrobotics.github.io/FishDetector-R1.
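The two counting metrics named in the abstract can be illustrated with a minimal sketch. It uses the standard definition of GAME (Grid Average Mean absolute Error): the image is split into a 2^L x 2^L grid and the absolute count errors are summed over the cells, so that level 0 reduces to the per-image absolute error that MAE averages. The function names, the use of point coordinates for both predictions and ground truth, and the toy data are assumptions made for illustration; this is not the paper's evaluation code.

```python
def count_in_grid(points, width, height, level):
    """Count (x, y) points in each cell of a 2^level x 2^level grid."""
    n = 2 ** level
    counts = [[0] * n for _ in range(n)]
    for x, y in points:
        # Clamp so points on the right/bottom border fall in the last cell.
        col = min(int(x / width * n), n - 1)
        row = min(int(y / height * n), n - 1)
        counts[row][col] += 1
    return counts


def game_single_image(pred_points, gt_points, width, height, level):
    """Sum of per-cell absolute count errors for one image at a given level."""
    pred = count_in_grid(pred_points, width, height, level)
    gt = count_in_grid(gt_points, width, height, level)
    n = 2 ** level
    return sum(abs(pred[r][c] - gt[r][c]) for r in range(n) for c in range(n))


def game(dataset, level):
    """Average the per-image error over (pred, gt, width, height) tuples."""
    errors = [game_single_image(p, g, w, h, level) for p, g, w, h in dataset]
    return sum(errors) / len(errors)


# Hypothetical toy image: both sets hold two fish, but one is misplaced.
pred = [(10, 10), (90, 90)]
gt = [(10, 10), (90, 10)]
dataset = [(pred, gt, 100, 100)]

mae = game(dataset, level=0)    # total count matches, so the error is 0
game1 = game(dataset, level=1)  # the 2x2 grid exposes the misplaced fish
```

In this toy case the level-0 error is 0 while the level-1 error is 2, which shows why GAME penalizes correct totals that come from spatially wrong detections.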
Related papers
- Estimation of Fish Catch Using Sentinel-2, 3 and XGBoost-Kernel-Based Kernel Ridge Regression [0.7433903349647366]
This study uses multispectral images from Sentinel-2 MSI and Sentinel-3 OLCI to estimate fish catch. The proposed approach advances SDGs 2 (Zero Hunger) and 14 (Life Below Water).
arXiv Detail & Related papers (2026-02-09T11:02:57Z)
- Towards Visual Re-Identification of Fish using Fine-Grained Classification for Electronic Monitoring in Fisheries [4.007351600492542]
We develop an optimized deep learning pipeline for automated fish re-identification using the novel AutoFish dataset. We demonstrate that the Vision Transformer-based Swin-T architecture consistently outperforms the Convolutional Neural Network-based ResNet-50. An in-depth analysis reveals that the primary challenge is distinguishing visually similar individuals of the same species.
arXiv Detail & Related papers (2025-12-09T09:33:53Z)
- Practical Manipulation Model for Robust Deepfake Detection [55.2480439325792]
We develop a more real-world degradation model in the area of image super-resolution. We extend the space of pseudo-fakes by using Poisson blending, more diverse masks, generator artifacts, and distractors. We show clear increases of 3.51% and 6.21% AUC on the DFDC and DFDCP datasets, respectively.
arXiv Detail & Related papers (2025-06-05T15:06:16Z)
- FMRFT: Fusion Mamba and DETR for Query Time Sequence Intersection Fish Tracking [3.599033310931609]
This paper establishes a complex multi-scenario sturgeon tracking dataset. It introduces the FMRFT model, a real-time end-to-end fish tracking solution. The model incorporates the low video memory consumption Mamba In Mamba architecture.
arXiv Detail & Related papers (2024-09-02T10:33:45Z)
- A method for detecting dead fish on large water surfaces based on improved YOLOv10 [0.6874745415692134]
Dead fish can cause significant issues such as water quality deterioration, ecosystem damage, and disease transmission.
This paper proposes an end-to-end detection model built upon an enhanced YOLOv10 framework.
arXiv Detail & Related papers (2024-08-31T08:43:37Z)
- FishMOT: A Simple and Effective Method for Fish Tracking Based on IoU Matching [11.39414015803651]
FishMOT is a novel fish tracking approach combining object detection and IoU matching.
The method exhibits excellent robustness and generalizability for varying environments and fish numbers.
arXiv Detail & Related papers (2023-09-06T13:16:41Z)
- Learning Heavily-Degraded Prior for Underwater Object Detection [59.5084433933765]
This paper seeks transferable prior knowledge from detector-friendly images.
It is based on the statistical observation that the heavily degraded regions of detector-friendly underwater images (DFUI) and underwater images have evident feature distribution gaps.
Our method, with higher speed and fewer parameters, still performs better than transformer-based detectors.
arXiv Detail & Related papers (2023-08-24T12:32:46Z)
- Small Object Detection via Coarse-to-fine Proposal Generation and Imitation Learning [52.06176253457522]
We propose a two-stage framework tailored for small object detection based on the Coarse-to-fine pipeline and Feature Imitation learning.
CFINet achieves state-of-the-art performance on the large-scale small object detection benchmarks, SODA-D and SODA-A.
arXiv Detail & Related papers (2023-08-18T13:13:09Z)
- Efficient Decoder-free Object Detection with Transformers [75.00499377197475]
Vision transformers (ViTs) are changing the landscape of object detection approaches.
We propose a decoder-free fully transformer-based (DFFT) object detector.
DFFT_SMALL achieves high efficiency in both training and inference stages.
arXiv Detail & Related papers (2022-06-14T13:22:19Z)
- Dense Label Encoding for Boundary Discontinuity Free Rotation Detection [69.75559390700887]
This paper explores a relatively less-studied methodology based on classification.
We propose new techniques to push its frontier in two aspects.
Experiments and visual analysis on large-scale public datasets for aerial images show the effectiveness of our approach.
arXiv Detail & Related papers (2020-11-19T05:42:02Z)
- RepPoints V2: Verification Meets Regression for Object Detection [65.120827759348]
We introduce verification tasks into the localization prediction of RepPoints.
RepPoints v2 provides consistent improvements of about 2.0 mAP over the original RepPoints.
We show that the proposed approach can more generally elevate other object detection frameworks as well as applications such as instance segmentation.
arXiv Detail & Related papers (2020-07-16T17:57:08Z)