RGB-D Indiscernible Object Counting in Underwater Scenes
- URL: http://arxiv.org/abs/2304.11677v2
- Date: Mon, 13 Jan 2025 17:45:59 GMT
- Title: RGB-D Indiscernible Object Counting in Underwater Scenes
- Authors: Guolei Sun, Xiaogang Cheng, Zhaochong An, Xiaokang Wang, Yun Liu, Deng-Ping Fan, Ming-Ming Cheng, Luc Van Gool
- Abstract summary: Indiscernible object counting (IOC) aims to count objects that are blended with respect to their surroundings.
We present a large-scale dataset IOCfish5K which contains a total of 5,637 high-resolution images and 659,024 annotated center points.
- Score: 105.05477155558398
- License:
- Abstract: Recently, indiscernible/camouflaged scene understanding has attracted lots of research attention in the vision community. We further advance the frontier of this field by systematically studying a new challenge named indiscernible object counting (IOC), the goal of which is to count objects that are blended with respect to their surroundings. Due to a lack of appropriate IOC datasets, we present a large-scale dataset IOCfish5K which contains a total of 5,637 high-resolution images and 659,024 annotated center points. Our dataset consists of a large number of indiscernible objects (mainly fish) in underwater scenes, making the annotation process all the more challenging. IOCfish5K is superior to existing datasets with indiscernible scenes because of its larger scale, higher image resolutions, more annotations, and denser scenes. All these aspects make it the most challenging dataset for IOC so far, supporting progress in this area. Benefiting from the recent advancements of depth estimation foundation models, we construct high-quality depth maps for IOCfish5K by generating pseudo labels using the Depth Anything V2 model. The RGB-D version of IOCfish5K is named IOCfish5K-D. For benchmarking purposes on IOCfish5K, we select 14 mainstream methods for object counting and carefully evaluate them. For multimodal IOCfish5K-D, we evaluate 4 other popular multimodal counting methods. Furthermore, we propose IOCFormer, a new strong baseline that combines density and regression branches in a unified framework and can effectively tackle object counting under concealed scenes. We also propose IOCFormer-D to enable the effective usage of depth modality in helping detect and count objects hidden in their environments. Experiments show that IOCFormer and IOCFormer-D achieve state-of-the-art scores on IOCfish5K and IOCfish5K-D, respectively.
Related papers
- MID: A Comprehensive Shore-Based Dataset for Multi-Scale Dense Ship Occlusion and Interaction Scenarios [10.748210940033484]
The Maritime Ship Navigation Behavior dataset (MID) is designed to address challenges in ship detection within complex maritime environments.
MID contains 5,673 images with 135,884 finely annotated target instances, supporting both supervised and semi-supervised learning.
MID's images are sourced from high-definition video clips of real-world navigation across 43 water areas, with varied weather and lighting conditions.
arXiv Detail & Related papers (2024-12-08T09:34:23Z)
- A Density-Guided Temporal Attention Transformer for Indiscernible Object Counting in Underwater Video [27.329015161325962]
Indiscernible object counting, which aims to count the number of targets that are blended with respect to their surroundings, has been a challenge.
We propose a large-scale dataset called YoutubeFish-35, which contains a total of 35 sequences of high-definition videos.
We propose TransVidCount, a new strong baseline that combines density and regression branches along the temporal domain in a unified framework.
arXiv Detail & Related papers (2024-03-06T04:54:00Z)
- Improving Underwater Visual Tracking With a Large Scale Dataset and Image Enhancement [70.2429155741593]
This paper presents a new dataset and general tracker enhancement method for Underwater Visual Object Tracking (UVOT).
It poses distinct challenges; the underwater environment exhibits non-uniform lighting conditions, low visibility, lack of sharpness, low contrast, camouflage, and reflections from suspended particles.
We propose a novel underwater image enhancement algorithm designed specifically to boost tracking quality.
The method significantly improves the performance of state-of-the-art (SOTA) visual trackers, by up to 5.0% AUC.
arXiv Detail & Related papers (2023-08-30T07:41:26Z)
- LaRS: A Diverse Panoptic Maritime Obstacle Detection Dataset and Benchmark [9.864996020621701]
We present the first maritime panoptic obstacle detection benchmark LaRS, featuring scenes from Lakes, Rivers and Seas.
LaRS is composed of over 4000 per-pixel labeled key frames with nine preceding frames to allow utilization of the temporal texture.
We report the results of 27 semantic and panoptic segmentation methods, along with several performance insights and future research directions.
arXiv Detail & Related papers (2023-08-18T15:21:15Z)
- Cascade-DETR: Delving into High-Quality Universal Object Detection [99.62131881419143]
We introduce Cascade-DETR for high-quality universal object detection.
We propose the Cascade Attention layer, which explicitly integrates object-centric information into the detection decoder.
Lastly, we introduce a universal object detection benchmark, UDB10, that contains 10 datasets from diverse domains.
arXiv Detail & Related papers (2023-07-20T17:11:20Z)
- Densely Constrained Depth Estimator for Monocular 3D Object Detection [48.12271792836015]
Estimating accurate 3D locations of objects from monocular images is a challenging problem because depth information is lacking.
We propose a method that utilizes dense projection constraints from edges of any direction.
The proposed method achieves state-of-the-art performance on the KITTI and WOD benchmarks.
arXiv Detail & Related papers (2022-07-20T17:24:22Z)
- KOLOMVERSE: Korea open large-scale image dataset for object detection in the maritime universe [0.5732204366512352]
We present KOLOMVERSE, an open large-scale image dataset for object detection in the maritime domain by KRISO.
We collected 5,845 hours of video data captured from 21 territorial waters of South Korea.
The dataset has images of 3840$\times$2160 pixels and, to our knowledge, it is by far the largest publicly available dataset for object detection in the maritime domain.
arXiv Detail & Related papers (2022-06-20T16:45:12Z)
- Highly Accurate Dichotomous Image Segmentation [139.79513044546]
A new task called dichotomous image segmentation (DIS) aims to segment highly accurate objects from natural images.
We collect the first large-scale dataset, DIS5K, which contains 5,470 high-resolution (e.g., 2K, 4K or larger) images.
We also introduce a simple intermediate supervision baseline (IS-Net) using both feature-level and mask-level guidance for DIS model training.
arXiv Detail & Related papers (2022-03-06T20:09:19Z)
- Concealed Object Detection [140.98738087261887]
We present the first systematic study on concealed object detection (COD).
COD aims to identify objects that are "perfectly" embedded in their background.
To better understand this task, we collect a large-scale dataset called COD10K.
arXiv Detail & Related papers (2021-02-20T06:49:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.