Related papers: RGB-D Indiscernible Object Counting in Underwater Scenes

URL: http://arxiv.org/abs/2304.11677v2
Date: Mon, 13 Jan 2025 17:45:59 GMT
Title: RGB-D Indiscernible Object Counting in Underwater Scenes
Authors: Guolei Sun, Xiaogang Cheng, Zhaochong An, Xiaokang Wang, Yun Liu, Deng-Ping Fan, Ming-Ming Cheng, Luc Van Gool
Abstract summary: Indiscernible object counting (IOC) aims to count objects that are blended with respect to their surroundings. We present a large-scale dataset IOCfish5K which contains a total of 5,637 high-resolution images and 659,024 annotated center points.
Score: 105.05477155558398
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recently, indiscernible/camouflaged scene understanding has attracted lots of research attention in the vision community. We further advance the frontier of this field by systematically studying a new challenge named indiscernible object counting (IOC), the goal of which is to count objects that are blended with respect to their surroundings. Due to a lack of appropriate IOC datasets, we present a large-scale dataset IOCfish5K which contains a total of 5,637 high-resolution images and 659,024 annotated center points. Our dataset consists of a large number of indiscernible objects (mainly fish) in underwater scenes, making the annotation process all the more challenging. IOCfish5K is superior to existing datasets with indiscernible scenes because of its larger scale, higher image resolutions, more annotations, and denser scenes. All these aspects make it the most challenging dataset for IOC so far, supporting progress in this area. Benefiting from the recent advancements of depth estimation foundation models, we construct high-quality depth maps for IOCfish5K by generating pseudo labels using the Depth Anything V2 model. The RGB-D version of IOCfish5K is named IOCfish5K-D. For benchmarking purposes on IOCfish5K, we select 14 mainstream methods for object counting and carefully evaluate them. For multimodal IOCfish5K-D, we evaluate 4 other popular multimodal counting methods. Furthermore, we propose IOCFormer, a new strong baseline that combines density and regression branches in a unified framework and can effectively tackle object counting under concealed scenes. We also propose IOCFormer-D to enable the effective usage of depth modality in helping detect and count objects hidden in their environments. Experiments show that IOCFormer and IOCFormer-D achieve state-of-the-art scores on IOCfish5K and IOCfish5K-D, respectively.

Related papers

Depth-Assisted Network for Indiscernible Marine Object Counting with Adaptive Motion-Differentiated Feature Encoding [2.3552699229345264]
Indiscernible marine object counting encounters numerous challenges, including limited visibility in underwater scenes. We have developed a novel dataset comprising 50 videos, from which 800 frames have been extracted and annotated with around 40, point-wise object labels. This dataset accurately represents real underwater environments where indiscernible marine objects are intricately integrated with their surroundings.
arXiv Detail & Related papers (2025-03-11T08:08:04Z)
MID: A Comprehensive Shore-Based Dataset for Multi-Scale Dense Ship Occlusion and Interaction Scenarios [10.748210940033484]
The Maritime Ship Navigation Behavior dataset (MID) is designed to address challenges in ship detection within complex maritime environments. MID contains 5,673 images with 135,884 finely annotated target instances, supporting both supervised and semi-supervised learning. MID's images are sourced from high-definition video clips of real-world navigation across 43 water areas, with varied weather and lighting conditions.
arXiv Detail & Related papers (2024-12-08T09:34:23Z)
A Density-Guided Temporal Attention Transformer for Indiscernible Object Counting in Underwater Video [27.329015161325962]
Indiscernible object counting, which aims to count the number of targets that are blended with respect to their surroundings, has been a challenge. We propose a large-scale dataset called YoutubeFish-35, which contains a total of 35 sequences of high-definition videos. We propose TransVidCount, a new strong baseline that combines density and regression branches along the temporal domain in a unified framework.
arXiv Detail & Related papers (2024-03-06T04:54:00Z)
Improving Underwater Visual Tracking With a Large Scale Dataset and Image Enhancement [70.2429155741593]
This paper presents a new dataset and general tracker enhancement method for Underwater Visual Object Tracking (UVOT). It poses distinct challenges; the underwater environment exhibits non-uniform lighting conditions, low visibility, lack of sharpness, low contrast, camouflage, and reflections from suspended particles. We propose a novel underwater image enhancement algorithm designed specifically to boost tracking quality. The method has resulted in a significant performance improvement, of up to 5.0% AUC, of state-of-the-art (SOTA) visual trackers.
arXiv Detail & Related papers (2023-08-30T07:41:26Z)
LaRS: A Diverse Panoptic Maritime Obstacle Detection Dataset and Benchmark [9.864996020621701]
We present the first maritime panoptic obstacle detection benchmark LaRS, featuring scenes from Lakes, Rivers and Seas. LaRS is composed of over 4000 per-pixel labeled key frames with nine preceding frames to allow utilization of the temporal texture. We report the results of 27 semantic and panoptic segmentation methods, along with several performance insights and future research directions.
arXiv Detail & Related papers (2023-08-18T15:21:15Z)
Cascade-DETR: Delving into High-Quality Universal Object Detection [99.62131881419143]
We introduce Cascade-DETR for high-quality universal object detection. We propose the Cascade Attention layer, which explicitly integrates object-centric information into the detection decoder. Lastly, we introduce a universal object detection benchmark, UDB10, that contains 10 datasets from diverse domains.
arXiv Detail & Related papers (2023-07-20T17:11:20Z)
Densely Constrained Depth Estimator for Monocular 3D Object Detection [48.12271792836015]
Estimating accurate 3D locations of objects from monocular images is a challenging problem because of lacking depth. We propose a method that utilizes dense projection constraints from edges of any direction. The proposed method achieves state-of-the-art performance on the KITTI and WOD benchmarks.
arXiv Detail & Related papers (2022-07-20T17:24:22Z)
KOLOMVERSE: Korea open large-scale image dataset for object detection in the maritime universe [0.5732204366512352]
We present KOLOMVERSE, an open large-scale image dataset for object detection in the maritime domain by KRISO. We collected 5,845 hours of video data captured from 21 territorial waters of South Korea. The dataset has images of 3840×2160 pixels and to our knowledge, it is by far the largest publicly available dataset for object detection in the maritime domain.
arXiv Detail & Related papers (2022-06-20T16:45:12Z)
Highly Accurate Dichotomous Image Segmentation [139.79513044546]
A new task called dichotomous image segmentation (DIS) aims to segment highly accurate objects from natural images. We collect the first large-scale dataset, DIS5K, which contains 5,470 high-resolution (e.g., 2K, 4K or larger) images. We also introduce a simple intermediate supervision baseline (IS-Net) using both feature-level and mask-level guidance for DIS model training.
arXiv Detail & Related papers (2022-03-06T20:09:19Z)
ASOD60K: Audio-Induced Salient Object Detection in Panoramic Videos [79.05486554647918]
We propose PV-SOD, a new task that aims to segment salient objects from panoramic videos. In contrast to existing fixation-level or object-level saliency detection tasks, we focus on multi-modal salient object detection (SOD). We collect the first large-scale dataset, named ASOD60K, which contains 4K-resolution video frames annotated with a six-level hierarchy.
arXiv Detail & Related papers (2021-07-24T15:14:20Z)
Concealed Object Detection [140.98738087261887]
We present the first systematic study on concealed object detection (COD). COD aims to identify objects that are "perfectly" embedded in their background. To better understand this task, we collect a large-scale dataset called COD10K.
arXiv Detail & Related papers (2021-02-20T06:49:53Z)
Counting from Sky: A Large-scale Dataset for Remote Sensing Object Counting and A Benchmark Method [52.182698295053264]
We are interested in counting dense objects from remote sensing images. Compared with object counting in a natural scene, this task is challenging in the following factors: large scale variation, complex cluttered background, and orientation arbitrariness. To address these issues, we first construct a large-scale object counting dataset with remote sensing images, which contains four important geographic objects. We then benchmark the dataset by designing a novel neural network that can generate a density map of an input image.
arXiv Detail & Related papers (2020-08-28T03:47:49Z)
RPT: Learning Point Set Representation for Siamese Visual Tracking [15.04182251944942]
We propose an efficient visual tracking framework to accurately estimate the target state with a finer representation as a set of representative points. Our method achieves new state-of-the-art performance while running at over 20 FPS.
arXiv Detail & Related papers (2020-08-08T07:42:58Z)

This list is automatically generated from the titles and abstracts of the papers on this site.