A Density-Guided Temporal Attention Transformer for Indiscernible Object
  Counting in Underwater Video
- URL: http://arxiv.org/abs/2403.03461v1
- Date: Wed, 6 Mar 2024 04:54:00 GMT
- Title: A Density-Guided Temporal Attention Transformer for Indiscernible Object
  Counting in Underwater Video
- Authors: Cheng-Yen Yang, Hsiang-Wei Huang, Zhongyu Jiang, Hao Wang, Farron
  Wallace, Jenq-Neng Hwang
- Abstract summary: Indiscernible object counting, which aims to count the number of targets that are blended with respect to their surroundings, has been a challenge.
We propose a large-scale dataset called YoutubeFish-35, which contains a total of 35 sequences of high-definition videos.
We propose TransVidCount, a new strong baseline that combines density and regression branches along the temporal domain in a unified framework.
- Score: 27.329015161325962
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dense object counting or crowd counting has come a long way thanks to the
recent development in the vision community. However, indiscernible object
counting, which aims to count the number of targets that are blended with
respect to their surroundings, has been a challenge. Image-based object
counting datasets have been the mainstream of the current publicly available
datasets. Therefore, we propose a large-scale dataset called YoutubeFish-35,
which contains a total of 35 sequences of high-definition videos with high
frame-per-second and more than 150,000 annotated center points across a
selected variety of scenes. For benchmarking purposes, we select three
mainstream methods for dense object counting and carefully evaluate them on the
newly collected dataset. We propose TransVidCount, a new strong baseline that
combines density and regression branches along the temporal domain in a unified
framework and can effectively tackle indiscernible object counting with
state-of-the-art performance on the YoutubeFish-35 dataset.
 
      
Related papers
        - Open-World Object Counting in Videos [55.2480439325792]
 We introduce a new task of open-world object counting in videos.
The objective is to enumerate all the unique instances of the target objects in the video.
We introduce a model, CountVid, for this task.
 arXiv  Detail & Related papers  (2025-06-18T11:35:30Z)
- Video Individual Counting for Moving Drones [51.429771128144964]
 Video Individual Counting (VIC) has received increasing attention recently due to its importance in intelligent video surveillance.
Previous crowd counting datasets are captured with fixed or rarely moving cameras with relatively sparse individuals.
We propose a density map based VIC method based on a MovingDroneCrowd dataset.
 arXiv  Detail & Related papers  (2025-03-12T07:09:33Z)
- Depth-Assisted Network for Indiscernible Marine Object Counting with Adaptive Motion-Differentiated Feature Encoding [2.3552699229345264]
 Indiscernible marine object counting encounters numerous challenges, including limited visibility in underwater scenes.
We have developed a novel dataset comprising 50 videos, from which 800 frames have been extracted and annotated with around 40, point-wise object labels.
This dataset accurately represents real underwater environments where indiscernible marine objects are intricately integrated with their surroundings.
 arXiv  Detail & Related papers  (2025-03-11T08:08:04Z)
- Counting Stacked Objects [57.68870743111393]
 We propose a novel 3D counting approach that decomposes the task into two complementary subproblems.
By combining geometric reconstruction and deep learning-based depth analysis, our method can accurately count identical objects within containers.
We validate our 3D Counting pipeline on diverse real-world and large-scale synthetic datasets.
 arXiv  Detail & Related papers  (2024-11-28T13:51:16Z)
- Contrastive Lift: 3D Object Instance Segmentation by Slow-Fast
  Contrastive Fusion [110.84357383258818]
 We propose a novel approach to lift 2D segments to 3D and fuse them by means of a neural field representation.
The core of our approach is a slow-fast clustering objective function, which is scalable and well-suited for scenes with a large number of objects.
Our approach outperforms the state-of-the-art on challenging scenes from the ScanNet, Hypersim, and Replica datasets.
 arXiv  Detail & Related papers  (2023-06-07T17:57:45Z)
- Indiscernible Object Counting in Underwater Scenes [91.86044762367945]
 The goal of indiscernible object counting is to count objects that are blended with respect to their surroundings.
We present a large-scale dataset IOCfish5K which contains a total of 5,637 high-resolution images and 659,024 annotated center points.
 arXiv  Detail & Related papers  (2023-04-23T15:09:02Z)
- Tiny Object Tracking: A Large-scale Dataset and A Baseline [40.93697515531104]
 We create a large-scale video dataset, which contains 434 sequences with a total of more than 217K frames.
In data creation, we take 12 challenge attributes into account to cover a broad range of viewpoints and scene complexities.
We propose a novel Multilevel Knowledge Distillation Network (MKDNet), which pursues three-level knowledge distillations in a unified framework.
 arXiv  Detail & Related papers  (2022-02-11T15:00:32Z)
- Counting from Sky: A Large-scale Dataset for Remote Sensing Object
  Counting and A Benchmark Method [52.182698295053264]
 We are interested in counting dense objects from remote sensing images. Compared with object counting in a natural scene, this task is challenging because of the following factors: large scale variation, complex cluttered background, and orientation arbitrariness.
To address these issues, we first construct a large-scale object counting dataset with remote sensing images, which contains four important geographic objects.
We then benchmark the dataset by designing a novel neural network that can generate a density map of an input image.
 arXiv  Detail & Related papers  (2020-08-28T03:47:49Z)
- TAO: A Large-Scale Benchmark for Tracking Any Object [95.87310116010185]
 Tracking Any Object dataset consists of 2,907 high resolution videos, captured in diverse environments, which are half a minute long on average.
We ask annotators to label objects that move at any point in the video, and give names to them post factum.
Our vocabulary is both significantly larger and qualitatively different from existing tracking datasets.
 arXiv  Detail & Related papers  (2020-05-20T21:07:28Z)
- Rethinking Object Detection in Retail Stores [55.359582952686175]
 We propose a new task, simultaneous object localization and counting, abbreviated as Locount.
Locount requires algorithms to localize groups of objects of interest with the number of instances.
We collect a large-scale object localization and counting dataset with rich annotations in retail stores.
 arXiv  Detail & Related papers  (2020-03-18T14:01:54Z)
- Counting dense objects in remote sensing images [52.182698295053264]
 Estimating the number of objects of interest in a given image is a challenging yet important task.
In this paper, we are interested in counting dense objects from remote sensing images.
To address these issues, we first construct a large-scale object counting dataset based on remote sensing images.
We then benchmark the dataset by designing a novel neural network that can generate a density map of an input image.
 arXiv  Detail & Related papers  (2020-02-14T09:13:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
       
     