A Density-Guided Temporal Attention Transformer for Indiscernible Object
Counting in Underwater Video
- URL: http://arxiv.org/abs/2403.03461v1
- Date: Wed, 6 Mar 2024 04:54:00 GMT
- Title: A Density-Guided Temporal Attention Transformer for Indiscernible Object
Counting in Underwater Video
- Authors: Cheng-Yen Yang, Hsiang-Wei Huang, Zhongyu Jiang, Hao Wang, Farron
Wallace, Jenq-Neng Hwang
- Abstract summary: Indiscernible object counting, which aims to count targets that blend into their surroundings, has been a long-standing challenge.
We propose a large-scale dataset called YoutubeFish-35, which contains a total of 35 sequences of high-definition videos.
We propose TransVidCount, a new strong baseline that combines density and regression branches along the temporal domain in a unified framework.
- Score: 27.329015161325962
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dense object counting or crowd counting has come a long way thanks to the
recent developments in the vision community. However, indiscernible object
counting, which aims to count targets that blend into their surroundings, has
remained a challenge. Image-based object
counting datasets have been the mainstream among publicly available
benchmarks, leaving video-based counting underexplored. Therefore, we propose
a large-scale dataset called YoutubeFish-35,
which contains a total of 35 sequences of high-definition, high-frame-rate
videos and more than 150,000 annotated center points across a
selected variety of scenes. For benchmarking purposes, we select three
mainstream methods for dense object counting and carefully evaluate them on the
newly collected dataset. We propose TransVidCount, a new strong baseline that
combines density and regression branches along the temporal domain in a unified
framework and can effectively tackle indiscernible object counting with
state-of-the-art performance on the YoutubeFish-35 dataset.
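The density branch mentioned above follows the standard density-map counting paradigm: annotated center points are rasterized into a density map whose integral equals the object count. A minimal NumPy sketch of that ground-truth construction (the kernel sigma, radius, and frame size here are illustrative assumptions, not values from the paper):

```python
import numpy as np

def points_to_density_map(points, height, width, sigma=4.0, radius=12):
    """Rasterize center-point annotations into a density map.

    Each point deposits a normalized Gaussian kernel, so the sum of the
    map equals the number of annotated objects (up to border clipping).
    """
    density = np.zeros((height, width), dtype=np.float64)
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    kernel /= kernel.sum()  # each point contributes total mass 1.0
    for x, y in points:
        cx, cy = int(round(x)), int(round(y))
        # Clip the kernel window to the image bounds.
        x0, x1 = max(cx - radius, 0), min(cx + radius + 1, width)
        y0, y1 = max(cy - radius, 0), min(cy + radius + 1, height)
        if x0 >= x1 or y0 >= y1:
            continue  # point falls entirely outside the image
        kx0 = x0 - (cx - radius)
        ky0 = y0 - (cy - radius)
        density[y0:y1, x0:x1] += kernel[ky0:ky0 + (y1 - y0),
                                        kx0:kx0 + (x1 - x0)]
    return density

# Three hypothetical fish centers in a 64x64 frame; integrating the
# resulting density map recovers the count.
dmap = points_to_density_map([(20.3, 20.1), (32.0, 33.5), (45.7, 15.2)], 64, 64)
print(round(dmap.sum()))  # -> 3
```

A counting network trained against such maps predicts the count as the sum of its output; a temporal model like the one described above would additionally aggregate these per-frame maps across neighboring frames.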
Related papers
- Contrastive Lift: 3D Object Instance Segmentation by Slow-Fast
Contrastive Fusion [110.84357383258818]
We propose a novel approach to lift 2D segments to 3D and fuse them by means of a neural field representation.
The core of our approach is a slow-fast clustering objective function, which is scalable and well-suited for scenes with a large number of objects.
Our approach outperforms the state-of-the-art on challenging scenes from the ScanNet, Hypersim, and Replica datasets.
arXiv Detail & Related papers (2023-06-07T17:57:45Z) - Indiscernible Object Counting in Underwater Scenes [91.86044762367945]
The goal of indiscernible object counting is to count objects that blend into their surroundings.
We present a large-scale dataset IOCfish5K which contains a total of 5,637 high-resolution images and 659,024 annotated center points.
arXiv Detail & Related papers (2023-04-23T15:09:02Z) - Tiny Object Tracking: A Large-scale Dataset and A Baseline [40.93697515531104]
We create a large-scale video dataset, which contains 434 sequences with a total of more than 217K frames.
In data creation, we take 12 challenge attributes into account to cover a broad range of viewpoints and scene complexities.
We propose a novel Multilevel Knowledge Distillation Network (MKDNet), which pursues three-level knowledge distillations in a unified framework.
arXiv Detail & Related papers (2022-02-11T15:00:32Z) - Counting from Sky: A Large-scale Dataset for Remote Sensing Object
Counting and A Benchmark Method [52.182698295053264]
We are interested in counting dense objects from remote sensing images. Compared with object counting in natural scenes, this task is challenging due to the following factors: large scale variation, complex cluttered backgrounds, and arbitrary orientations.
To address these issues, we first construct a large-scale object counting dataset with remote sensing images, which contains four important geographic objects.
We then benchmark the dataset by designing a novel neural network that can generate a density map of an input image.
arXiv Detail & Related papers (2020-08-28T03:47:49Z) - TAO: A Large-Scale Benchmark for Tracking Any Object [95.87310116010185]
The Tracking Any Object (TAO) dataset consists of 2,907 high-resolution videos, captured in diverse environments, which are half a minute long on average.
We ask annotators to label objects that move at any point in the video and to name them post factum.
Our vocabulary is both significantly larger and qualitatively different from existing tracking datasets.
arXiv Detail & Related papers (2020-05-20T21:07:28Z) - Rethinking Object Detection in Retail Stores [55.359582952686175]
We propose a new task, simultaneously object localization and counting, abbreviated as Locount.
Locount requires algorithms to localize groups of objects of interest together with the number of instances in each group.
We collect a large-scale object localization and counting dataset with rich annotations in retail stores.
arXiv Detail & Related papers (2020-03-18T14:01:54Z) - Counting dense objects in remote sensing images [52.182698295053264]
Estimating the number of objects of interest in a given image is a challenging yet important task.
In this paper, we are interested in counting dense objects from remote sensing images.
To address these issues, we first construct a large-scale object counting dataset based on remote sensing images.
We then benchmark the dataset by designing a novel neural network that generates a density map of an input image.
arXiv Detail & Related papers (2020-02-14T09:13:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.