Related papers: Using Cross-Domain Detection Loss to Infer Multi-Scale Information for Improved Tiny Head Tracking

URL: http://arxiv.org/abs/2505.22677v1
Date: Wed, 14 May 2025 02:29:54 GMT
Title: Using Cross-Domain Detection Loss to Infer Multi-Scale Information for Improved Tiny Head Tracking
Authors: Jisu Kim, Alex Mattingly, Eung-Joo Lee, Benjamin S. Riggan
Abstract summary: We propose a framework to enhance tiny head detection and tracking by optimizing the balance between performance and efficiency. Our framework integrates (1) a cross-domain detection loss, (2) a multi-scale module, and (3) a small receptive field detection mechanism. These innovations enhance detection by bridging the gap between large and small detectors, capturing high-frequency details at multiple scales during training, and using filters with small receptive fields to detect tiny heads.
Score: 2.960887693377022
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Head detection and tracking are essential for downstream tasks, but current methods often require large computational budgets, which increase latency and tie up resources (e.g., processors, memory, and bandwidth). To address this, we propose a framework to enhance tiny head detection and tracking by optimizing the balance between performance and efficiency. Our framework integrates (1) a cross-domain detection loss, (2) a multi-scale module, and (3) a small receptive field detection mechanism. These innovations enhance detection by bridging the gap between large and small detectors, capturing high-frequency details at multiple scales during training, and using filters with small receptive fields to detect tiny heads. Evaluations on the CroHD and CrowdHuman datasets show improved Multiple Object Tracking Accuracy (MOTA) and mean Average Precision (mAP), demonstrating the effectiveness of our approach in crowded scenes.

Related papers

Better Sampling, towards Better End-to-end Small Object Detection [7.7473020808686694]
Small object detection remains unsatisfactory due to limited characteristics and high density and mutual overlap. We propose methods enhancing sampling within an end-to-end framework. Our model demonstrates a significant enhancement, achieving a 2.9% increase in average precision (AP) over the state-of-the-art (SOTA) on the VisDrone dataset.
arXiv Detail & Related papers (2024-05-17T04:37:44Z)
Small Object Detection via Coarse-to-fine Proposal Generation and Imitation Learning [52.06176253457522]
We propose a two-stage framework tailored for small object detection based on the Coarse-to-fine pipeline and Feature Imitation learning. CFINet achieves state-of-the-art performance on the large-scale small object detection benchmarks, SODA-D and SODA-A.
arXiv Detail & Related papers (2023-08-18T13:13:09Z)
Tucker Bilinear Attention Network for Multi-scale Remote Sensing Object Detection [10.060030309684953]
Large-scale variation of remote-sensing targets is one of the main challenges in VHR remote-sensing object detection. This paper proposes two novel modules: Guided Attention and Tucker Bilinear Attention. Based on these two modules, we build a new multi-scale remote sensing object detection framework.
arXiv Detail & Related papers (2023-03-09T15:20:03Z)
Label-Efficient Object Detection via Region Proposal Network Pre-Training [58.50615557874024]
We propose a simple pretext task that provides an effective pre-training for the region proposal network (RPN). In comparison with multi-stage detectors without RPN pre-training, our approach is able to consistently improve downstream task performance.
arXiv Detail & Related papers (2022-11-16T16:28:18Z)
Rethinking the Detection Head Configuration for Traffic Object Detection [11.526701794026641]
We propose a lightweight traffic object detection network based on matching between detection head and object distribution. The proposed model achieves more competitive performance than other models on the BDD100K dataset and our proposed ETFOD-v2 dataset.
arXiv Detail & Related papers (2022-10-08T02:23:57Z)
SALISA: Saliency-based Input Sampling for Efficient Video Object Detection [58.22508131162269]
We propose SALISA, a novel non-uniform SALiency-based Input SAmpling technique for video object detection. We show that SALISA significantly improves the detection of small objects.
arXiv Detail & Related papers (2022-04-05T17:59:51Z)
Activation to Saliency: Forming High-Quality Labels for Unsupervised Salient Object Detection [54.92703325989853]
We propose a two-stage Activation-to-Saliency (A2S) framework that effectively generates high-quality saliency cues. No human annotations are involved in our framework during the whole training process. Our framework reports significant performance compared with existing USOD methods.
arXiv Detail & Related papers (2021-12-07T11:54:06Z)
Regressive Domain Adaptation for Unsupervised Keypoint Detection [67.2950306888855]
Domain adaptation (DA) aims at transferring knowledge from a labeled source domain to an unlabeled target domain. We present a method of regressive domain adaptation (RegDA) for unsupervised keypoint detection. Our method brings large improvement by 8% to 11% in terms of PCK on different datasets.
arXiv Detail & Related papers (2021-03-10T16:45:22Z)
Anchor-free Small-scale Multispectral Pedestrian Detection [88.7497134369344]
We propose a method for effective and efficient multispectral fusion of the two modalities in an adapted single-stage anchor-free base architecture. We aim at learning pedestrian representations based on object center and scale rather than direct bounding box predictions. Results show our method's effectiveness in detecting small-scaled pedestrians.
arXiv Detail & Related papers (2020-08-19T13:13:01Z)

This list is automatically generated from the titles and abstracts of the papers on this site.