Enhanced Object Detection: A Study on Vast Vocabulary Object Detection Track for V3Det Challenge 2024
- URL: http://arxiv.org/abs/2406.09201v3
- Date: Fri, 21 Jun 2024 08:15:12 GMT
- Title: Enhanced Object Detection: A Study on Vast Vocabulary Object Detection Track for V3Det Challenge 2024
- Authors: Peixi Wu, Bosong Chai, Xuan Nie, Longquan Yan, Zeyu Wang, Qifan Zhou, Boning Wang, Yansong Peng, Hebei Li,
- Abstract summary: We present our findings from the research conducted on the Vast Vocabulary Visual Detection dataset for the Supervised Vast Vocabulary Visual Detection task.
Our model has shown improvement over the baseline and achieved excellent rankings on the Leaderboard for both the Vast Vocabulary Object Detection (Supervised) track and the Open Vocabulary Object Detection (OVD) track of the V3Det Challenge 2024.
- Score: 3.5043076887736198
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this technical report, we present our findings from the research conducted on the Vast Vocabulary Visual Detection (V3Det) dataset for the Supervised Vast Vocabulary Visual Detection task. Handling complex categories and detection boxes is a key difficulty in this track, and the original supervised detector is not suitable for the task. We have designed a series of improvements, including adjustments to the network structure, changes to the loss function, and the design of training strategies. Our model has shown improvement over the baseline and achieved excellent rankings on the Leaderboard for both the Vast Vocabulary Object Detection (Supervised) track and the Open Vocabulary Object Detection (OVD) track of the V3Det Challenge 2024.
Related papers
- V3Det Challenge 2024 on Vast Vocabulary and Open Vocabulary Object Detection: Methods and Results [142.5704093410454]
The V3Det Challenge 2024 aims to push the boundaries of object detection research.
The challenge consists of two tracks: Vast Vocabulary Object Detection and Open Vocabulary Object Detection.
We aim to inspire future research directions in vast vocabulary and open-vocabulary object detection.
arXiv Detail & Related papers (2024-06-17T16:58:51Z)
- Learning Background Prompts to Discover Implicit Knowledge for Open Vocabulary Object Detection [101.15777242546649]
Open vocabulary object detection (OVD) aims at seeking an optimal object detector capable of recognizing objects from both base and novel categories.
Recent advances leverage knowledge distillation to transfer insightful knowledge from pre-trained large-scale vision-language models to the task of object detection.
We present a novel OVD framework, termed LBP, that learns background prompts to harness explored implicit background knowledge.
arXiv Detail & Related papers (2024-06-01T17:32:26Z)
- DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection [111.68263493302499]
We introduce DetCLIPv3, a high-performing detector that excels at both open-vocabulary object detection and hierarchical labels.
DetCLIPv3 is characterized by three core designs: 1) Versatile model architecture; 2) High information density data; and 3) Efficient training strategy.
DetCLIPv3 demonstrates superior open-vocabulary detection performance, outperforming GLIPv2, GroundingDINO, and DetCLIPv2 by 18.0/19.6/6.6 AP, respectively.
arXiv Detail & Related papers (2024-04-14T11:01:44Z)
- Box-based Refinement for Weakly Supervised and Unsupervised Localization Tasks [57.70351255180495]
We train the detectors on top of the network output instead of the image data and apply suitable loss backpropagation.
Our findings reveal a significant improvement in phrase grounding for the "what is where by looking" task.
arXiv Detail & Related papers (2023-09-07T17:36:02Z)
- MOTRv3: Release-Fetch Supervision for End-to-End Multi-Object Tracking [27.493264998858955]
We propose MOTRv3, which balances the label assignment process using the developed release-fetch supervision strategy.
Besides, another two strategies named pseudo label distillation and track group denoising are designed to further improve the supervision for detection and association.
arXiv Detail & Related papers (2023-05-23T17:40:13Z)
- V3Det: Vast Vocabulary Visual Detection Dataset [69.50942928928052]
V3Det is a vast vocabulary visual detection dataset with precisely annotated bounding boxes on massive images.
By offering a vast exploration space, V3Det enables extensive benchmarks on both vast and open vocabulary object detection.
arXiv Detail & Related papers (2023-04-07T17:45:35Z)
- Bridging Images and Videos: A Simple Learning Framework for Large Vocabulary Video Object Detection [110.08925274049409]
We present a simple but effective learning framework that takes full advantage of all available training data to learn detection and tracking.
We show that consistent improvements across various large vocabulary trackers are achievable, setting strong baseline results on the challenging TAO benchmarks.
arXiv Detail & Related papers (2022-12-20T10:33:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.