Universal Object Detection with Large Vision Model
- URL: http://arxiv.org/abs/2212.09408v3
- Date: Thu, 12 Oct 2023 07:55:38 GMT
- Title: Universal Object Detection with Large Vision Model
- Authors: Feng Lin, Wenze Hu, Yaowei Wang, Yonghong Tian, Guangming Lu, Fanglin Chen, Yong Xu, Xiaoyu Wang
- Abstract summary: This study focuses on the large-scale, multi-domain universal object detection problem.
To address its challenges (cross-dataset category label duplication, label conflicts, and hierarchical taxonomies), we introduce our approach to label handling, hierarchy-aware loss design, and resource-efficient model training.
Our method placed second in the object detection track of the Robust Vision Challenge 2022.
- Score: 79.06618136217142
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Over the past few years, there has been growing interest in developing a
broad, universal, and general-purpose computer vision system. Such systems have
the potential to address a wide range of vision tasks simultaneously, without
being limited to specific problems or data domains. This universality is
crucial for practical, real-world computer vision applications. In this study,
our focus is on a specific challenge: the large-scale, multi-domain universal
object detection problem, which contributes to the broader goal of achieving a
universal vision system. This problem presents several intricate challenges,
including cross-dataset category label duplication, label conflicts, and the
necessity to handle hierarchical taxonomies. To address these challenges, we
introduce our approach to label handling, hierarchy-aware loss design, and
resource-efficient model training utilizing a pre-trained large vision model.
Our method has demonstrated remarkable performance, securing a prestigious
second-place ranking in the object detection track of the Robust Vision
Challenge 2022 (RVC 2022) on a million-scale cross-dataset object detection
benchmark. We believe that our comprehensive study will serve as a valuable
reference and offer an alternative approach for addressing similar challenges
within the computer vision community. The source code for our work is openly
available at https://github.com/linfeng93/Large-UniDet.
Related papers
- Benchmarking Robust Self-Supervised Learning Across Diverse Downstream Tasks [9.207022068713867]
We present a comprehensive empirical evaluation of the adversarial robustness of self-supervised vision encoders across multiple downstream tasks.
Our attacks operate in the encoder embedding space and at the downstream task output level.
Since the purpose of a foundation model is to cater to multiple applications at once, our findings reveal the need to enhance encoder robustness more broadly.
arXiv Detail & Related papers (2024-07-17T14:12:34Z)
- V3Det Challenge 2024 on Vast Vocabulary and Open Vocabulary Object Detection: Methods and Results [142.5704093410454]
The V3Det Challenge 2024 aims to push the boundaries of object detection research.
The challenge consists of two tracks: Vast Vocabulary Object Detection and Open Vocabulary Object Detection.
We aim to inspire future research directions in vast vocabulary and open-vocabulary object detection.
arXiv Detail & Related papers (2024-06-17T16:58:51Z)
- Learning 1D Causal Visual Representation with De-focus Attention Networks [108.72931590504406]
This paper explores the feasibility of representing images using 1D causal modeling.
We propose De-focus Attention Networks, which employ learnable bandpass filters to create varied attention patterns.
arXiv Detail & Related papers (2024-06-06T17:59:56Z)
- Challenges for Monocular 6D Object Pose Estimation in Robotics [12.037567673872662]
We provide a unified view on recent publications from both robotics and computer vision.
We find that occlusion handling, novel pose representations, and formalizing and improving category-level pose estimation are still fundamental challenges.
In order to address them, ontological reasoning, deformability handling, scene-level reasoning, realistic datasets, and the ecological footprint of algorithms need to be improved.
arXiv Detail & Related papers (2023-07-22T21:36:57Z)
- Open Challenges for Monocular Single-shot 6D Object Pose Estimation [15.01623452269803]
Object pose estimation is a non-trivial task that enables robotic manipulation, bin picking, augmented reality, and scene understanding.
Monocular object pose estimation gained considerable momentum with the rise of high-performing deep learning-based solutions.
We identify promising research directions in order to help researchers to formulate relevant research ideas and effectively advance the state of the art.
arXiv Detail & Related papers (2023-02-23T07:26:50Z)
- MetaGraspNet: A Large-Scale Benchmark Dataset for Scene-Aware Ambidextrous Bin Picking via Physics-based Metaverse Synthesis [72.85526892440251]
We introduce MetaGraspNet, a large-scale photo-realistic bin picking dataset constructed via physics-based metaverse synthesis.
The proposed dataset contains 217k RGBD images across 82 different article types, with full annotations for object detection, amodal perception, keypoint detection, manipulation order and ambidextrous grasp labels for a parallel-jaw and vacuum gripper.
We also provide a real dataset consisting of over 2.3k fully annotated high-quality RGBD images, divided into 5 difficulty levels and an unseen object set to evaluate different object and layout properties.
arXiv Detail & Related papers (2022-08-08T08:15:34Z)
- Person Re-identification: A Retrospective on Domain Specific Open Challenges and Future Trends [2.4907242954727926]
Person re-identification (Re-ID) is one of the primary components of an automated visual surveillance system.
It aims to automatically identify or search for persons in a multi-camera network with non-overlapping fields of view.
arXiv Detail & Related papers (2022-02-26T11:55:57Z)
- Unsupervised Domain Adaption of Object Detectors: A Survey [87.08473838767235]
Recent advances in deep learning have led to the development of accurate and efficient models for various computer vision applications.
Learning highly accurate models relies on the availability of datasets with a large number of annotated images.
Due to this, model performance drops drastically when evaluated on label-scarce datasets having visually distinct images.
arXiv Detail & Related papers (2021-05-27T23:34:06Z)
- Weakly Supervised Object Localization and Detection: A Survey [145.5041117184952]
Weakly supervised object localization and detection plays an important role in developing new-generation computer vision systems.
We review (1) classic models, (2) approaches with feature representations from off-the-shelf deep networks, (3) approaches solely based on deep learning, and (4) publicly available datasets and standard evaluation metrics that are widely used in this field.
We discuss the key challenges in this field, its development history, the advantages and disadvantages of the methods in each category, the relationships between methods in different categories, applications of weakly supervised object localization and detection methods, and potential future directions for the field.
arXiv Detail & Related papers (2021-04-16T06:44:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.