Open World Object Detection in the Era of Foundation Models
- URL: http://arxiv.org/abs/2312.05745v1
- Date: Sun, 10 Dec 2023 03:56:06 GMT
- Title: Open World Object Detection in the Era of Foundation Models
- Authors: Orr Zohar, Alejandro Lozano, Shelly Goel, Serena Yeung, Kuan-Chieh Wang
- Abstract summary: We introduce a new benchmark that includes five real-world application-driven datasets.
We also introduce a novel method, Foundation Object detection Model for the Open world (FOMO), which identifies unknown objects based on their shared attributes with the base known objects.
- Score: 53.683963161370585
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Object detection is integral to a bevy of real-world applications, from
robotics to medical image analysis. To be used reliably in such applications,
models must be capable of handling unexpected - or novel - objects. The open
world object detection (OWD) paradigm addresses this challenge by enabling
models to detect unknown objects and learn discovered ones incrementally.
However, OWD method development is hindered by the stringent benchmark and
task definitions, which effectively prohibit the use of foundation models.
Here, we aim to relax these definitions and investigate the utilization of
pre-trained foundation models in OWD. First, we show that existing benchmarks
are insufficient in evaluating methods that utilize foundation models, as even
naive integration methods nearly saturate these benchmarks. This result
motivated us to curate a new and challenging benchmark for these models.
Therefore, we introduce a new benchmark that includes five real-world
application-driven datasets, including challenging domains such as aerial and
surgical images, and establish baselines. We exploit the inherent connection
between classes in application-driven datasets and introduce a novel method,
Foundation Object detection Model for the Open world, or FOMO, which identifies
unknown objects based on their shared attributes with the base known objects.
FOMO achieves ~3x higher unknown object mAP than baselines on our benchmark.
However, our results indicate significant room for improvement - suggesting
a great research opportunity in further scaling object detection methods to
real-world domains. Our code and benchmark are available at
https://orrzohar.github.io/projects/fomo/.
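The abstract only names FOMO's core idea: a region that matches no known class but shares attributes with the known classes is flagged as unknown. The following toy sketch, which is not the authors' implementation, illustrates that scoring rule with NumPy; the embeddings, the dimensionality, the threshold `tau`, and the helper `score_regions` are all illustrative assumptions (in practice the embeddings would come from a vision-language model such as CLIP).

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

# Toy stand-ins for embeddings a vision-language model would produce:
# 4 known-class embeddings, 6 attribute embeddings shared across the
# domain, and 3 region-proposal embeddings, all unit-normalized.
known_classes = l2_normalize(rng.normal(size=(4, 32)))
attributes    = l2_normalize(rng.normal(size=(6, 32)))
regions       = l2_normalize(rng.normal(size=(3, 32)))

def score_regions(regions, known_classes, attributes, tau=0.25):
    """Label each region as a known-class index, or -1 for 'unknown'.

    A region is 'unknown' when no known class matches it well but its
    best attribute similarity is high, i.e. it shares attributes with
    the known objects without being one of them.
    """
    class_sim = regions @ known_classes.T   # (R, K) cosine similarities
    attr_sim  = regions @ attributes.T      # (R, A) cosine similarities
    best_class = class_sim.max(axis=1)
    attr_score = attr_sim.max(axis=1)
    labels = class_sim.argmax(axis=1)
    labels[(best_class < tau) & (attr_score >= tau)] = -1  # unknown
    return labels

print(score_regions(regions, known_classes, attributes))
```

The design point is that the attribute vocabulary lets the detector reason about never-seen categories through properties it already knows, rather than requiring an explicit embedding for every possible unknown class.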
Related papers
- Zero-Shot Image Anomaly Detection Using Generative Foundation Models [2.241618130319058]
This research explores the use of score-based generative models as foundational tools for semantic anomaly detection. By analyzing Stein score errors, we introduce a novel method for identifying anomalous samples without requiring re-training on each target dataset. Our approach improves over the state of the art and relies on training a single model on one dataset -- CelebA -- which we find to be an effective base distribution.
arXiv Detail & Related papers (2025-07-30T13:56:36Z)
- RoHOI: Robustness Benchmark for Human-Object Interaction Detection [38.09248570129455]
Human-Object Interaction (HOI) detection is crucial for robot-human assistance, enabling context-aware support. We introduce the first robustness benchmark for HOI detection, evaluating model resilience under diverse challenges. Our benchmark, RoHOI, includes 20 corruption types based on the HICO-DET and V-COCO datasets and a new robustness-focused metric.
arXiv Detail & Related papers (2025-07-12T01:58:04Z)
- On the Robustness of Human-Object Interaction Detection against Distribution Shift [27.40641711088878]
Human-Object Interaction (HOI) detection has seen substantial advances in recent years. Existing works focus on the standard setting with ideal images and natural distribution, far from practical scenarios with inevitable distribution shifts. In this work, we investigate this issue by benchmarking, analyzing, and enhancing the robustness of HOI detection models under various distribution shifts.
arXiv Detail & Related papers (2025-06-22T13:01:34Z)
- Oriented Tiny Object Detection: A Dataset, Benchmark, and Dynamic Unbiased Learning [51.170479006249195]
We introduce a new dataset, benchmark, and a dynamic coarse-to-fine learning scheme in this study.
Our proposed dataset, AI-TOD-R, features the smallest object sizes among all oriented object detection datasets.
We present a benchmark spanning a broad range of detection paradigms, including both fully-supervised and label-efficient approaches.
arXiv Detail & Related papers (2024-12-16T09:14:32Z)
- Open-World Object Detection with Instance Representation Learning [1.8749305679160366]
We propose a method to train an object detector that can both detect novel objects and extract semantically rich features in open-world conditions.
Our method learns a robust and generalizable feature space, outperforming other OWOD-based feature extraction methods.
arXiv Detail & Related papers (2024-09-24T13:13:34Z)
- Beyond Few-shot Object Detection: A Detailed Survey [25.465534270637523]
Researchers have introduced few-shot object detection (FSOD) approaches that merge few-shot learning and object detection principles.
These approaches play a vital role in reducing the reliance on extensive labeled datasets.
This survey paper aims to provide a comprehensive understanding of the above-mentioned few-shot settings and explore the methodologies for each FSOD task.
arXiv Detail & Related papers (2024-08-26T13:09:23Z)
- Intelligence Analysis of Language Models [0.0]
We test the effectiveness of Large Language Models (LLMs) on the Abstraction and Reasoning Corpus (ARC) dataset.
This dataset serves as a representative benchmark for testing abstract reasoning abilities.
We investigate the application of the Chain-of-Thought (CoT) technique, aiming to determine its role in improving model performance.
arXiv Detail & Related papers (2024-07-20T13:48:16Z)
- Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
- Universal Domain Adaptation from Foundation Models: A Baseline Study [58.51162198585434]
We make empirical studies of state-of-the-art UniDA methods using foundation models.
We introduce CLIP distillation, a parameter-free method specifically designed to distill target knowledge from CLIP models.
Although simple, our method outperforms previous approaches in most benchmark tasks.
arXiv Detail & Related papers (2023-05-18T16:28:29Z)
- Discovering Objects that Can Move [55.743225595012966]
We study the problem of object discovery -- separating objects from the background without manual labels.
Existing approaches utilize appearance cues, such as color, texture, and location, to group pixels into object-like regions.
We choose to focus on dynamic objects -- entities that can move independently in the world.
arXiv Detail & Related papers (2022-03-18T21:13:56Z)
- Slender Object Detection: Diagnoses and Improvements [74.40792217534]
In this paper, we are concerned with the detection of a particular type of object with extreme aspect ratios, namely slender objects.
For a classical object detection method, a drastic drop of 18.9% mAP on COCO is observed when it is evaluated solely on slender objects.
arXiv Detail & Related papers (2020-11-17T09:39:42Z) - Synthesizing the Unseen for Zero-shot Object Detection [72.38031440014463]
We propose to synthesize visual features for unseen classes, so that the model learns both seen and unseen objects in the visual domain.
We use a novel generative model that uses class-semantics to not only generate the features but also to discriminatively separate them.
arXiv Detail & Related papers (2020-10-19T12:36:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.