Evaluating the Performance of Open-Vocabulary Object Detection in Low-quality Image
- URL: http://arxiv.org/abs/2512.22801v2
- Date: Fri, 02 Jan 2026 12:29:39 GMT
- Title: Evaluating the Performance of Open-Vocabulary Object Detection in Low-quality Image
- Authors: Po-Chih Wu,
- Abstract summary: We introduce a new dataset that simulates low-quality images in the real world. We find that although open-vocabulary object detection models exhibited no significant decrease in mAP scores under low-level image degradation, the performance of all models dropped sharply under high-level image degradation.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Open-vocabulary object detection enables models to localize and recognize objects beyond a predefined set of categories and is expected to achieve recognition capabilities comparable to human performance. In this study, we aim to evaluate the performance of existing models on open-vocabulary object detection tasks under low-quality image conditions. For this purpose, we introduce a new dataset that simulates low-quality images in the real world. In our evaluation experiment, we find that although open-vocabulary object detection models exhibited no significant decrease in mAP scores under low-level image degradation, the performance of all models dropped sharply under high-level image degradation. OWLv2 models consistently performed better across different types of degradation, while OWL-ViT, GroundingDINO, and Detic showed significant performance declines. We will release our dataset and codes to facilitate future studies.
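The abstract contrasts low-level and high-level image degradation but does not say here how the degraded images are produced. As a rough illustration only, the sketch below applies two common degradations (box blur and Gaussian noise) at two severity levels to a small grayscale image. The degradation types, the `SEVERITY` values, and every function name are assumptions made for this example, not the paper's dataset pipeline.

```python
import random

def add_gaussian_noise(image, sigma, seed=0):
    """Return a copy of a grayscale image (rows of 0-255 ints) with Gaussian noise."""
    rng = random.Random(seed)  # fixed seed keeps the example reproducible
    return [
        [min(255, max(0, round(px + rng.gauss(0.0, sigma)))) for px in row]
        for row in image
    ]

def box_blur(image, radius):
    """Return a box-blurred copy; radius 0 leaves the image unchanged."""
    height, width = len(image), len(image[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            # Average over the window, clipped at the image borders.
            ys = range(max(0, y - radius), min(height, y + radius + 1))
            xs = range(max(0, x - radius), min(width, x + radius + 1))
            window = [image[j][i] for j in ys for i in xs]
            row.append(round(sum(window) / len(window)))
        out.append(row)
    return out

# Hypothetical severity levels: (noise sigma, blur radius). Not from the paper.
SEVERITY = {"low": (5.0, 0), "high": (40.0, 2)}

def degrade(image, level, seed=0):
    """Blur, then add noise, at the chosen (assumed) severity level."""
    sigma, radius = SEVERITY[level]
    return add_gaussian_noise(box_blur(image, radius), sigma, seed)

def mean_abs_diff(a, b):
    """Mean absolute pixel difference between two same-sized images."""
    total = sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    return total / (len(a) * len(a[0]))

if __name__ == "__main__":
    # 8x8 checkerboard as a stand-in for a real photo.
    img = [[255 if (x + y) % 2 == 0 else 0 for x in range(8)] for y in range(8)]
    for level in SEVERITY:
        print(level, mean_abs_diff(img, degrade(img, level)))
```

In an evaluation like the one described, each detector would be run on the clean image and on every degraded variant, and mAP would be compared across severity levels.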
Related papers
- Human Body Restoration with One-Step Diffusion Model and A New Benchmark [74.66514054623669]
We propose a high-quality dataset automated cropping and filtering (HQ-ACF) pipeline. This pipeline leverages existing object detection datasets and other unlabeled images to automatically crop and filter high-quality human images. We also propose OSDHuman, a novel one-step diffusion model for human body restoration.
arXiv Detail & Related papers (2025-02-03T14:48:40Z) - Explorations in Self-Supervised Learning: Dataset Composition Testing for Object Classification [0.0]
We investigate the impact of sampling and pretraining using datasets with different image characteristics on the performance of self-supervised learning (SSL) models for object classification. We find that depth pretrained models are more effective on low resolution images, while RGB pretrained models perform better on higher resolution images.
arXiv Detail & Related papers (2024-12-01T11:21:01Z) - Understanding and Improving Training-Free AI-Generated Image Detections with Vision Foundation Models [68.90917438865078]
Deepfake techniques for facial synthesis and editing pose serious risks for generative models. In this paper, we investigate how detection performance varies across model backbones, types, and datasets. We introduce Contrastive Blur, which enhances performance on facial images, and MINDER, which addresses noise type bias, balancing performance across domains.
arXiv Detail & Related papers (2024-11-28T13:04:45Z) - Few-shot target-driven instance detection based on open-vocabulary object detection models [1.0749601922718608]
Open-vocabulary object detection models bring visual and textual concepts closer together in the same latent space.
We propose a lightweight method to turn such models into one-shot or few-shot object recognition models without requiring textual descriptions.
arXiv Detail & Related papers (2024-10-21T14:03:15Z) - Innovative Horizons in Aerial Imagery: LSKNet Meets DiffusionDet for Advanced Object Detection [55.2480439325792]
We present an in-depth evaluation of an object detection model that integrates the LSKNet backbone with the DiffusionDet head.
The proposed model achieves a mean average precision (mAP) of approximately 45.7%, which is a significant improvement.
This advancement underscores the effectiveness of the proposed modifications and sets a new benchmark in aerial image analysis.
arXiv Detail & Related papers (2023-11-21T19:49:13Z) - Ambiguous Images With Human Judgments for Robust Visual Event Classification [34.62731821199598]
We create datasets of ambiguous images and use them to produce SQUID-E ("Squidy"), a collection of noisy images extracted from videos.
All images are annotated with ground truth values and a test set is annotated with human uncertainty judgments.
We use this dataset to characterize human uncertainty in vision tasks and evaluate existing visual event classification models.
arXiv Detail & Related papers (2022-10-06T17:52:20Z) - Exploring Resolution and Degradation Clues as Self-supervised Signal for Low Quality Object Detection [77.3530907443279]
We propose a novel self-supervised framework to detect objects in degraded low resolution images.
Our method achieves superior performance compared with existing methods across varied degradation conditions.
arXiv Detail & Related papers (2022-08-05T09:36:13Z) - NOD: Taking a Closer Look at Detection under Extreme Low-Light Conditions with Night Object Detection Dataset [25.29013780731876]
Low light proves more difficult for machine cognition than previously thought.
We present a large-scale dataset showing dynamic scenes captured on the streets at night.
We propose to incorporate an image enhancement module into the object detection framework and two novel data augmentation techniques.
arXiv Detail & Related papers (2021-10-20T03:44:04Z) - Detection and Captioning with Unseen Object Classes [12.894104422808242]
Test images may contain visual objects with no corresponding visual or textual training examples.
We propose a detection-driven approach based on a generalized zero-shot detection model and a template-based sentence generation model.
Our experiments show that the proposed zero-shot detection model obtains state-of-the-art performance on the MS-COCO dataset.
arXiv Detail & Related papers (2021-08-13T10:43:20Z) - Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets.
This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets.
We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z) - Contemplating real-world object classification [53.10151901863263]
We reanalyze the ObjectNet dataset recently proposed by Barbu et al. containing objects in daily life situations.
We find that applying deep models to the isolated objects, rather than the entire scene as is done in the original paper, results in around 20-30% performance improvement.
arXiv Detail & Related papers (2021-03-08T23:29:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.