SOOD-ImageNet: a Large-Scale Dataset for Semantic Out-Of-Distribution Image Classification and Semantic Segmentation
- URL: http://arxiv.org/abs/2409.01109v1
- Date: Mon, 2 Sep 2024 09:37:39 GMT
- Title: SOOD-ImageNet: a Large-Scale Dataset for Semantic Out-Of-Distribution Image Classification and Semantic Segmentation
- Authors: Alberto Bacchin, Davide Allegro, Stefano Ghidoni, Emanuele Menegatti
- Abstract summary: Out-of-Distribution (OOD) detection in computer vision is a crucial research area.
SOOD-ImageNet is a novel dataset comprising around 1.6M images across 56 classes.
It is designed for common computer vision tasks such as image classification and semantic segmentation under OOD conditions.
- Score: 6.21476985578569
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Out-of-Distribution (OOD) detection in computer vision is a crucial research area, with related benchmarks playing a vital role in assessing the generalizability of models and their applicability in real-world scenarios. However, existing OOD benchmarks in the literature suffer from two main limitations: (1) they often overlook semantic shift as a potential challenge, and (2) their scale is limited compared to the large datasets used to train modern models. To address these gaps, we introduce SOOD-ImageNet, a novel dataset comprising around 1.6M images across 56 classes, designed for common computer vision tasks such as image classification and semantic segmentation under OOD conditions, with a particular focus on the issue of semantic shift. We ensured the necessary scalability and quality by developing an innovative data engine that leverages the capabilities of modern vision-language models, complemented by accurate human checks. Through extensive training and evaluation of various models on SOOD-ImageNet, we showcase its potential to significantly advance OOD research in computer vision. The project page is available at https://github.com/bach05/SOODImageNet.git.
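To make the intended evaluation protocol concrete, the minimal sketch below scores one classifier on an in-distribution split and a semantic-shift split. It assumes an ImageFolder-style layout with hypothetical directory names (`test_easy`, `test_hard`); consult the project repository for the actual loaders and split names.

```python
# Minimal sketch of OOD evaluation under semantic shift. The directory
# names below are assumptions, not the dataset's documented layout.
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def top1_accuracy(model: torch.nn.Module, root: str) -> float:
    """Top-1 accuracy of `model` on an ImageFolder rooted at `root`."""
    loader = DataLoader(datasets.ImageFolder(root, preprocess),
                        batch_size=64, num_workers=4)
    correct = total = 0
    for images, labels in loader:
        correct += (model(images).argmax(dim=1) == labels).sum().item()
        total += labels.numel()
    return correct / total

# Stand-in model: in practice, load weights trained on the ID split.
model = models.resnet50(num_classes=56).eval()
acc_id = top1_accuracy(model, "SOOD-ImageNet/test_easy")
acc_ood = top1_accuracy(model, "SOOD-ImageNet/test_hard")
print(f"ID: {acc_id:.3f}  OOD: {acc_ood:.3f}  gap: {acc_id - acc_ood:.3f}")
```

The ID-to-OOD accuracy gap is the quantity of interest here: a robust model keeps it small as the test classes drift semantically away from the training vocabulary.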
Related papers
- KITTEN: A Knowledge-Intensive Evaluation of Image Generation on Visual Entities [93.74881034001312]
We conduct a systematic study on the fidelity of entities in text-to-image generation models.
We focus on their ability to generate a wide range of real-world visual entities, such as landmark buildings, aircraft, plants, and animals.
Our findings reveal that even the most advanced text-to-image models often fail to generate entities with accurate visual details.
arXiv Detail & Related papers (2024-10-15T17:50:37Z)
- In Search of Forgotten Domain Generalization [20.26519807919284]
Out-of-Domain (OOD) generalization is the ability of a model trained on one or more domains to generalize to unseen domains.
In the ImageNet era of computer vision, evaluation sets for measuring a model's OOD performance were designed to be strictly OOD with respect to style.
The emergence of foundation models and expansive web-scale datasets has obfuscated this evaluation process.
arXiv Detail & Related papers (2024-10-10T17:50:45Z)
- Effectiveness of Vision Language Models for Open-world Single Image Test Time Adaptation [15.621092104244003]
We propose a novel framework to address the challenging real-world task of Single Image Test Time Adaptation.
We leverage large-scale Vision Language Models like CLIP to enable real-time adaptation on a per-image basis.
The proposed framework ROSITA combines these components, enabling continuous online adaptation of Vision Language Models.
arXiv Detail & Related papers (2024-06-01T16:21:42Z)
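The summary does not spell out ROSITA's procedure, so the sketch below shows only the generic ingredient it builds on: adapting a CLIP-style model on a single test image by minimizing prediction entropy (TENT-style), updating just the LayerNorm parameters. The label set and hyperparameters are placeholders, and this is not the ROSITA algorithm itself.

```python
# Generic sketch of single-image test-time adaptation with CLIP via
# entropy minimization (TENT-style); NOT the ROSITA method, whose
# details are not given in the summary above.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

classes = ["dog", "cat", "car"]  # placeholder label set
prompts = [f"a photo of a {c}" for c in classes]

# Adapt only the LayerNorm affine parameters, a common TTA choice.
params = [p for m in model.modules() if isinstance(m, torch.nn.LayerNorm)
          for p in m.parameters()]
optimizer = torch.optim.SGD(params, lr=1e-4)

def adapt_and_predict(image: Image.Image, steps: int = 1) -> str:
    """One image in, one label out; the model is nudged along the way."""
    inputs = processor(text=prompts, images=image,
                       return_tensors="pt", padding=True)
    for _ in range(steps):
        logits = model(**inputs).logits_per_image  # shape (1, num_classes)
        probs = logits.softmax(dim=-1)
        entropy = -(probs * probs.log()).sum()     # prediction entropy
        optimizer.zero_grad()
        entropy.backward()
        optimizer.step()
    with torch.no_grad():
        logits = model(**inputs).logits_per_image
    return classes[logits.argmax().item()]
```

Because the optimizer state persists across calls, invoking `adapt_and_predict` on a stream of images gives the continuous online flavor of adaptation the summary mentions.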
- BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation [57.40024206484446]
We introduce the BEHAVIOR Vision Suite (BVS), a set of tools and assets to generate fully customized synthetic data for systematic evaluation of computer vision models.
BVS supports a large number of adjustable parameters at the scene level.
We showcase three example application scenarios.
arXiv Detail & Related papers (2024-05-15T17:57:56Z)
- Multi-Modal Prompt Learning on Blind Image Quality Assessment [65.0676908930946]
Image Quality Assessment (IQA) models benefit significantly from semantic information, which allows them to treat different types of objects distinctly.
Traditional methods, hindered by a lack of sufficiently annotated data, have employed the CLIP image-text pretraining model as their backbone to gain semantic awareness.
Recent approaches have attempted to address the mismatch between this generic pretraining and the IQA task using prompt technology, but these solutions have shortcomings.
This paper introduces an innovative multi-modal prompt-based methodology for IQA.
arXiv Detail & Related papers (2024-04-23T11:45:32Z)
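As an illustration of how prompts give CLIP a handle on quality, here is the common antonym-prompt baseline (in the spirit of CLIP-IQA): score an image by how much probability mass CLIP assigns to a positive versus a negative quality prompt. This is a baseline sketch, not the multi-modal prompt-learning method proposed in the paper above.

```python
# Sketch of a common CLIP-based IQA baseline: antonym prompt pairing.
# Illustrates the prompt idea only; not the paper's proposed method.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def quality_score(image: Image.Image) -> float:
    """Softmax weight on the positive prompt: near 1.0 = good, near 0.0 = bad."""
    inputs = processor(text=["a good photo", "a bad photo"],
                       images=image, return_tensors="pt", padding=True)
    logits = model(**inputs).logits_per_image[0]  # shape (2,)
    return logits.softmax(dim=-1)[0].item()

score = quality_score(Image.open("example.jpg").convert("RGB"))
print(f"predicted quality: {score:.3f}")
```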
- Sequential Modeling Enables Scalable Learning for Large Vision Models [120.91839619284431]
We introduce a novel sequential modeling approach which enables learning a Large Vision Model (LVM) without making use of any linguistic data.
We define a common format, "visual sentences", in which we can represent raw images and videos as well as annotated data sources.
arXiv Detail & Related papers (2023-12-01T18:59:57Z)
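The "visual sentence" format can be illustrated with a toy sketch: every source (raw images, video frames, annotated pairs) is mapped by a visual tokenizer to discrete tokens and concatenated into one sequence that a plain sequence model can train on. The tokenizer below is a stub standing in for a learned one.

```python
# Toy illustration of the "visual sentence" idea: all visual data is
# flattened into one discrete-token stream, with no language involved.
# A real LVM uses a learned visual tokenizer; this stub hashes pixels.
from typing import List
import numpy as np

CODEBOOK_SIZE = 8192
TOKENS_PER_IMAGE = 256

def tokenize_image(img: np.ndarray) -> List[int]:
    """Stub for a learned tokenizer: image -> fixed-length token list."""
    rng = np.random.default_rng(hash(img.tobytes()) & 0xFFFFFFFF)
    return rng.integers(0, CODEBOOK_SIZE, TOKENS_PER_IMAGE).tolist()

def visual_sentence(frames: List[np.ndarray]) -> List[int]:
    """Concatenate per-frame tokens into one trainable sequence."""
    return [tok for frame in frames for tok in tokenize_image(frame)]

# A 3-frame "sentence": 3 * 256 = 768 tokens.
frames = [np.zeros((64, 64, 3), dtype=np.uint8) for _ in range(3)]
print(len(visual_sentence(frames)))
```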
- Adaptive Contextual Perception: How to Generalize to New Backgrounds and Ambiguous Objects [75.15563723169234]
We investigate how vision models adaptively use context for out-of-distribution generalization.
We show that models that excel in one setting tend to struggle in the other.
To replicate the generalization abilities of biological vision, computer vision models must have factorized object vs. background representations.
arXiv Detail & Related papers (2023-06-09T15:29:54Z)
- Perceptual Grouping in Contrastive Vision-Language Models [59.1542019031645]
We show how vision-language models are able to understand where objects reside within an image and group together visually related parts of the imagery.
We propose a minimal set of modifications that results in models that uniquely learn both semantic and spatial information.
arXiv Detail & Related papers (2022-10-18T17:01:35Z)
- High-resolution semantically-consistent image-to-image translation [0.0]
This paper proposes an unsupervised domain adaptation model that preserves semantic consistency and per-pixel quality for the images during the style-transferring phase.
The proposed model shows substantial performance gain compared to the SemI2I model and reaches similar results as the state-of-the-art CyCADA model.
arXiv Detail & Related papers (2022-09-13T19:08:30Z)
- Benchmarking the Robustness of Instance Segmentation Models [3.1287804585804073]
This paper presents a comprehensive evaluation of instance segmentation models with respect to real-world image corruptions and out-of-domain image collections.
The out-of-domain image evaluation shows the generalization capability of models, an essential aspect of real-world applications.
Specifically, this benchmark study covers state-of-the-art network architectures, network backbones, and normalization layers, with models trained from scratch as well as from ImageNet-pretrained weights.
arXiv Detail & Related papers (2021-09-02T17:50:07Z)
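The general recipe behind such corruption benchmarks is easy to sketch: re-run the same evaluation while sweeping corruption severity and watch how the metric decays. In the sketch below, Gaussian noise stands in for the full corruption suite, and the `evaluate` callable is a placeholder for a real instance-segmentation metric such as mask mAP.

```python
# Minimal sketch of corruption-robustness evaluation: score the model on
# clean images, then under a synthetic corruption at rising severity.
from typing import Callable, List
import numpy as np

def gaussian_noise(img: np.ndarray, severity: int) -> np.ndarray:
    """Add zero-mean Gaussian noise; higher severity means larger sigma."""
    sigma = [8, 16, 32, 64, 96][severity - 1]
    noisy = img.astype(np.float32) + np.random.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def robustness_curve(images: List[np.ndarray],
                     evaluate: Callable[[List[np.ndarray]], float]) -> List[float]:
    """Metric on clean images, then under severities 1..5 of the corruption."""
    scores = [evaluate(images)]
    for severity in range(1, 6):
        scores.append(evaluate([gaussian_noise(im, severity) for im in images]))
    return scores  # a steep decay signals a brittle model
```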
- Deep Learning of Unified Region, Edge, and Contour Models for Automated Image Segmentation [2.0305676256390934]
Convolutional neural networks (CNNs) have gained traction in the design of automated segmentation pipelines.
CNN-based models are adept at learning abstract features from raw image data, but their performance is dependent on the availability and size of suitable training datasets.
In this thesis, we devise novel methodologies that address these issues and establish robust representation learning frameworks for fully-automatic semantic segmentation.
arXiv Detail & Related papers (2020-06-23T02:54:55Z)