Open-vocabulary Attribute Detection
- URL: http://arxiv.org/abs/2211.12914v1
- Date: Wed, 23 Nov 2022 12:34:43 GMT
- Title: Open-vocabulary Attribute Detection
- Authors: María A. Bravo, Sudhanshu Mittal, Simon Ging, Thomas Brox
- Abstract summary: This paper introduces the Open-Vocabulary Attribute Detection task and the corresponding OVAD benchmark.
The objective of the novel task and benchmark is to probe object-level attribute information learned by vision-language models.
Overall, the benchmark consists of 1.4 million annotations.
- Score: 38.5017012867974
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Vision-language modeling has enabled open-vocabulary tasks where predictions
can be queried using any text prompt in a zero-shot manner. Existing
open-vocabulary tasks focus on object classes, whereas research on object
attributes is limited due to the lack of a reliable attribute-focused
evaluation benchmark. This paper introduces the Open-Vocabulary Attribute
Detection (OVAD) task and the corresponding OVAD benchmark. The objective of
the novel task and benchmark is to probe object-level attribute information
learned by vision-language models. To this end, we created a clean and densely
annotated test set covering 117 attribute classes on the 80 object classes of
MS COCO. It includes positive and negative annotations, which enables
open-vocabulary evaluation. Overall, the benchmark consists of 1.4 million
annotations. For reference, we provide a first baseline method for
open-vocabulary attribute detection. Moreover, we demonstrate the benchmark's
value by studying the attribute detection performance of several foundation
models. Project page https://ovad-benchmark.github.io/
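As a concrete illustration of the zero-shot attribute querying that the benchmark probes, below is a minimal sketch using the public OpenAI CLIP package; the crop path, prompt template, and attribute list are illustrative stand-ins, not the paper's baseline method.
```python
# pip install torch torchvision git+https://github.com/openai/CLIP.git
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Hypothetical object crop and candidate attributes (any free-form text works).
image = preprocess(Image.open("object_crop.jpg")).unsqueeze(0).to(device)
attributes = ["red", "striped", "wooden", "metallic", "furry"]
prompts = clip.tokenize([f"a photo of a {a} object" for a in attributes]).to(device)

with torch.no_grad():
    img_feat = model.encode_image(image)
    txt_feat = model.encode_text(prompts)
    img_feat /= img_feat.norm(dim=-1, keepdim=True)
    txt_feat /= txt_feat.norm(dim=-1, keepdim=True)
    # Cosine similarity between the crop and each attribute prompt.
    scores = (img_feat @ txt_feat.T).squeeze(0)

for attr, s in sorted(zip(attributes, scores.tolist()), key=lambda x: -x[1]):
    print(f"{attr}: {s:.3f}")
```
Since attribute detection is multi-label, such scores would in practice be thresholded per attribute rather than softmaxed against each other.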
Related papers
- CASA: Class-Agnostic Shared Attributes in Vision-Language Models for Efficient Incremental Object Detection [30.46562066023117]
We propose a novel method utilizing attributes in vision-language foundation models for incremental object detection.
Our method constructs a Class-Agnostic Shared Attribute base (CASA) to capture common semantic information among incremental classes.
Through parameter-efficient fine-tuning, our method adds only 0.7% to parameter storage, significantly enhancing its scalability and adaptability.
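The abstract does not spell out the architecture, so the following is only a toy PyTorch sketch of the general idea as described: a frozen backbone feeds a small, class-agnostic attribute base shared across classes, with lightweight per-class weights on top; all names and sizes are hypothetical.
```python
# Toy sketch: a shared, class-agnostic attribute base plus tiny per-class
# mixing weights; only these are trained, so added parameters stay small
# relative to the frozen backbone. All names are illustrative assumptions.
import torch
import torch.nn as nn

class SharedAttributeHead(nn.Module):
    def __init__(self, feat_dim: int, num_attributes: int, num_classes: int):
        super().__init__()
        # Class-agnostic attribute base: one embedding per shared attribute.
        self.attribute_base = nn.Parameter(torch.randn(num_attributes, feat_dim) * 0.02)
        # Lightweight per-class weights over the shared attributes; adding a
        # new class incrementally only appends a new row here.
        self.class_mix = nn.Parameter(torch.zeros(num_classes, num_attributes))

    def forward(self, region_feats: torch.Tensor) -> torch.Tensor:
        # region_feats: (N, feat_dim) frozen backbone features for N proposals.
        attr_scores = region_feats @ self.attribute_base.T   # (N, num_attributes)
        return attr_scores @ self.class_mix.T                # (N, num_classes)

head = SharedAttributeHead(feat_dim=512, num_attributes=64, num_classes=80)
logits = head(torch.randn(4, 512))  # 4 hypothetical region features
print(logits.shape)  # torch.Size([4, 80])
```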
arXiv Detail & Related papers (2024-10-08T08:36:12Z)
- Generative Region-Language Pretraining for Open-Ended Object Detection [55.42484781608621]
We propose a framework named GenerateU, which can detect dense objects and generate their names in a free-form way.
Our framework achieves comparable results to the open-vocabulary object detection method GLIP.
arXiv Detail & Related papers (2024-03-15T10:52:39Z)
- Exploiting Contextual Target Attributes for Target Sentiment Classification [53.30511968323911]
Existing models for target sentiment classification (TSC) based on pre-trained language models (PTLMs) fall into two groups: 1) fine-tuning-based models that adopt the PTLM as the context encoder; 2) prompting-based models that recast classification as a text/word generation task.
We present a new perspective on leveraging PTLMs for TSC: simultaneously exploiting the merits of both language modeling and explicit target-context interactions via contextual target attributes.
arXiv Detail & Related papers (2023-12-21T11:45:28Z)
- The devil is in the fine-grained details: Evaluating open-vocabulary object detectors for fine-grained understanding [8.448399308205266]
We introduce an evaluation protocol based on dynamic vocabulary generation to test whether models detect, discern, and assign the correct fine-grained description to objects.
We further enhance our investigation by evaluating several state-of-the-art open-vocabulary object detectors using the proposed protocol.
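One way to picture such a dynamic-vocabulary protocol (a hedged sketch, not the paper's exact procedure): for each annotated object, generate hard-negative descriptions by swapping the ground-truth attribute, then check whether the model ranks the true description first. The scorer below is a hypothetical stand-in for a detector's region-text score.
```python
# Sketch of a dynamic-vocabulary check: the true (attribute, noun) description
# competes against same-noun hard negatives built by swapping the attribute.
def build_vocabulary(true_attr: str, noun: str, attr_pool: list[str]) -> list[str]:
    """True description first, followed by same-noun hard negatives."""
    negatives = [a for a in attr_pool if a != true_attr]
    return [f"a {true_attr} {noun}"] + [f"a {a} {noun}" for a in negatives]

def top1_correct(score_fn, true_attr: str, noun: str, attr_pool: list[str]) -> bool:
    vocab = build_vocabulary(true_attr, noun, attr_pool)
    scores = [score_fn(text) for text in vocab]
    # Index 0 holds the true description; top-1 accuracy asks if it wins.
    return max(range(len(vocab)), key=scores.__getitem__) == 0

# Usage with a stand-in scorer (replace with a real detector's region-text score).
attr_pool = ["red", "blue", "striped", "wooden"]
fake_scores = {"a red car": 0.9, "a blue car": 0.4, "a striped car": 0.2, "a wooden car": 0.1}
print(top1_correct(lambda t: fake_scores.get(t, 0.0), "red", "car", attr_pool))  # True
```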
arXiv Detail & Related papers (2023-11-29T10:40:52Z)
- Investigating the Role of Attribute Context in Vision-Language Models for Object Recognition and Detection [33.77415850289717]
Methods are mostly evaluated in terms of how well object class names are learned, but captions also contain rich attribute context.
It is unclear how methods use this context during learning, or whether models succeed when tasks require both attribute and object understanding.
Our results show that attribute context can be wasted when learning alignment for detection, attribute meaning is not adequately considered in embeddings, and describing classes by only their attributes is ineffective.
arXiv Detail & Related papers (2023-03-17T16:14:37Z)
- OvarNet: Towards Open-vocabulary Object Attribute Recognition [42.90477523238336]
We propose a naive two-stage approach for open-vocabulary object detection and attribute classification, termed CLIP-Attr.
The candidate objects are first proposed with an offline RPN and later classified for semantic category and attributes.
We show that recognition of semantic category and attributes is complementary for visual scene understanding.
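A minimal sketch of such a two-stage pipeline, assuming an off-the-shelf torchvision detector as a stand-in for the offline RPN and CLIP for stage-two classification; this is not the OvarNet/CLIP-Attr implementation, and the category and attribute lists are illustrative.
```python
# Stage 1: propose boxes (off-the-shelf detector stands in for the offline RPN).
# Stage 2: classify each crop with CLIP for semantic category and attributes.
import torch
import clip
from PIL import Image
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.transforms.functional import to_tensor

device = "cuda" if torch.cuda.is_available() else "cpu"
detector = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()
clip_model, preprocess = clip.load("ViT-B/32", device=device)

categories = ["dog", "cat", "car"]            # illustrative open-vocabulary queries
attributes = ["furry", "metallic", "spotted"]
cat_tok = clip.tokenize([f"a photo of a {c}" for c in categories]).to(device)
attr_tok = clip.tokenize([f"a photo of a {a} object" for a in attributes]).to(device)

image = Image.open("scene.jpg").convert("RGB")
with torch.no_grad():
    boxes = detector([to_tensor(image)])[0]["boxes"][:20]   # stage 1 proposals
    cat_feat = clip_model.encode_text(cat_tok)
    attr_feat = clip_model.encode_text(attr_tok)
    cat_feat /= cat_feat.norm(dim=-1, keepdim=True)
    attr_feat /= attr_feat.norm(dim=-1, keepdim=True)
    for box in boxes.round().int().tolist():                # stage 2 per crop
        crop = preprocess(image.crop(box)).unsqueeze(0).to(device)
        feat = clip_model.encode_image(crop)
        feat /= feat.norm(dim=-1, keepdim=True)
        best_cat = categories[(feat @ cat_feat.T).argmax().item()]
        attr_scores = (feat @ attr_feat.T).squeeze(0)
        print(box, best_cat,
              {a: round(s, 3) for a, s in zip(attributes, attr_scores.tolist())})
```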
arXiv Detail & Related papers (2023-01-23T15:59:29Z)
- Selective Annotation Makes Language Models Better Few-Shot Learners [97.07544941620367]
Large language models can perform in-context learning, where they learn a new task from a few task demonstrations.
This work examines the implications of in-context learning for the creation of datasets for new natural language tasks.
We propose an unsupervised, graph-based selective annotation method, vote-k, to select diverse, representative examples to annotate.
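A simplified sketch of the graph-based selection idea (not the paper's exact vote-k scoring): build a kNN graph over example embeddings, then greedily pick examples whose neighborhoods are least covered by earlier picks.
```python
# Greedy, coverage-discounted selection over a kNN graph of embeddings.
import numpy as np

def select_diverse(embeddings: np.ndarray, budget: int, k: int = 10, rho: float = 10.0):
    n = len(embeddings)
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = x @ x.T
    # k nearest neighbors of each example (excluding itself).
    neighbors = np.argsort(-sim, axis=1)[:, 1 : k + 1]
    selected, covered = [], np.zeros(n)  # covered[i]: times i neighbored a pick
    for _ in range(budget):
        # Candidates are "voted for" by their neighbors, discounted by how
        # much those neighbors are already covered by previous selections.
        scores = np.array([
            -np.inf if i in selected
            else sum(rho ** -covered[j] for j in neighbors[i])
            for i in range(n)
        ])
        pick = int(np.argmax(scores))
        selected.append(pick)
        covered[neighbors[pick]] += 1
    return selected

emb = np.random.default_rng(0).normal(size=(100, 32))  # stand-in embeddings
print(select_diverse(emb, budget=5))
```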
arXiv Detail & Related papers (2022-09-05T14:01:15Z)
- Make an Omelette with Breaking Eggs: Zero-Shot Learning for Novel Attribute Synthesis [65.74825840440504]
We propose Zero-Shot Learning for Attributes (ZSLA), which, to the best of our knowledge, is the first method of its kind.
Our proposed method is able to synthesize the detectors of novel attributes in a zero-shot learning manner.
Using only 32 seen attributes on the Caltech-UCSD Birds-200-2011 dataset, our method is able to synthesize 207 novel attributes.
arXiv Detail & Related papers (2021-11-28T15:45:54Z)
- CompGuessWhat?!: A Multi-task Evaluation Framework for Grounded Language Learning [78.3857991931479]
We present GROLLA, an evaluation framework for Grounded Language Learning with Attributes.
We also propose a new dataset, CompGuessWhat?!, as an instance of this framework for evaluating the quality of learned neural representations.
arXiv Detail & Related papers (2020-06-03T11:21:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.