PACO: Parts and Attributes of Common Objects
- URL: http://arxiv.org/abs/2301.01795v1
- Date: Wed, 4 Jan 2023 19:28:03 GMT
- Title: PACO: Parts and Attributes of Common Objects
- Authors: Vignesh Ramanathan, Anmol Kalia, Vladan Petrovic, Yi Wen, Baixue
Zheng, Baishan Guo, Rui Wang, Aaron Marquez, Rama Kovvuri, Abhishek Kadian,
Amir Mousavi, Yiwen Song, Abhimanyu Dubey, Dhruv Mahajan
- Abstract summary: We introduce PACO: Parts and Attributes of Common Objects.
It spans 75 object categories, 456 object-part categories and 55 attributes across image (LVIS) and video (Ego4D) datasets.
We provide 641K part masks annotated across 260K object boxes, with roughly half of them exhaustively annotated with attributes as well.
- Score: 27.559972499989694
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Object models are gradually progressing from predicting just category labels
to providing detailed descriptions of object instances. This motivates the need
for large datasets which go beyond traditional object masks and provide richer
annotations such as part masks and attributes. Hence, we introduce PACO: Parts
and Attributes of Common Objects. It spans 75 object categories, 456
object-part categories and 55 attributes across image (LVIS) and video (Ego4D)
datasets. We provide 641K part masks annotated across 260K object boxes, with
roughly half of them exhaustively annotated with attributes as well. We design
evaluation metrics and provide benchmark results for three tasks on the
dataset: part mask segmentation, object and part attribute prediction and
zero-shot instance detection. Dataset, models, and code are open-sourced at
https://github.com/facebookresearch/paco.
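To make the release concrete, below is a minimal sketch of browsing the annotations with the COCO API. It assumes the COCO-style JSON layout of the open-sourced repo; the file path, the "object:part" category naming, and the attribute_ids field are assumptions drawn from the repo's conventions, not guaranteed details.
```python
# Minimal sketch: browsing PACO annotations with pycocotools.
# Assumes COCO-style JSON; the path below is illustrative.
from pycocotools.coco import COCO

coco = COCO("annotations/paco_lvis_v1_train.json")  # assumed file name

# Object and object-part categories share one category list; part
# categories are assumed to be named "<object>:<part>".
cats = coco.loadCats(coco.getCatIds())
parts = [c for c in cats if ":" in c["name"]]
print(f"{len(cats)} categories, {len(parts)} of them object-parts")

# Inspect the masks and attribute labels attached to one image.
img_id = coco.getImgIds()[0]
for ann in coco.loadAnns(coco.getAnnIds(imgIds=img_id)):
    name = coco.loadCats(ann["category_id"])[0]["name"]
    attrs = ann.get("attribute_ids", [])  # assumed field name
    print(name, "attribute ids:", attrs)
```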
Related papers
- An Attribute-Enriched Dataset and Auto-Annotated Pipeline for Open Detection [7.531866919805308]
We introduce the Objects365-Attr dataset, an extension of the existing Objects365 dataset, distinguished by its attribute annotations.
This dataset reduces inconsistencies in object detection by integrating a broad spectrum of attributes, including color, material, state, texture and tone.
It contains an extensive collection of 5.6M object-level attribute descriptions, meticulously annotated across 1.4M bounding boxes.
arXiv Detail & Related papers (2024-09-10T07:53:32Z)
- Learning Spatial-Semantic Features for Robust Video Object Segmentation [108.045326229865]
We propose a robust video object segmentation framework that learns spatial-semantic features and discriminative object queries.
The proposed method achieves state-of-the-art performance on benchmark datasets, including DAVIS 2017 test (87.8%), YouTube-VOS 2019 (88.1%), MOSE val (74.0%), and LVOS test (73.0%).
arXiv Detail & Related papers (2024-07-10T15:36:00Z)
- MAC: A Benchmark for Multiple Attributes Compositional Zero-Shot Learning [33.12021227971062]
Compositional Zero-Shot Learning (CZSL) aims to learn semantic primitives (attributes and objects) from seen compositions and recognize unseen attribute-object compositions.
We introduce the Multi-Attribute Composition dataset, encompassing 18,217 images and 11,067 compositions with comprehensive, representative, and diverse attribute annotations.
Our dataset supports deeper semantic understanding and higher-order attribute associations, providing a more realistic and challenging benchmark for the CZSL task.
arXiv Detail & Related papers (2024-06-18T16:24:48Z)
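As a generic illustration of the CZSL task format used by benchmarks like MAC (a minimal baseline sketch, not the paper's method), compositions can be scored by embedding attributes and objects separately and comparing their sum to an image feature:
```python
# Generic CZSL scoring sketch (a baseline pattern, not the MAC method):
# embed attributes and objects separately, compose by summation, and
# rank every attribute-object pair against an image feature.
import torch
import torch.nn.functional as F

N_ATTR, N_OBJ, DIM = 50, 80, 256  # illustrative sizes

attr_emb = torch.nn.Embedding(N_ATTR, DIM)
obj_emb = torch.nn.Embedding(N_OBJ, DIM)

def score(img_feat, attr_ids, obj_ids):
    """Cosine similarity between one image feature and each
    attribute+object composition (one pair per row)."""
    comp = attr_emb(attr_ids) + obj_emb(obj_ids)  # (K, DIM)
    return F.cosine_similarity(img_feat.unsqueeze(0), comp, dim=-1)

# Rank all candidate compositions for a stand-in image feature.
img_feat = torch.randn(DIM)
attrs = torch.arange(N_ATTR).repeat_interleave(N_OBJ)  # each attr with every obj
objs = torch.arange(N_OBJ).repeat(N_ATTR)
best = score(img_feat, attrs, objs).argmax()
print("predicted (attr, obj):", (best // N_OBJ).item(), (best % N_OBJ).item())
```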
- CustAny: Customizing Anything from A Single Example [73.90939022698399]
We present a novel pipeline to construct a large dataset of general objects (MC-IDC), featuring 315k text-image samples across 10k categories.
With the help of MC-IDC, we introduce Customizing Anything (CustAny), a zero-shot framework that maintains ID fidelity and supports flexible text editing for general objects.
Our contributions include a large-scale dataset, the CustAny framework and novel ID processing to advance this field.
arXiv Detail & Related papers (2024-06-17T15:26:22Z)
- 1st Place Solution for MOSE Track in CVPR 2024 PVUW Workshop: Complex Video Object Segmentation [72.54357831350762]
We propose a semantic embedding video object segmentation model and use the salient features of objects as query representations.
We trained our model on a large-scale video object segmentation dataset.
Our model achieves first place (84.45%) on the test set of the Complex Video Object Segmentation track.
arXiv Detail & Related papers (2024-06-07T03:13:46Z)
- EgoObjects: A Large-Scale Egocentric Dataset for Fine-Grained Object Understanding [11.9023437362986]
EgoObjects is a large-scale egocentric dataset for fine-grained object understanding.
The pilot version contains over 9K videos collected by 250 participants from 50+ countries using 4 wearable devices.
EgoObjects also annotates each object with an instance-level identifier.
arXiv Detail & Related papers (2023-09-15T23:55:43Z)
- VizWiz-FewShot: Locating Objects in Images Taken by People With Visual Impairments [74.72656607288185]
We introduce a few-shot localization dataset originating from photographers who were authentically trying to learn about the visual content in the images they took.
It includes nearly 10,000 segmentations of 100 categories in over 4,500 images that were taken by people with visual impairments.
Compared to existing few-shot object detection and instance segmentation datasets, our dataset is the first to locate holes in objects.
arXiv Detail & Related papers (2022-07-24T20:44:51Z)
- Discovering Object Masks with Transformers for Unsupervised Semantic Segmentation [75.00151934315967]
MaskDistill is a novel framework for unsupervised semantic segmentation.
Our framework does not latch onto low-level image cues and is not limited to object-centric datasets.
arXiv Detail & Related papers (2022-06-13T17:59:43Z)
- Scaling up instance annotation via label propagation [69.8001043244044]
We propose a highly efficient annotation scheme for building large datasets with object segmentation masks.
We exploit similarities between objects by using hierarchical clustering on mask predictions made by a segmentation model (sketched below).
We show that we obtain 1M object segmentation masks with a total annotation time of only 290 hours.
arXiv Detail & Related papers (2021-10-05T18:29:34Z)
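A rough sketch of the clustering step described above, using SciPy's hierarchical clustering; the mask feature extraction and the human-verification callback are assumed stand-ins:
```python
# Rough sketch of label propagation via hierarchical clustering:
# group similar mask predictions so one human verification can be
# propagated to a whole cluster. Feature extraction is assumed.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def propagate_labels(mask_features, verify_fn, distance_threshold=0.5):
    """mask_features: (N, D) array, one feature vector per predicted
    mask; verify_fn: human check on a representative mask index."""
    Z = linkage(mask_features, method="average", metric="cosine")
    clusters = fcluster(Z, t=distance_threshold, criterion="distance")

    accepted = np.zeros(len(mask_features), dtype=bool)
    for c in np.unique(clusters):
        members = np.flatnonzero(clusters == c)
        rep = members[0]                     # representative mask to verify
        accepted[members] = verify_fn(rep)   # one check labels the cluster
    return accepted

# Illustrative use: random features, a verifier that accepts everything.
feats = np.random.rand(100, 64)
print(propagate_labels(feats, verify_fn=lambda i: True).sum(), "masks accepted")
```
One verification per cluster is what drives the reported efficiency: human review cost scales with the number of clusters rather than the number of masks.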
- Learning to Predict Visual Attributes in the Wild [43.91237738107603]
We introduce a large-scale in-the-wild visual attribute prediction dataset consisting of over 927K attribute annotations for over 260K object instances.
We propose several techniques that systematically tackle the challenges of this setting, including a base model that utilizes both low- and high-level CNN features (sketched below).
Using these techniques, we achieve nearly 3.7 mAP and 5.7 overall F1 points of improvement over the current state of the art.
arXiv Detail & Related papers (2021-06-17T17:58:02Z)
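A hedged sketch of the low-plus-high-level feature idea from the entry above (the general pattern, not the paper's exact architecture), tapping an early and a late ResNet stage for a multi-label attribute head:
```python
# Sketch of multi-label attribute prediction from both low- and
# high-level CNN features (the general idea, not the paper's model).
import torch
from torchvision.models import resnet50
from torchvision.models.feature_extraction import create_feature_extractor

class AttributeHead(torch.nn.Module):
    def __init__(self, num_attributes):
        super().__init__()
        backbone = resnet50(weights=None)
        # Tap an early stage (low-level: texture, color) and a late
        # stage (high-level: semantics) of the backbone.
        self.extractor = create_feature_extractor(
            backbone, return_nodes={"layer1": "low", "layer4": "high"})
        self.pool = torch.nn.AdaptiveAvgPool2d(1)
        self.classifier = torch.nn.Linear(256 + 2048, num_attributes)

    def forward(self, images):
        feats = self.extractor(images)
        low = self.pool(feats["low"]).flatten(1)    # (B, 256)
        high = self.pool(feats["high"]).flatten(1)  # (B, 2048)
        return self.classifier(torch.cat([low, high], dim=1))

# Attributes are multi-label, so train with per-attribute BCE.
model = AttributeHead(num_attributes=600)  # illustrative count
logits = model(torch.randn(2, 3, 224, 224))
loss = torch.nn.functional.binary_cross_entropy_with_logits(
    logits, torch.zeros_like(logits))
```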
- Generating Masks from Boxes by Mining Spatio-Temporal Consistencies in Videos [159.02703673838639]
We introduce a method for generating segmentation masks from per-frame bounding box annotations in videos.
We use our resulting accurate masks for weakly supervised training of video object segmentation (VOS) networks.
The additional data provides substantially better generalization performance leading to state-of-the-art results in both the VOS and more challenging tracking domain.
arXiv Detail & Related papers (2021-01-06T18:56:24Z)
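The entry above mines consistencies across frames; as a deliberately simpler, single-frame stand-in that shows how a box annotation can seed a mask, OpenCV's GrabCut can be initialized from the box (a swapped-in baseline, not the paper's method):
```python
# Single-frame stand-in for box-to-mask generation: seed GrabCut with
# the annotated box (a simpler baseline than spatio-temporal mining,
# shown only to illustrate the box prior).
import cv2
import numpy as np

def box_to_mask(image_bgr, box, iters=5):
    """image_bgr: HxWx3 uint8 frame; box: (x, y, w, h) bounding box."""
    mask = np.zeros(image_bgr.shape[:2], np.uint8)
    bgd = np.zeros((1, 65), np.float64)  # GrabCut's background model
    fgd = np.zeros((1, 65), np.float64)  # GrabCut's foreground model
    cv2.grabCut(image_bgr, mask, box, bgd, fgd, iters,
                cv2.GC_INIT_WITH_RECT)
    # Pixels marked definite/probable foreground become the mask.
    return np.isin(mask, (cv2.GC_FGD, cv2.GC_PR_FGD)).astype(np.uint8)

# Illustrative call on a synthetic frame.
frame = np.random.randint(0, 256, (240, 320, 3), np.uint8)
print(box_to_mask(frame, (80, 60, 120, 100)).sum(), "foreground pixels")
```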
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the generated content (including all information) and is not responsible for any consequences of its use.