VizWiz-FewShot: Locating Objects in Images Taken by People With Visual Impairments
- URL: http://arxiv.org/abs/2207.11810v1
- Date: Sun, 24 Jul 2022 20:44:51 GMT
- Title: VizWiz-FewShot: Locating Objects in Images Taken by People With Visual Impairments
- Authors: Yu-Yun Tseng, Alexander Bell, and Danna Gurari
- Abstract summary: We introduce a few-shot localization dataset originating from photographers who authentically were trying to learn about the visual content in the images they took.
It includes nearly 10,000 segmentations of 100 categories in over 4,500 images that were taken by people with visual impairments.
Compared to existing few-shot object detection and instance segmentation datasets, our dataset is the first to locate holes in objects.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce a few-shot localization dataset originating from photographers
who authentically were trying to learn about the visual content in the images
they took. It includes nearly 10,000 segmentations of 100 categories in over
4,500 images that were taken by people with visual impairments. Compared to
existing few-shot object detection and instance segmentation datasets, our
dataset is the first to locate holes in objects (e.g., found in 12.3% of our
segmentations), it shows objects that occupy a much larger range of sizes
relative to the images, and text is over five times more common in our objects
(e.g., found in 22.4% of our segmentations). Analysis of three modern few-shot
localization algorithms demonstrates that they generalize poorly to our new
dataset. The algorithms commonly struggle to locate objects with holes, very
small and very large objects, and objects lacking text. To encourage a larger
community to work on these unsolved challenges, we publicly share our annotated
few-shot dataset at https://vizwiz.org .
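The "holes in objects" property above can be checked directly on a binary mask: a hole is a background region with no path to the image border. A minimal sketch of that check, assuming masks are simple 0/1 grids (the dataset's actual annotation format may differ):

```python
from collections import deque

def has_hole(mask):
    """Return True if the binary mask (list of 0/1 rows) encloses a hole:
    a background (0) region with no path to the image border."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    q = deque()
    # Seed the flood fill with every background pixel on the border.
    for y in range(h):
        for x in range(w):
            if (y in (0, h - 1) or x in (0, w - 1)) and mask[y][x] == 0:
                seen[y][x] = True
                q.append((y, x))
    # 4-connected flood fill over background pixels reachable from the border.
    while q:
        y, x = q.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] and mask[ny][nx] == 0:
                seen[ny][nx] = True
                q.append((ny, nx))
    # Any unreached background pixel is enclosed by the object: a hole.
    return any(mask[y][x] == 0 and not seen[y][x]
               for y in range(h) for x in range(w))

donut = [[0, 0, 0, 0, 0],
         [0, 1, 1, 1, 0],
         [0, 1, 0, 1, 0],
         [0, 1, 1, 1, 0],
         [0, 0, 0, 0, 0]]
disk = [[1, 1, 1],
        [1, 1, 1],
        [1, 1, 1]]
```

A donut-shaped mask encloses a hole while a solid disk does not, matching the distinction the dataset annotates.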
Related papers
- Salient Object Detection for Images Taken by People With Vision
Impairments [13.157939981657886]
We introduce a new salient object detection dataset using images taken by people who are visually impaired.
VizWiz-SalientObject is the largest such dataset (i.e., 32,000 human-annotated images) and has unique characteristics.
We benchmarked seven modern salient object detection methods on our dataset and found they struggle most with images featuring salient objects that are large, have less complex boundaries, and lack text.
arXiv Detail & Related papers (2023-01-12T22:33:01Z)
- FewSOL: A Dataset for Few-Shot Object Learning in Robotic Environments [21.393674766169543]
We introduce the Few-Shot Object Learning dataset for object recognition with a few images per object.
We captured 336 real-world objects with 9 RGB-D images per object from different views.
The evaluation results show that there is still a large margin to be improved for few-shot object classification in robotic environments.
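Few-shot object classification on a dataset like this is typically evaluated with N-way K-shot episodes: sample N classes, then a small support set and a query set per class. A hypothetical sketch of such a sampler (the episode protocol is standard few-shot practice, not a detail from the paper; the 9-views-per-object figure mirrors the summary above):

```python
import random

def sample_episode(images_by_class, n_way, k_shot, k_query, rng=random):
    """Sample one N-way K-shot episode: k_shot support images and
    k_query query images for each of n_way randomly chosen classes."""
    classes = rng.sample(sorted(images_by_class), n_way)
    support, query = [], []
    for label in classes:
        imgs = rng.sample(images_by_class[label], k_shot + k_query)
        support += [(img, label) for img in imgs[:k_shot]]
        query += [(img, label) for img in imgs[k_shot:]]
    return support, query

# Toy dataset: 5 objects with 9 views each (FewSOL captures 9 RGB-D views per object).
dataset = {f"obj{i}": [f"obj{i}_view{v}" for v in range(9)] for i in range(5)}
support, query = sample_episode(dataset, n_way=3, k_shot=1, k_query=4)
```

Accuracy averaged over many such episodes is the usual headline metric for few-shot classification.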
arXiv Detail & Related papers (2022-07-06T05:57:24Z)
- ImageSubject: A Large-scale Dataset for Subject Detection [9.430492045581534]
Images and videos usually contain main subjects: the objects the photographer wants to highlight.
Detecting the main subjects is an important technique to help machines understand the content of images and videos.
We present a new dataset with the goal of training models to understand the layout of the objects and then to find the main subjects among them.
arXiv Detail & Related papers (2022-01-09T22:49:59Z)
- Contrastive Object Detection Using Knowledge Graph Embeddings [72.17159795485915]
We compare the error statistics of the class embeddings learned from a one-hot approach with semantically structured embeddings from natural language processing or knowledge graphs.
We propose a knowledge-embedded design for keypoint-based and transformer-based object detection architectures.
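The contrast between one-hot and semantically structured class embeddings comes down to pairwise similarity: one-hot class vectors are mutually orthogonal, while language- or knowledge-graph-derived embeddings give related classes higher similarity. A toy illustration (the vectors below are made up for demonstration, not taken from the paper):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# One-hot class embeddings: every distinct pair of classes has similarity 0.
one_hot = {"cat": [1, 0, 0], "dog": [0, 1, 0], "car": [0, 0, 1]}
# Toy "semantic" embeddings: related classes point in similar directions.
semantic = {"cat": [0.9, 0.1], "dog": [0.8, 0.2], "car": [0.1, 0.9]}
```

Under the semantic embeddings, confusing "cat" with "dog" costs less than confusing "cat" with "car", which is the kind of structure the error-statistics comparison above examines.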
arXiv Detail & Related papers (2021-12-21T17:10:21Z)
- Learning to Detect Every Thing in an Open World [139.78830329914135]
We propose a simple yet surprisingly powerful data augmentation and training scheme we call Learning to Detect Every Thing (LDET).
To avoid suppressing hidden objects (background objects that are visible but unlabeled), we paste annotated objects on a background image sampled from a small region of the original image.
LDET leads to significant improvements on many datasets in the open world instance segmentation task.
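The augmentation described above can be sketched in a few lines: build a synthetic background by tiling a small patch cropped from the original image, then paste the annotated object pixels back on top. This is only an illustrative sketch of the idea under simplified assumptions (single-channel images as nested lists), not the paper's implementation:

```python
def ldet_style_paste(image, mask, patch_box):
    """Sketch of LDET-style augmentation: tile a small patch cropped from the
    original image into a full-size background, then paste the annotated
    object (pixels where mask == 1) on top. `image` is a 2D grid of values."""
    h, w = len(image), len(image[0])
    y0, x0, y1, x1 = patch_box  # small region of the original image
    ph, pw = y1 - y0, x1 - x0
    # Background: the cropped patch tiled out to the full image size.
    out = [[image[y0 + y % ph][x0 + x % pw] for x in range(w)] for y in range(h)]
    # Paste the labeled object so only annotated pixels survive on the new background.
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                out[y][x] = image[y][x]
    return out

image = [[7, 2, 2, 2],
         [2, 5, 5, 2],
         [2, 5, 5, 2],
         [2, 2, 2, 2]]
mask = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
aug = ldet_style_paste(image, mask, (0, 0, 1, 1))
```

Because every unlabeled background pixel is replaced, the detector is never trained to score visible-but-unlabeled objects as background.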
arXiv Detail & Related papers (2021-12-03T03:56:06Z)
- PartImageNet: A Large, High-Quality Dataset of Parts [16.730418538593703]
We propose PartImageNet, a high-quality dataset with part segmentation annotations.
PartImageNet is unique because it offers part-level annotations on a general set of classes with non-rigid, articulated objects.
It can be utilized in multiple vision tasks including but not limited to: Part Discovery, Few-shot Learning.
arXiv Detail & Related papers (2021-12-02T02:12:03Z)
- Few-Shot Learning for Video Object Detection in a Transfer-Learning Scheme [70.45901040613015]
We study the new problem of few-shot learning for video object detection.
We employ a transfer-learning framework to effectively train the video object detector on a large number of base-class objects and a few video clips of novel-class objects.
arXiv Detail & Related papers (2021-03-26T20:37:55Z)
- FAIR1M: A Benchmark Dataset for Fine-grained Object Recognition in High-Resolution Remote Sensing Imagery [21.9319970004788]
We propose a novel benchmark dataset with more than 1 million instances and more than 15,000 images for Fine-grAined object recognItion in high-Resolution remote sensing imagery.
All objects in the FAIR1M dataset are annotated with respect to 5 categories and 37 sub-categories by oriented bounding boxes.
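Oriented bounding boxes are commonly parameterized as (center, width, height, rotation angle) and expanded to four corner points for visualization or IoU computation. A generic sketch of that conversion (a standard formulation, not FAIR1M's exact annotation encoding):

```python
import math

def obb_corners(cx, cy, w, h, angle_rad):
    """Convert an oriented box (center, size, rotation) to its 4 corner
    points, starting from the top-left corner of the unrotated box."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    # Corner offsets of the axis-aligned box, before rotation.
    half = [(-w / 2, -h / 2), (-w / 2, h / 2), (w / 2, h / 2), (w / 2, -h / 2)]
    # Rotate each offset about the origin, then translate by the center.
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in half]

# A 4x2 box centered at the origin, rotated 90 degrees.
corners = obb_corners(0.0, 0.0, 4.0, 2.0, math.pi / 2)
```

With angle 0 this reduces to an ordinary axis-aligned box, so the same code path handles both cases.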
arXiv Detail & Related papers (2021-03-09T17:20:15Z)
- A Simple and Effective Use of Object-Centric Images for Long-Tailed Object Detection [56.82077636126353]
We take advantage of object-centric images to improve object detection in scene-centric images.
We present a simple yet surprisingly effective framework to do so.
Our approach can improve the object detection (and instance segmentation) accuracy of rare objects by 50% (and 33%) relatively.
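"Relative" improvement here means the gain divided by the baseline metric, so a rare-object AP rising from 10.0 to 15.0 would be a 50% relative improvement. A one-line sketch of that arithmetic (the example numbers are illustrative, not results from the paper):

```python
def relative_improvement(baseline, improved):
    """Relative improvement as a fraction of the baseline metric."""
    return (improved - baseline) / baseline

# e.g. rare-object AP rising from 10.0 to 15.0 is a 50% relative gain.
gain = relative_improvement(10.0, 15.0)
```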
arXiv Detail & Related papers (2020-09-30T10:59:20Z)
- Learning Object Detection from Captions via Textual Scene Attributes [70.90708863394902]
We argue that captions contain much richer information about the image, including attributes of objects and their relations.
We present a method that uses the attributes in this "textual scene graph" to train object detectors.
We empirically demonstrate that the resulting model achieves state-of-the-art results on several challenging object detection datasets.
arXiv Detail & Related papers (2020-09-30T10:59:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.