FORB: A Flat Object Retrieval Benchmark for Universal Image Embedding
- URL: http://arxiv.org/abs/2309.16249v1
- Date: Thu, 28 Sep 2023 08:41:51 GMT
- Title: FORB: A Flat Object Retrieval Benchmark for Universal Image Embedding
- Authors: Pengxiang Wu, Siman Wang, Kevin Dela Rosa, Derek Hao Hu
- Abstract summary: We introduce a new dataset for benchmarking visual search methods on flat images with diverse patterns.
Our flat object retrieval benchmark (FORB) supplements the commonly adopted 3D object domain.
It serves as a testbed for assessing the image embedding quality on out-of-distribution domains.
- Score: 7.272083488859574
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image retrieval is a fundamental task in computer vision. Despite recent
advances in this field, many techniques have been evaluated on a limited number
of domains, with a small number of instance categories. Notably, most existing
works only consider domains like 3D landmarks, making it difficult to
generalize the conclusions made by these works to other domains, e.g., logos and
other 2D flat objects. To bridge this gap, we introduce a new dataset for
benchmarking visual search methods on flat images with diverse patterns. Our
flat object retrieval benchmark (FORB) supplements the commonly adopted 3D
object domain, and more importantly, it serves as a testbed for assessing the
image embedding quality on out-of-distribution domains. In this benchmark we
investigate the retrieval accuracy of representative methods in terms of both
candidate ranks and matching score margin, a viewpoint largely ignored by
prior work. Our experiments not only highlight the challenges and
rich heterogeneity of FORB, but also reveal the hidden properties of different
retrieval strategies. The proposed benchmark is a growing project, and we expect
it to expand in both the quantity and variety of objects. The dataset and supporting
code are available at https://github.com/pxiangwu/FORB/.
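The paper's two evaluation viewpoints, candidate rank and matching score margin, can be illustrated with a short sketch. The snippet below is illustrative only, not the benchmark's released evaluation code; `rank_and_margin` and its arguments are hypothetical names, and it assumes cosine similarity over L2-normalized embeddings.

```python
import numpy as np

def rank_and_margin(query_emb: np.ndarray, gallery_embs: np.ndarray, gt_idx: int):
    """Return the 1-based rank of the ground-truth candidate and the
    matching score margin (top-1 minus top-2 similarity) for one query."""
    # Cosine similarity via L2-normalized dot products (an assumption here).
    q = query_emb / np.linalg.norm(query_emb)
    g = gallery_embs / np.linalg.norm(gallery_embs, axis=1, keepdims=True)
    scores = g @ q                                # (n,) similarity per candidate

    order = np.argsort(-scores)                   # candidate indices, best first
    rank = int(np.nonzero(order == gt_idx)[0][0]) + 1

    # Margin between the best and second-best scores: a small margin signals
    # an ambiguous match even when the correct candidate is ranked first.
    top2 = np.sort(scores)[-2:]
    margin = float(top2[1] - top2[0])
    return rank, margin
```

For example, gallery scores of [0.90, 0.85, 0.20] with the ground truth in first place give rank 1 but a thin margin of 0.05, which is exactly the kind of case a rank-only evaluation would miss.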
Related papers
- Revisit Anything: Visual Place Recognition via Image Segment Retrieval [8.544326445217369]
Existing visual place recognition pipelines encode the "whole" image and search for matches.
We address this by encoding and searching for "image segments" instead of the whole images.
We show that retrieving these partial representations leads to significantly higher recognition recall than typical whole-image retrieval (a minimal sketch of this idea follows below).
arXiv Detail & Related papers (2024-09-26T16:49:58Z)
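The segment-level retrieval idea summarized above can be pictured with a small hypothetical sketch. Nothing below is the paper's actual API: `retrieve_by_segments`, its arguments, and the voting scheme are assumptions, with segment embeddings presumed to come from some off-the-shelf segmenter and image encoder.

```python
import numpy as np

def retrieve_by_segments(query_seg_embs, map_seg_embs, map_place_ids, top_k=5):
    """query_seg_embs: (s, d) L2-normalized embeddings of the query's segments
    map_seg_embs:   (m, d) L2-normalized embeddings of all map segments
    map_place_ids:  length-m sequence of place labels, one per map segment
    """
    sims = query_seg_embs @ map_seg_embs.T    # (s, m) cosine similarities
    best = sims.argmax(axis=1)                # best map segment per query segment
    votes = {}
    for qi, mi in enumerate(best):
        # Each query segment casts a similarity-weighted vote for the place
        # owning its best-matching map segment.
        place = map_place_ids[mi]
        votes[place] = votes.get(place, 0.0) + float(sims[qi, mi])
    return sorted(votes, key=votes.get, reverse=True)[:top_k]
```

The appeal of the partial representation is visible here: a query needs only one distinctive segment to vote correctly, even if the rest of the image has changed.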
- Are Local Features All You Need for Cross-Domain Visual Place Recognition? [13.519413608607781]
Visual Place Recognition aims to predict the coordinates of an image based solely on visual cues.
Despite recent advances, recognizing the same place when the query comes from a significantly different distribution remains a major hurdle for state-of-the-art retrieval methods.
In this work we explore whether re-ranking methods based on spatial verification can tackle these challenges.
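A standard recipe for spatial verification (a hedged sketch, not necessarily this paper's pipeline) is to re-order the shortlist from a global-descriptor search by the number of RANSAC-consistent local feature matches, e.g., with OpenCV's SIFT:

```python
import cv2
import numpy as np

def inlier_count(query_img, cand_img, ratio=0.75):
    """Count RANSAC-consistent SIFT matches between two grayscale images."""
    sift = cv2.SIFT_create()
    kq, dq = sift.detectAndCompute(query_img, None)
    kc, dc = sift.detectAndCompute(cand_img, None)
    if dq is None or dc is None:
        return 0
    pairs = cv2.BFMatcher().knnMatch(dq, dc, k=2)
    # Lowe's ratio test keeps only distinctive correspondences.
    good = [p[0] for p in pairs if len(p) == 2 and p[0].distance < ratio * p[1].distance]
    if len(good) < 4:                     # a homography needs >= 4 points
        return 0
    src = np.float32([kq[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kc[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    _, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return int(mask.sum()) if mask is not None else 0

def rerank(query_img, shortlist):
    """Re-order a retrieval shortlist by geometric consistency with the query."""
    return sorted(shortlist, key=lambda img: inlier_count(query_img, img), reverse=True)
```

Because inlier counting depends on local geometry rather than global appearance, it can survive distribution shifts that hurt global descriptors, which is what makes it a candidate for cross-domain re-ranking.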
arXiv Detail & Related papers (2023-04-12T14:46:57Z)
- Unseen Object 6D Pose Estimation: A Benchmark and Baselines [62.8809734237213]
We propose a new task that enables and facilitates algorithms to estimate the 6D pose of novel objects during testing.
We collect a dataset with both real and synthetic images and up to 48 unseen objects in the test set.
By training an end-to-end 3D correspondences network, our method finds corresponding points between an unseen object and a partial view RGBD image accurately and efficiently.
arXiv Detail & Related papers (2022-06-23T16:29:53Z)
- Unsupervised Part Discovery from Contrastive Reconstruction [90.88501867321573]
The goal of self-supervised visual representation learning is to learn strong, transferable image representations.
We propose an unsupervised approach to object part discovery and segmentation.
Our method yields semantic parts consistent across fine-grained but visually distinct categories.
arXiv Detail & Related papers (2021-11-11T17:59:42Z)
- Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets.
This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets.
We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z)
- Tasks Integrated Networks: Joint Detection and Retrieval for Image Search [99.49021025124405]
In many real-world search scenarios (e.g., video surveillance), objects are seldom accurately detected or annotated.
We first introduce an end-to-end Integrated Net (I-Net), which has three merits.
We further propose an improved I-Net, called DC-I-Net, which makes two new contributions.
arXiv Detail & Related papers (2020-09-03T03:57:50Z)
- Few-Shot Object Detection and Viewpoint Estimation for Objects in the Wild [40.132988301147776]
We tackle the problems of few-shot object detection and few-shot viewpoint estimation.
We demonstrate on both tasks the benefits of guiding the network prediction with class-representative features extracted from data.
Our method outperforms state-of-the-art methods by a large margin on a range of datasets.
arXiv Detail & Related papers (2020-07-23T16:17:25Z)
- A Universal Representation Transformer Layer for Few-Shot Image Classification [43.31379752656756]
Few-shot classification aims to recognize unseen classes when presented with only a small number of samples.
We consider the problem of multi-domain few-shot image classification, where unseen classes and examples come from diverse data sources.
Here, we propose a Universal Representation Transformer layer that meta-learns to leverage universal features for few-shot classification.
arXiv Detail & Related papers (2020-06-21T03:08:00Z)
- Extending and Analyzing Self-Supervised Learning Across Domains [50.13326427158233]
Self-supervised representation learning has achieved impressive results in recent years.
Experiments have primarily been conducted on ImageNet or other similarly large internet imagery datasets.
We experiment with several popular methods on an unprecedented variety of domains.
arXiv Detail & Related papers (2020-04-24T21:18:02Z)
- Google Landmarks Dataset v2 -- A Large-Scale Benchmark for Instance-Level Recognition and Retrieval [9.922132565411664]
We introduce the Google Landmarks dataset v2 (GLDv2), a new benchmark for large-scale, fine-grained instance recognition and image retrieval.
GLDv2 is the largest such dataset to date by a large margin, including over 5M images and 200k distinct instance labels.
The dataset is sourced from Wikimedia Commons, the world's largest crowdsourced collection of landmark photos.
arXiv Detail & Related papers (2020-04-03T22:52:17Z)
- Cross-Domain Document Object Detection: Benchmark Suite and Method [71.4339949510586]
Document object detection (DOD) is fundamental for downstream tasks like intelligent document editing and understanding.
We investigate cross-domain DOD, where the goal is to learn a detector for the target domain using labeled data from the source domain and only unlabeled data from the target domain.
For each dataset, we provide the page images, bounding box annotations, PDF files, and the rendering layers extracted from the PDF files.
arXiv Detail & Related papers (2020-03-30T03:04:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.