OS2D: One-Stage One-Shot Object Detection by Matching Anchor Features
- URL: http://arxiv.org/abs/2003.06800v2
- Date: Wed, 19 Aug 2020 13:59:00 GMT
- Title: OS2D: One-Stage One-Shot Object Detection by Matching Anchor Features
- Authors: Anton Osokin, Denis Sumin, Vasily Lomakin
- Abstract summary: One-shot object detection consists in detecting objects defined by a single demonstration.
We build a one-stage system that performs localization and recognition jointly.
Experimental evaluation on several challenging domains shows that our method can detect unseen classes.
- Score: 14.115782214599015
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we consider the task of one-shot object detection, which
consists in detecting objects defined by a single demonstration. Unlike
standard object detection, the classes of objects used for training and
testing do not overlap. We build a one-stage system that performs
localization and recognition jointly. We use dense correlation matching of
learned local features to find correspondences, a feed-forward geometric
transformation model to align features and bilinear resampling of the
correlation tensor to compute the detection score of the aligned features. All
the components are differentiable, which allows end-to-end training.
Experimental evaluation on several challenging domains (retail products, 3D
objects, buildings and logos) shows that our method can detect unseen classes
(e.g., toothpaste when trained on groceries) and outperforms several baselines
by a significant margin. Our code is available online:
https://github.com/aosokin/os2d .
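Below is a minimal, illustrative PyTorch sketch of the matching pipeline the abstract describes: a dense correlation tensor between learned local features of the input and class images, bilinear resampling of that tensor along a transformation grid, and pooling of the aligned correlations into a detection score. The function names, tensor shapes, the pooling choice, and the identity grid standing in for the predicted geometric transformation are assumptions made for this example, not the actual OS2D implementation; see the repository linked above for the real code.

```python
# Illustrative sketch only (not the OS2D implementation): dense correlation of
# local features, bilinear resampling of the correlation tensor with a sampling
# grid, and pooling of the aligned correlations into a detection score.
import torch
import torch.nn.functional as F


def correlation_tensor(input_feats: torch.Tensor, class_feats: torch.Tensor) -> torch.Tensor:
    """Correlate every local feature of the class image with every location
    of the input image.

    input_feats: (B, C, H, W) feature map of the input image
    class_feats: (C, h, w)    feature map of the class (query) image
    returns:     (B, h*w, H, W) correlation tensor
    """
    input_feats = F.normalize(input_feats, dim=1)   # L2-normalize local features
    class_feats = F.normalize(class_feats, dim=0)
    kernel = class_feats.flatten(1).t()             # (h*w, C)
    return torch.einsum("bchw,kc->bkhw", input_feats, kernel)


def aligned_score(corr: torch.Tensor, grid: torch.Tensor) -> torch.Tensor:
    """Bilinearly resample the correlation tensor along a sampling grid
    (in the paper, produced by a feed-forward transformation model; assumed
    given here) and average the aligned correlations into per-location scores.

    corr: (B, h*w, H, W) correlation tensor
    grid: (B, H_out, W_out, 2) normalized sampling coordinates in [-1, 1]
    """
    aligned = F.grid_sample(corr, grid, mode="bilinear", align_corners=False)
    return aligned.mean(dim=1)                      # (B, H_out, W_out)


if __name__ == "__main__":
    # Toy shapes only; in the paper both feature maps come from a shared backbone.
    img = torch.randn(1, 64, 60, 80)     # input-image features
    cls = torch.randn(64, 15, 15)        # class-image features
    corr = correlation_tensor(img, cls)  # (1, 225, 60, 80)
    # Identity grid as a stand-in for the predicted geometric transformation.
    theta = torch.eye(2, 3).unsqueeze(0)
    grid = F.affine_grid(theta, size=(1, 1, 60, 80), align_corners=False)
    score = aligned_score(corr, grid)    # (1, 60, 80)
    print(score.shape)
```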
Related papers
- PatchContrast: Self-Supervised Pre-training for 3D Object Detection [14.603858163158625]
We introduce PatchContrast, a novel self-supervised point cloud pre-training framework for 3D object detection.
We show that our method outperforms existing state-of-the-art models on three commonly-used 3D detection datasets.
arXiv Detail & Related papers (2023-08-14T07:45:54Z)
- On Hyperbolic Embeddings in 2D Object Detection [76.12912000278322]
We study whether a hyperbolic geometry better matches the underlying structure of the object classification space.
We incorporate a hyperbolic classifier in two-stage, keypoint-based, and transformer-based object detection architectures.
We observe categorical class hierarchies emerging in the structure of the classification space, resulting in lower classification errors and boosting the overall object detection performance.
arXiv Detail & Related papers (2022-03-15T16:43:40Z)
- Contrastive Object Detection Using Knowledge Graph Embeddings [72.17159795485915]
We compare the error statistics of the class embeddings learned from a one-hot approach with semantically structured embeddings from natural language processing or knowledge graphs.
We propose a knowledge-embedded design for keypoint-based and transformer-based object detection architectures.
arXiv Detail & Related papers (2021-12-21T17:10:21Z)
- Multi-patch Feature Pyramid Network for Weakly Supervised Object Detection in Optical Remote Sensing Images [39.25541709228373]
We propose a new architecture for object detection with a multiple patch feature pyramid network (MPFP-Net)
MPFP-Net differs from current models, which during training pursue only the most discriminative patches.
We introduce an effective method to regularize the residual values and make the fusion transition layers strictly norm-preserving.
arXiv Detail & Related papers (2021-08-18T09:25:39Z)
- Pretrained equivariant features improve unsupervised landmark discovery [69.02115180674885]
We formulate a two-step unsupervised approach that overcomes this challenge by first learning powerful pixel-based features.
Our method produces state-of-the-art results in several challenging landmark detection datasets.
arXiv Detail & Related papers (2021-04-07T05:42:11Z)
- Modulating Localization and Classification for Harmonized Object Detection [40.82723262074911]
We propose a mutual learning framework to modulate the two tasks.
In particular, the two tasks are forced to learn from each other with a novel mutual labeling strategy.
We achieve a significant performance gain over the baseline detectors on the COCO dataset.
arXiv Detail & Related papers (2021-03-16T10:36:02Z)
- Unsupervised Part Discovery via Feature Alignment [15.67978793872039]
We exploit the property that neural network features are largely invariant to nuisance variables.
We find a set of similar images that show instances of the same object category in the same pose, through an affine alignment of their corresponding feature maps.
During inference, part detection is simple and fast, without any extra modules or overheads other than a feed-forward neural network.
arXiv Detail & Related papers (2020-12-01T07:25:00Z)
- Synthesizing the Unseen for Zero-shot Object Detection [72.38031440014463]
We propose to synthesize visual features for unseen classes, so that the model learns both seen and unseen objects in the visual domain.
We use a novel generative model that uses class-semantics to not only generate the features but also to discriminatively separate them.
arXiv Detail & Related papers (2020-10-19T12:36:11Z)
- Multi-scale Interactive Network for Salient Object Detection [91.43066633305662]
We propose the aggregate interaction modules to integrate the features from adjacent levels.
To obtain more efficient multi-scale features, the self-interaction modules are embedded in each decoder unit.
Experimental results on five benchmark datasets demonstrate that the proposed method without any post-processing performs favorably against 23 state-of-the-art approaches.
arXiv Detail & Related papers (2020-07-17T15:41:37Z)
- Attention-based Joint Detection of Object and Semantic Part [4.389917490809522]
Our model is built on top of two Faster-RCNN models that share their features to obtain enhanced representations of both objects and parts.
Experiments on the PASCAL-Part 2010 dataset show that joint detection can simultaneously improve both object detection and part detection.
arXiv Detail & Related papers (2020-07-05T18:54:10Z)
- One-Shot Object Detection without Fine-Tuning [62.39210447209698]
We introduce a two-stage model consisting of a first stage Matching-FCOS network and a second stage Structure-Aware Relation Module.
We also propose novel training strategies that effectively improve detection performance.
Our method exceeds the state-of-the-art one-shot performance consistently on multiple datasets.
arXiv Detail & Related papers (2020-05-08T01:59:23Z)