Toward unsupervised, multi-object discovery in large-scale image collections
- URL: http://arxiv.org/abs/2007.02662v2
- Date: Tue, 25 Aug 2020 11:11:31 GMT
- Title: Toward unsupervised, multi-object discovery in large-scale image collections
- Authors: Huy V. Vo, Patrick Pérez and Jean Ponce
- Abstract summary: This paper builds on the optimization approach of Vo et al.
We propose a novel saliency-based region proposal algorithm.
We exploit the inherent hierarchical structure of proposals as an effective regularizer.
- Score: 26.39475298878971
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper addresses the problem of discovering the objects present in a
collection of images without any supervision. We build on the optimization
approach of Vo et al. (CVPR'19) with several key novelties: (1) We propose a
novel saliency-based region proposal algorithm that achieves significantly
higher overlap with ground-truth objects than other competitive methods. This
procedure leverages off-the-shelf CNN features trained on classification tasks
without any bounding box information, but is otherwise unsupervised. (2) We
exploit the inherent hierarchical structure of proposals as an effective
regularizer for the approach to object discovery of Vo et al., boosting its
performance to significantly improve over the state of the art on several
standard benchmarks. (3) We adopt a two-stage strategy to select promising
proposals using small random sets of images before using the whole image
collection to discover the objects it depicts, allowing us to tackle, for the
first time (to the best of our knowledge), the discovery of multiple objects in
each one of the pictures making up datasets with up to 20,000 images, an over
five-fold increase compared to existing methods, and a first step toward true
large-scale unsupervised image interpretation.
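The abstract judges region proposals by their overlap with ground-truth objects. In the region-proposal and object-discovery literature, box overlap is conventionally measured as intersection over union (IoU), with a proposal usually counted as a hit at IoU ≥ 0.5. The sketch below assumes that convention; the 0.5 threshold, the box layout, and the helper names `iou` and `hits_ground_truth` are illustrative assumptions, not details taken from the paper.

```python
from typing import Sequence, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Width and height of the intersection rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def hits_ground_truth(proposals: Sequence[Box], gt: Box, threshold: float = 0.5) -> bool:
    """True if any proposal overlaps the ground-truth box with IoU >= threshold (hypothetical helper)."""
    return any(iou(p, gt) >= threshold for p in proposals)


if __name__ == "__main__":
    gt = (0.0, 0.0, 10.0, 10.0)
    proposals = [(5.0, 5.0, 15.0, 15.0), (1.0, 1.0, 10.0, 10.0)]
    print([round(iou(p, gt), 3) for p in proposals])  # [0.143, 0.81]
    print(hits_ground_truth(proposals, gt))  # True
```

Counting hits this way, per image and per object, is one common way to turn "overlap with ground truth" into a single number for comparing proposal methods.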
Related papers
- A Simple Baseline for Multi-Camera 3D Object Detection [94.63944826540491]
3D object detection with surrounding cameras has been a promising direction for autonomous driving.
We present SimMOD, a Simple baseline for Multi-camera Object Detection.
We conduct extensive experiments on the 3D object detection benchmark of nuScenes to demonstrate the effectiveness of SimMOD.
(2022-08-22T03:38:01Z)
- Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection [54.96069171726668]
Two popular forms of weak supervision used in open-vocabulary detection (OVD) are the pretrained CLIP model and image-level supervision.
We propose to address this problem by performing object-centric alignment of the language embeddings from the CLIP model.
We establish a bridge between the above two object-alignment strategies via a novel weight transfer function.
(2022-07-07T17:59:56Z)
- Facing the Void: Overcoming Missing Data in Multi-View Imagery [0.783788180051711]
We propose a novel technique for multi-view image classification that is robust to missing data.
The proposed method, based on state-of-the-art deep learning-based approaches and metric learning, can be easily adapted and exploited in other applications and domains.
Results show that the proposed algorithm provides improvements in multi-view image classification accuracy when compared to state-of-the-art methods.
(2022-05-21T13:21:27Z)
- Large-Scale Unsupervised Object Discovery [80.60458324771571]
Existing methods for unsupervised object discovery (UOD) do not scale up to large datasets without approximations that compromise their performance.
We propose a novel formulation of UOD as a ranking problem, amenable to the arsenal of distributed methods available for eigenvalue problems and link analysis.
(2021-06-12T00:29:49Z)
- Ensembling object detectors for image and video data analysis [98.26061123111647]
We propose a method for ensembling the outputs of multiple object detectors for improving detection performance and precision of bounding boxes on image data.
We extend it to video data by proposing a two-stage tracking-based scheme for detection refinement.
(2021-02-09T12:38:16Z)
- Addressing Visual Search in Open and Closed Set Settings [8.928169373673777]
First, we present a method for predicting pixel-level objectness from a low-resolution gist image.
We then use it to select regions for performing object detection locally at high resolution.
Second, we propose a novel strategy for open-set visual search that seeks to find all instances of a target class, which may be previously unseen.
(2020-12-11T17:21:28Z)
- Tasks Integrated Networks: Joint Detection and Retrieval for Image Search [99.49021025124405]
In many real-world search scenarios (e.g., video surveillance), objects are seldom accurately detected or annotated.
We first introduce an end-to-end Integrated Net (I-Net), which has three merits.
We further propose an improved I-Net, called DC-I-Net, which makes two new contributions.
(2020-09-03T03:57:50Z)
- Multiple instance learning on deep features for weakly supervised object detection with extreme domain shifts [1.9336815376402716]
Weakly supervised object detection (WSOD) using only image-level annotations has attracted growing attention over the past few years.
We show that a simple multiple instance approach applied to pre-trained deep features yields excellent performance on non-photographic datasets.
(2020-08-03T20:36:01Z)
- Localizing Grouped Instances for Efficient Detection in Low-Resource Scenarios [27.920304852537534]
We propose a novel flexible detection scheme that efficiently adapts to variable object sizes and densities.
We rely on a sequence of detection stages, each of which has the ability to predict groups of objects as well as individuals.
We report experimental results on two aerial image datasets and show that the proposed method is as accurate as, yet computationally more efficient than, standard single-shot detectors.
(2020-04-27T07:56:53Z)
- Object-Centric Image Generation from Layouts [93.10217725729468]
We develop a layout-to-image generation method that produces complex scenes with multiple objects.
Our method learns representations of the spatial relationships between objects in the scene, which lead to our model's improved layout-fidelity.
We introduce SceneFID, an object-centric adaptation of the popular Fréchet Inception Distance metric that is better suited for multi-object images.
(2020-03-16T21:40:09Z)
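The Large-Scale Unsupervised Object Discovery entry above casts UOD as a ranking problem amenable to eigenvalue and link-analysis methods. As a generic illustration of that family (not that paper's actual formulation), the sketch below ranks the nodes of a weighted graph by PageRank-style power iteration, which converges to the stationary distribution, i.e. the leading eigenvector of the damped, row-normalized transition matrix. The function name `pagerank_scores`, the damping factor of 0.85, and the idea of nodes as region proposals with cross-image similarity weights are assumptions for illustration only.

```python
import numpy as np


def pagerank_scores(similarity: np.ndarray, damping: float = 0.85,
                    tol: float = 1e-10, max_iter: int = 1000) -> np.ndarray:
    """Rank graph nodes by PageRank-style power iteration (hypothetical helper).

    similarity[i, j] >= 0 is an assumed score for how strongly node i
    (e.g. a region proposal) "votes" for node j.
    """
    n = similarity.shape[0]
    out_weight = similarity.sum(axis=1, keepdims=True)
    # Row-normalize into a transition matrix; rows with no outgoing weight
    # jump uniformly to every node (the standard dangling-node fix).
    transition = np.where(out_weight > 0,
                          similarity / np.where(out_weight > 0, out_weight, 1.0),
                          1.0 / n)
    scores = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        # Damped update: teleport mass plus mass flowing along weighted edges.
        new_scores = (1 - damping) / n + damping * transition.T @ scores
        if np.abs(new_scores - scores).sum() < tol:
            return new_scores
        scores = new_scores
    return scores


if __name__ == "__main__":
    # Node 0 receives votes from nodes 1 and 2; node 2 receives none.
    S = np.array([[0, 1, 0], [1, 0, 0], [1, 0, 0]], dtype=float)
    r = pagerank_scores(S)
    print(np.argsort(-r))  # [0 1 2]
```

Each iteration is a sparse-friendly matrix-vector product, which is why this kind of ranking distributes well across machines when the graph is large.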