Detective: An Attentive Recurrent Model for Sparse Object Detection
- URL: http://arxiv.org/abs/2004.12197v1
- Date: Sat, 25 Apr 2020 17:41:52 GMT
- Title: Detective: An Attentive Recurrent Model for Sparse Object Detection
- Authors: Amine Kechaou, Manuel Martinez, Monica Haurilet and Rainer
Stiefelhagen
- Abstract summary: Detective is an attentive object detector that identifies objects in images in a sequential manner.
Detective is a sparse object detector that generates a single bounding box per object instance.
We propose a training mechanism based on the Hungarian algorithm and a loss that balances the localization and classification tasks.
- Score: 25.5804429439316
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we present Detective - an attentive object detector that
identifies objects in images in a sequential manner. Our network is based on an
encoder-decoder architecture, where the encoder is a convolutional neural
network, and the decoder is a convolutional recurrent neural network coupled
with an attention mechanism. At each iteration, our decoder focuses on the
relevant parts of the image using an attention mechanism, and then estimates
the object's class and the bounding box coordinates. Current object detection
models generate dense predictions and rely on post-processing to remove
duplicate predictions. Detective is a sparse object detector that generates a
single bounding box per object instance. However, training a sparse object
detector is challenging, as it requires the model to reason at the instance
level and not just at the class and spatial levels. We propose a training
mechanism based on the Hungarian algorithm and a loss that balances the
localization and classification tasks. This allows Detective to achieve
promising results on the PASCAL VOC object detection dataset. Our experiments
demonstrate that sparse object detection is possible and has great potential
for future developments in applications where the order of the objects to be
predicted is of interest.
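The abstract mentions a training mechanism based on the Hungarian algorithm but does not spell it out. The sketch below shows one plausible reading: each ground-truth object is matched to exactly one prediction by minimizing a cost that mixes a localization term and a classification term. The cost function `pair_cost`, the weight `ALPHA`, the box format (x1, y1, x2, y2), and all function names are assumptions made for illustration, not the authors' formulation.

```python
# Sketch only: one-to-one matching of ground truth to predictions with the
# Hungarian algorithm. The cost function and ALPHA are assumptions.

INF = float("inf")
ALPHA = 0.5  # hypothetical weight between localization and classification


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def pair_cost(gt, pred):
    """Assumed cost: weighted sum of (1 - IoU) and (1 - p(true class))."""
    gt_box, gt_cls = gt
    pred_box, probs = pred
    return ALPHA * (1.0 - iou(gt_box, pred_box)) + (1.0 - ALPHA) * (1.0 - probs[gt_cls])


def hungarian(cost):
    """Minimum-cost assignment of every row to a distinct column.

    Requires len(cost) <= len(cost[0]). Returns a list where entry i is the
    column assigned to row i. Uses the O(n^2 * m) potentials formulation.
    """
    n, m = len(cost), len(cost[0])
    assert n <= m, "need at least as many columns as rows"
    u = [0.0] * (n + 1)   # row potentials
    v = [0.0] * (m + 1)   # column potentials
    p = [0] * (m + 1)     # p[j] = row currently matched to column j (1-based)
    way = [0] * (m + 1)   # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0 != 0:  # flip the matching along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j] != 0:
            assignment[p[j] - 1] = j - 1
    return assignment


def match_ground_truth(ground_truth, predictions):
    """Return, for each ground-truth object, the index of its matched prediction."""
    cost = [[pair_cost(g, p) for p in predictions] for g in ground_truth]
    return hungarian(cost)


# Toy data: (box, class index) for ground truth, (box, class probabilities) for predictions.
ground_truth = [((0, 0, 10, 10), 0), ((20, 20, 30, 30), 1)]
predictions = [
    ((21, 21, 31, 31), [0.1, 0.9]),
    ((1, 1, 11, 11), [0.8, 0.2]),
    ((50, 50, 60, 60), [0.5, 0.5]),
]
matches = match_ground_truth(ground_truth, predictions)
print(matches)  # [1, 0]
```

In this toy case the first ground-truth box is paired with prediction 1 and the second with prediction 0, while prediction 2 stays unmatched. In a training setup of this kind, a loss would typically be computed only over the matched pairs, with unmatched predictions treated as background; whether Detective does exactly this is not stated in the abstract.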
Related papers
- Object Detection in Aerial Images with Uncertainty-Aware Graph Network [61.02591506040606]
We propose a novel uncertainty-aware object detection framework with a structured-graph, where nodes and edges are denoted by objects.
We refer to our model as the Uncertainty-Aware Graph network for object DETection (UAGDet).
arXiv Detail & Related papers (2022-08-23T07:29:03Z)
- Instance-Aware Observer Network for Out-of-Distribution Object Segmentation [94.73449180972239]
We extend the approach of ObsNet by harnessing an instance-wise mask prediction.
We show that our proposed method accurately disentangles in-distribution objects from Out-Of-Distribution objects on three datasets.
arXiv Detail & Related papers (2022-07-18T17:38:40Z)
- Multi-Grid Redundant Bounding Box Annotation for Accurate Object Detection [0.0]
YOLOv3 is a state-of-the-art one-shot detector that takes in an input image and divides it into an equal-sized grid matrix.
This paper presents a new mathematical approach that assigns multiple grids per object for accurate, tight-fitting bounding box prediction.
arXiv Detail & Related papers (2022-01-05T23:01:55Z)
- Robust Region Feature Synthesizer for Zero-Shot Object Detection [87.79902339984142]
We build a novel zero-shot object detection framework that contains an Intra-class Semantic Diverging component and an Inter-class Structure Preserving component.
It is the first study to carry out zero-shot object detection in remote sensing imagery.
arXiv Detail & Related papers (2022-01-01T03:09:15Z)
- Pix2seq: A Language Modeling Framework for Object Detection [12.788663431798588]
Pix2Seq is a simple and generic framework for object detection.
We train a neural net to perceive the image and generate the desired sequence.
Our approach is based mainly on the intuition that if a neural net knows about where and what the objects are, we just need to teach it how to read them out.
arXiv Detail & Related papers (2021-09-22T17:26:36Z)
- AdaCon: Adaptive Context-Aware Object Detection for Resource-Constrained Embedded Devices [2.5345835184316536]
Convolutional Neural Networks achieve state-of-the-art accuracy in object detection tasks.
They have large computational and energy requirements that challenge their deployment on resource-constrained edge devices.
In this paper, we leverage the prior knowledge about the probabilities that different object categories can occur jointly to increase the efficiency of object detection models.
Our experiments on the COCO dataset show that our adaptive object detection model achieves up to a 45% reduction in energy consumption and up to a 27% reduction in latency, with a small loss in the average precision (AP) of object detection.
arXiv Detail & Related papers (2021-08-16T01:21:55Z)
- Instance Localization for Self-supervised Detection Pretraining [68.24102560821623]
We propose a new self-supervised pretext task, called instance localization.
We show that integration of bounding boxes into pretraining promotes better task alignment and architecture alignment for transfer learning.
Experimental results demonstrate that our approach yields state-of-the-art transfer learning results for object detection.
arXiv Detail & Related papers (2021-02-16T17:58:57Z)
- Class-agnostic Object Detection [16.97782147401037]
We propose class-agnostic object detection as a new problem that focuses on detecting objects irrespective of their object-classes.
Specifically, the goal is to predict bounding boxes for all objects in an image but not their object-classes.
We propose training and evaluation protocols for benchmarking class-agnostic detectors to advance future research in this domain.
arXiv Detail & Related papers (2020-11-28T19:22:38Z)
- Slender Object Detection: Diagnoses and Improvements [74.40792217534]
In this paper, we are concerned with the detection of a particular type of object with extreme aspect ratios, namely slender objects.
For a classical object detection method, a drastic drop of 18.9% mAP on COCO is observed if it is evaluated solely on slender objects.
arXiv Detail & Related papers (2020-11-17T09:39:42Z)
- Synthesizing the Unseen for Zero-shot Object Detection [72.38031440014463]
We propose to synthesize visual features for unseen classes, so that the model learns both seen and unseen objects in the visual domain.
We use a novel generative model that uses class-semantics to not only generate the features but also to discriminatively separate them.
arXiv Detail & Related papers (2020-10-19T12:36:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.