Attention-based Joint Detection of Object and Semantic Part
- URL: http://arxiv.org/abs/2007.02419v1
- Date: Sun, 5 Jul 2020 18:54:10 GMT
- Title: Attention-based Joint Detection of Object and Semantic Part
- Authors: Keval Morabia, Jatin Arora, Tara Vijaykumar
- Abstract summary: Our model is created on top of two Faster-RCNN models that share their features to get enhanced representations of both.
Experiments on the PASCAL-Part 2010 dataset show that joint detection can simultaneously improve both object detection and part detection.
- Score: 4.389917490809522
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we address the problem of joint detection of objects,
such as dog, and their semantic parts, such as face and leg. Our model is built on top of two
Faster-RCNN models that share their features to perform a novel Attention-based
feature fusion of related Object and Part features to get enhanced
representations of both. These representations are used for final
classification and bounding box regression separately for both models. Our
experiments on the PASCAL-Part 2010 dataset show that joint detection can
simultaneously improve both object detection and part detection in terms of
mean Average Precision (mAP) at IoU=0.5.
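The abstract's core idea is attention-based fusion of related object and part RoI features. Below is a minimal NumPy sketch of such a fusion step, assuming scaled dot-product attention with a residual connection; the paper's actual architecture (shared Faster-RCNN backbones, its specific fusion layer, and the separate classification/regression heads) is not reproduced here, and all function names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_fuse(obj_feats, part_feats):
    """Hypothetical sketch: enhance object RoI features by attending
    over part RoI features (scaled dot-product attention).

    obj_feats:  (N_obj, D)  pooled object RoI features
    part_feats: (N_part, D) pooled part RoI features
    Returns enhanced object features of shape (N_obj, D).
    """
    d = obj_feats.shape[1]
    # Each object RoI attends over all part RoIs.
    scores = obj_feats @ part_feats.T / np.sqrt(d)   # (N_obj, N_part)
    weights = softmax(scores, axis=1)                # rows sum to 1
    context = weights @ part_feats                   # (N_obj, D)
    # Residual combination keeps the original representation.
    return obj_feats + context

# Toy example: 2 object RoIs, 3 part RoIs, 4-d features.
rng = np.random.default_rng(0)
obj = rng.normal(size=(2, 4))
parts = rng.normal(size=(3, 4))
fused = attention_fuse(obj, parts)
print(fused.shape)  # (2, 4)
```

The same fusion can be applied symmetrically, letting part features attend over object features, which matches the abstract's claim that both representations are enhanced.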
Related papers
- Towards Consistent Object Detection via LiDAR-Camera Synergy [17.665362927472973]
There is no existing model capable of detecting an object's position in both point clouds and images.
This paper introduces an end-to-end Consistency Object Detection (COD) algorithm framework.
To assess the accuracy of the object correlation between point clouds and images, this paper proposes a new evaluation metric, Consistency Precision.
arXiv Detail & Related papers (2024-05-02T13:04:26Z)
- Scene-Graph ViT: End-to-End Open-Vocabulary Visual Relationship Detection [14.22646492640906]
We propose a simple and highly efficient decoder-free architecture for open-vocabulary visual relationship detection.
Our model consists of a Transformer-based image encoder that represents objects as tokens and models their relationships implicitly.
Our approach achieves state-of-the-art relationship detection performance on Visual Genome and on the large-vocabulary GQA benchmark at real-time inference speeds.
arXiv Detail & Related papers (2024-03-21T10:15:57Z)
- Object-Centric Multiple Object Tracking [124.30650395969126]
This paper proposes a video object-centric model for multiple-object tracking pipelines.
It consists of an index-merge module that adapts the object-centric slots into detection outputs and an object memory module.
Benefiting from object-centric learning, we only require sparse detection labels for object localization and feature binding.
arXiv Detail & Related papers (2023-09-01T03:34:12Z)
- Unified Visual Relationship Detection with Vision and Language Models [89.77838890788638]
This work focuses on training a single visual relationship detector predicting over the union of label spaces from multiple datasets.
We propose UniVRD, a novel bottom-up method for Unified Visual Relationship Detection by leveraging vision and language models.
Empirical results on both human-object interaction detection and scene-graph generation demonstrate the competitive performance of our model.
arXiv Detail & Related papers (2023-03-16T00:06:28Z)
- A Tri-Layer Plugin to Improve Occluded Detection [100.99802831241583]
We propose a simple plugin module for the detection head of two-stage object detectors to improve the recall of partially occluded objects.
The module predicts a tri-layer of segmentation masks for the target object, the occluder and the occludee, and by doing so is able to better predict the mask of the target object.
We also establish a COCO evaluation dataset to measure the recall performance of partially occluded and separated objects.
arXiv Detail & Related papers (2022-10-18T17:59:51Z)
- Improving Object Detection and Attribute Recognition by Feature Entanglement Reduction [26.20319853343761]
We show that object detection should be attribute-independent and attribute recognition largely object-independent.
We disentangle them using a two-stream model in which category and attribute features are computed independently but the classification heads share Regions of Interest (RoIs).
Compared with a traditional single-stream model, our model shows significant improvements over VG-20, a subset of Visual Genome, on both supervised and attribute transfer tasks.
arXiv Detail & Related papers (2021-08-25T22:27:06Z)
- Visual Composite Set Detection Using Part-and-Sum Transformers [74.26037922682355]
We present a new approach, denoted Part-and-Sum detection Transformer (PST), to perform end-to-end composite set detection.
PST achieves state-of-the-art results among single-stage models, while nearly matching the results of custom-designed two-stage models.
arXiv Detail & Related papers (2021-05-05T16:31:32Z)
- Uncertainty-aware Joint Salient Object and Camouflaged Object Detection [43.01556978979627]
We propose a paradigm of leveraging the contradictory information to enhance the detection ability of both salient object detection and camouflaged object detection.
We introduce a similarity measure module to explicitly model the contradicting attributes of these two tasks.
Considering the uncertainty of labeling in both tasks' datasets, we propose an adversarial learning network to achieve both higher order similarity measure and network confidence estimation.
arXiv Detail & Related papers (2021-04-06T16:05:10Z)
- Pose-based Modular Network for Human-Object Interaction Detection [5.6397911482914385]
We contribute a Pose-based Modular Network (PMN) which explores the absolute pose features and relative spatial pose features to improve HOI detection.
To evaluate our proposed method, we combine the module with the state-of-the-art model named VS-GATs and obtain significant improvement on two public benchmarks.
arXiv Detail & Related papers (2020-08-05T10:56:09Z)
- A Graph-based Interactive Reasoning for Human-Object Interaction Detection [71.50535113279551]
We present a novel graph-based interactive reasoning model called Interactive Graph (abbr. in-Graph) to infer HOIs.
We construct a new framework to assemble in-Graph models for detecting HOIs, namely in-GraphNet.
Our framework is end-to-end trainable and free from costly annotations like human pose.
arXiv Detail & Related papers (2020-07-14T09:29:03Z)
- Condensing Two-stage Detection with Automatic Object Key Part Discovery [87.1034745775229]
Two-stage object detectors generally require excessively large models for their detection heads to achieve high accuracy.
We propose that the model parameters of two-stage detection heads can be condensed and reduced by concentrating on object key parts.
Our proposed technique consistently maintains the original performance while discarding around 50% of the model parameters of common two-stage detection heads.
arXiv Detail & Related papers (2020-06-10T01:20:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.