Intrinsic Relationship Reasoning for Small Object Detection
- URL: http://arxiv.org/abs/2009.00833v1
- Date: Wed, 2 Sep 2020 06:03:05 GMT
- Title: Intrinsic Relationship Reasoning for Small Object Detection
- Authors: Kui Fu, Jia Li, Lin Ma, Kai Mu, Yonghong Tian
- Abstract summary: Small objects in images and videos are usually not independent individuals. Instead, they tend to exhibit semantic and spatial layout relationships with each other.
We propose a novel context reasoning approach for small object detection which models and infers the intrinsic semantic and spatial layout relationships between objects.
- Score: 44.68289739449486
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Small objects in images and videos are usually not independent
individuals. Instead, they tend to exhibit semantic and spatial
layout relationships with each other. Modeling and inferring such intrinsic
relationships can thereby be beneficial for small object detection. In this
paper, we propose a novel context reasoning approach for small object detection
which models and infers the intrinsic semantic and spatial layout relationships
between objects. Specifically, we first construct a semantic module to model
the sparse semantic relationships based on the initial regional features, and a
spatial layout module to model the sparse spatial layout relationships based on
their position and shape information, respectively. Both of them are then fed
into a context reasoning module for integrating the contextual information with
respect to the objects and their relationships, which is further fused with the
original regional visual features for classification and regression.
Experimental results reveal that the proposed approach can effectively boost
the small object detection performance.
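As a rough illustration of the two-graph reasoning the abstract describes, the sketch below builds a sparse semantic graph from regional-feature similarity and a spatial graph from box geometry, runs one propagation step, and fuses the resulting context back into the original regional features. Every function name, affinity rule, and dimension here is an assumption for illustration, not the paper's actual design:

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize_adj(a):
    """Row-normalize an affinity matrix so each node's incoming weights sum to 1."""
    return a / np.clip(a.sum(axis=1, keepdims=True), 1e-8, None)

def semantic_affinity(feats, k=2):
    """Sparse semantic graph: cosine similarity, keeping only each node's top-k neighbors."""
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = np.maximum(f @ f.T, 0.0)                  # keep only positive similarity
    np.fill_diagonal(sim, 0.0)
    keep = np.argsort(sim, axis=1)[:, -k:]          # indices of each row's top-k neighbors
    sparse = np.zeros_like(sim)
    rows = np.arange(sim.shape[0])[:, None]
    sparse[rows, keep] = sim[rows, keep]
    return sparse

def spatial_affinity(boxes):
    """Spatial layout graph from box centers: closer proposals relate more strongly."""
    centers = (boxes[:, :2] + boxes[:, 2:]) / 2.0
    d = np.linalg.norm(centers[:, None] - centers[None, :], axis=-1)
    aff = 1.0 / (1.0 + d)
    np.fill_diagonal(aff, 0.0)
    return aff

# Toy inputs: 4 region proposals with 8-d features and (x1, y1, x2, y2) boxes.
feats = rng.standard_normal((4, 8))
boxes = np.array([[0, 0, 10, 10], [8, 0, 18, 10],
                  [50, 50, 60, 60], [52, 48, 62, 58]], float)

a_sem = normalize_adj(semantic_affinity(feats))
a_spa = normalize_adj(spatial_affinity(boxes))

w = rng.standard_normal((8, 8)) * 0.1               # shared projection (hypothetical)
context = np.maximum(a_sem @ feats @ w + a_spa @ feats @ w, 0.0)  # one propagation step
fused = np.concatenate([feats, context], axis=1)    # fuse context with original features
print(fused.shape)  # (4, 16)
```

The fused features would then feed the classification and regression heads in place of the raw regional features.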
Related papers
- Scene-Graph ViT: End-to-End Open-Vocabulary Visual Relationship Detection [14.22646492640906]
We propose a simple and highly efficient decoder-free architecture for open-vocabulary visual relationship detection.
Our model consists of a Transformer-based image encoder that represents objects as tokens and models their relationships implicitly.
Our approach achieves state-of-the-art relationship detection performance on Visual Genome and on the large-vocabulary GQA benchmark at real-time inference speeds.
arXiv Detail & Related papers (2024-03-21T10:15:57Z)
- Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
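The PCA-based localization mentioned above can be sketched as follows: project each spatial feature vector of a feature map onto the first principal component and keep the positive side of the projection as a coarse foreground mask. The function name, sign-resolution rule, and toy input are assumptions for illustration, not the paper's exact procedure:

```python
import numpy as np

def pca_localize(feat_map):
    """Project each spatial feature onto the first principal component;
    the positive side of the projection gives a coarse foreground mask."""
    h, w, c = feat_map.shape
    x = feat_map.reshape(-1, c)
    x = x - x.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(x, full_matrices=False)  # vt[0] = first principal direction
    proj = x @ vt[0]
    if -proj.min() > proj.max():                      # resolve SVD's arbitrary sign:
        proj = -proj                                  # put strong responses on the + side
    return (proj > 0).reshape(h, w)

# Toy feature map: a small high-activation "object" patch on a flat background.
fm = np.zeros((8, 8, 4))
fm[2:5, 2:5] = 1.0
mask = pca_localize(fm)
print(mask.sum())  # 9 foreground pixels, matching the 3x3 patch
```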
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
- Semantic-guided modeling of spatial relation and object co-occurrence for indoor scene recognition [5.083140094792973]
SpaCoNet simultaneously models Spatial relation and Co-occurrence of objects guided by semantic segmentation.
Experimental results on three widely used scene datasets demonstrate the effectiveness and generality of the proposed method.
arXiv Detail & Related papers (2023-05-22T03:04:22Z)
- Benchmarking Spatial Relationships in Text-to-Image Generation [102.62422723894232]
We investigate the ability of text-to-image models to generate correct spatial relationships among objects.
We present VISOR, an evaluation metric that captures how accurately the spatial relationship described in text is generated in the image.
Our experiments reveal a surprising finding that, although state-of-the-art T2I models exhibit high image quality, they are severely limited in their ability to generate multiple objects or the specified spatial relations between them.
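VISOR's exact protocol is not given here, but the core idea of checking a described spatial relation against detected boxes can be illustrated with a simple centroid comparison. The relation vocabulary, function names, and boxes below are assumptions for illustration:

```python
def centroid(box):
    """Center (x, y) of an (x1, y1, x2, y2) box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def relation_holds(box_a, box_b, relation):
    """Check a 2D spatial relation between object A and object B by centroid comparison.
    Image coordinates: x grows rightward, y grows downward."""
    (ax, ay), (bx, by) = centroid(box_a), centroid(box_b)
    rules = {
        "left of": ax < bx,
        "right of": ax > bx,
        "above": ay < by,
        "below": ay > by,
    }
    return rules[relation]

# "A dog to the left of a cat": does a generated image satisfy the prompt?
dog = (10, 40, 50, 80)
cat = (60, 35, 100, 85)
print(relation_holds(dog, cat, "left of"))  # True
```

Averaging such checks over many prompts and generations would yield a VISOR-style accuracy score.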
arXiv Detail & Related papers (2022-12-20T06:03:51Z)
- Spatial Reasoning for Few-Shot Object Detection [21.3564383157159]
We propose a spatial reasoning framework that detects novel objects with only a few training examples in a context.
We employ a graph convolutional network in which the RoIs and their relatedness are defined as nodes and edges, respectively.
We demonstrate that the proposed method significantly outperforms the state-of-the-art methods and verify its efficacy through extensive ablation studies.
arXiv Detail & Related papers (2022-11-02T12:38:08Z)
- Discovering Objects that Can Move [55.743225595012966]
We study the problem of object discovery -- separating objects from the background without manual labels.
Existing approaches utilize appearance cues, such as color, texture, and location, to group pixels into object-like regions.
We choose to focus on dynamic objects -- entities that can move independently in the world.
arXiv Detail & Related papers (2022-03-18T21:13:56Z)
- DisARM: Displacement Aware Relation Module for 3D Detection [38.4380420322491]
Displacement Aware Relation Module (DisARM) is a novel neural network module for enhancing the performance of 3D object detection in point cloud scenes.
To find the anchors, we first apply a preliminary relation anchor module with an objectness-aware sampling approach.
This lightweight relation module leads to significantly higher accuracy of object instance detection when being plugged into the state-of-the-art detectors.
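The anchor-and-displacement idea above can be roughly sketched as: pick a few high-objectness proposals as relation anchors, then let every proposal aggregate anchor features with weights that decay with displacement. All sampling rules, weights, and dimensions below are simplified assumptions, not DisARM's actual module:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 3D proposals: 6 centers, an objectness score, and a 4-d feature each (all made up).
centers = rng.uniform(0, 10, size=(6, 3))
objectness = rng.uniform(size=6)
feats = rng.standard_normal((6, 4))

k = 2
anchors = np.argsort(objectness)[-k:]     # objectness-aware sampling, simplified to top-k

# Displacement-aware weights: each proposal attends to anchors, closer anchors weigh more.
disp = centers[:, None, :] - centers[anchors][None, :, :]   # (6, k, 3) displacements
w = np.exp(-np.linalg.norm(disp, axis=-1))                  # (6, k) raw weights
w = w / w.sum(axis=1, keepdims=True)                        # normalize per proposal

context = w @ feats[anchors]              # (6, 4) anchor context per proposal
enhanced = feats + context                # residual enhancement, as a plug-in module might do
print(enhanced.shape)  # (6, 4)
```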
arXiv Detail & Related papers (2022-03-02T14:49:55Z)
- Synthesizing the Unseen for Zero-shot Object Detection [72.38031440014463]
We propose to synthesize visual features for unseen classes, so that the model learns both seen and unseen objects in the visual domain.
We use a novel generative model that uses class-semantics to not only generate the features but also to discriminatively separate them.
arXiv Detail & Related papers (2020-10-19T12:36:11Z)
- Attention-based Joint Detection of Object and Semantic Part [4.389917490809522]
Our model is built on top of two Faster-RCNN models that share features to produce enhanced representations of both objects and parts.
Experiments on the PASCAL-Part 2010 dataset show that joint detection can simultaneously improve both object detection and part detection.
arXiv Detail & Related papers (2020-07-05T18:54:10Z)
- Spatial Priming for Detecting Human-Object Interactions [89.22921959224396]
We present a method for exploiting spatial layout information for detecting human-object interactions (HOIs) in images.
The proposed method consists of a layout module which primes a visual module to predict the type of interaction between a human and an object.
The proposed model reaches an mAP of 24.79% on the HICO-Det dataset, about 2.8 absolute points higher than the current state-of-the-art.
arXiv Detail & Related papers (2020-04-09T23:20:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.