Intrinsic Relationship Reasoning for Small Object Detection
- URL: http://arxiv.org/abs/2009.00833v1
- Date: Wed, 2 Sep 2020 06:03:05 GMT
- Title: Intrinsic Relationship Reasoning for Small Object Detection
- Authors: Kui Fu, Jia Li, Lin Ma, Kai Mu, Yonghong Tian
- Abstract summary: Small objects in images and videos are usually not independent individuals. Instead, they tend to exhibit semantic and spatial layout relationships with each other.
We propose a novel context reasoning approach for small object detection which models and infers the intrinsic semantic and spatial layout relationships between objects.
- Score: 44.68289739449486
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Small objects in images and videos are usually not independent
individuals. Instead, they tend to exhibit semantic and spatial
layout relationships with each other. Modeling and inferring such intrinsic
relationships can thereby be beneficial for small object detection. In this
paper, we propose a novel context reasoning approach for small object detection
which models and infers the intrinsic semantic and spatial layout relationships
between objects. Specifically, we first construct a semantic module to model
the sparse semantic relationships based on the initial regional features, and a
spatial layout module to model the sparse spatial layout relationships based on
their position and shape information, respectively. Both of them are then fed
into a context reasoning module for integrating the contextual information with
respect to the objects and their relationships, which is further fused with the
original regional visual features for classification and regression.
Experimental results reveal that the proposed approach can effectively boost
the small object detection performance.
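As a rough illustration of the two-graph reasoning the abstract describes, the sketch below builds a sparse semantic graph from regional-feature similarity and a spatial graph from box geometry, runs one propagation step, and fuses the resulting context back into the original regional features. Every function name, affinity rule, and dimension here is an assumption for illustration, not the paper's actual design:

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize_adj(a):
    """Row-normalize an affinity matrix so each node's incoming weights sum to 1."""
    return a / np.clip(a.sum(axis=1, keepdims=True), 1e-8, None)

def semantic_affinity(feats, k=2):
    """Sparse semantic graph: cosine similarity, keeping only each node's top-k neighbors."""
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = np.maximum(f @ f.T, 0.0)                  # keep only positive similarity
    np.fill_diagonal(sim, 0.0)
    keep = np.argsort(sim, axis=1)[:, -k:]          # indices of each row's top-k neighbors
    sparse = np.zeros_like(sim)
    rows = np.arange(sim.shape[0])[:, None]
    sparse[rows, keep] = sim[rows, keep]
    return sparse

def spatial_affinity(boxes):
    """Spatial layout graph from box centers: closer proposals relate more strongly."""
    centers = (boxes[:, :2] + boxes[:, 2:]) / 2.0
    d = np.linalg.norm(centers[:, None] - centers[None, :], axis=-1)
    aff = 1.0 / (1.0 + d)
    np.fill_diagonal(aff, 0.0)
    return aff

# Toy inputs: 4 region proposals with 8-d features and (x1, y1, x2, y2) boxes.
feats = rng.standard_normal((4, 8))
boxes = np.array([[0, 0, 10, 10], [8, 0, 18, 10],
                  [50, 50, 60, 60], [52, 48, 62, 58]], float)

a_sem = normalize_adj(semantic_affinity(feats))
a_spa = normalize_adj(spatial_affinity(boxes))

w = rng.standard_normal((8, 8)) * 0.1               # shared projection (hypothetical)
context = np.maximum(a_sem @ feats @ w + a_spa @ feats @ w, 0.0)  # one propagation step
fused = np.concatenate([feats, context], axis=1)    # fuse context with original features
print(fused.shape)  # (4, 16)
```

The fused features would then feed the classification and regression heads in place of the raw regional features.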
Related papers
- Scene-Graph ViT: End-to-End Open-Vocabulary Visual Relationship Detection [14.22646492640906]
We propose a simple and highly efficient decoder-free architecture for open-vocabulary visual relationship detection.
Our model consists of a Transformer-based image encoder that represents objects as tokens and models their relationships implicitly.
Our approach achieves state-of-the-art relationship detection performance on Visual Genome and on the large-vocabulary GQA benchmark at real-time inference speeds.
arXiv Detail & Related papers (2024-03-21T10:15:57Z)
- Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
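The PCA-based localization mentioned above can be sketched as follows: project each spatial feature vector of a feature map onto the first principal component and keep the positive side of the projection as a coarse foreground mask. The function name, sign-resolution rule, and toy input are assumptions for illustration, not the paper's exact procedure:

```python
import numpy as np

def pca_localize(feat_map):
    """Project each spatial feature onto the first principal component;
    the positive side of the projection gives a coarse foreground mask."""
    h, w, c = feat_map.shape
    x = feat_map.reshape(-1, c)
    x = x - x.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(x, full_matrices=False)  # vt[0] = first principal direction
    proj = x @ vt[0]
    if -proj.min() > proj.max():                      # resolve SVD's arbitrary sign:
        proj = -proj                                  # put strong responses on the + side
    return (proj > 0).reshape(h, w)

# Toy feature map: a small high-activation "object" patch on a flat background.
fm = np.zeros((8, 8, 4))
fm[2:5, 2:5] = 1.0
mask = pca_localize(fm)
print(mask.sum())  # 9 foreground pixels, matching the 3x3 patch
```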
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
- Semantic-guided modeling of spatial relation and object co-occurrence for indoor scene recognition [5.083140094792973]
SpaCoNet simultaneously models Spatial relation and Co-occurrence of objects guided by semantic segmentation.
Experimental results on three widely used scene datasets demonstrate the effectiveness and generality of the proposed method.
arXiv Detail & Related papers (2023-05-22T03:04:22Z)
- Benchmarking Spatial Relationships in Text-to-Image Generation [102.62422723894232]
We investigate the ability of text-to-image models to generate correct spatial relationships among objects.
We present VISOR, an evaluation metric that captures how accurately the spatial relationship described in text is generated in the image.
Our experiments reveal a surprising finding that, although state-of-the-art T2I models exhibit high image quality, they are severely limited in their ability to generate multiple objects or the specified spatial relations between them.
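VISOR's exact protocol is not given here, but the core idea of checking a described spatial relation against detected boxes can be illustrated with a simple centroid comparison. The relation vocabulary, function names, and boxes below are assumptions for illustration:

```python
def centroid(box):
    """Center (x, y) of an (x1, y1, x2, y2) box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def relation_holds(box_a, box_b, relation):
    """Check a 2D spatial relation between object A and object B by centroid comparison.
    Image coordinates: x grows rightward, y grows downward."""
    (ax, ay), (bx, by) = centroid(box_a), centroid(box_b)
    rules = {
        "left of": ax < bx,
        "right of": ax > bx,
        "above": ay < by,
        "below": ay > by,
    }
    return rules[relation]

# "A dog to the left of a cat": does a generated image satisfy the prompt?
dog = (10, 40, 50, 80)
cat = (60, 35, 100, 85)
print(relation_holds(dog, cat, "left of"))  # True
```

Averaging such checks over many prompts and generations would yield a VISOR-style accuracy score.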
arXiv Detail & Related papers (2022-12-20T06:03:51Z)
- Spatial Reasoning for Few-Shot Object Detection [21.3564383157159]
We propose a spatial reasoning framework that detects novel objects with only a few training examples in a context.
We employ a graph convolutional network in which the RoIs and their relatedness are defined as nodes and edges, respectively.
We demonstrate that the proposed method significantly outperforms the state-of-the-art methods and verify its efficacy through extensive ablation studies.
arXiv Detail & Related papers (2022-11-02T12:38:08Z)
- Discovering Objects that Can Move [55.743225595012966]
We study the problem of object discovery -- separating objects from the background without manual labels.
Existing approaches utilize appearance cues, such as color, texture, and location, to group pixels into object-like regions.
We choose to focus on dynamic objects -- entities that can move independently in the world.
arXiv Detail & Related papers (2022-03-18T21:13:56Z)
- DisARM: Displacement Aware Relation Module for 3D Detection [38.4380420322491]
Displacement Aware Relation Module (DisARM) is a novel neural network module for enhancing the performance of 3D object detection in point cloud scenes.
To find the anchors, we first apply a preliminary relation anchor module with an objectness-aware sampling approach.
This lightweight relation module leads to significantly higher accuracy of object instance detection when being plugged into the state-of-the-art detectors.
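The anchor-and-displacement idea above can be roughly sketched as: pick a few high-objectness proposals as relation anchors, then let every proposal aggregate anchor features with weights that decay with displacement. All sampling rules, weights, and dimensions below are simplified assumptions, not DisARM's actual module:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 3D proposals: 6 centers, an objectness score, and a 4-d feature each (all made up).
centers = rng.uniform(0, 10, size=(6, 3))
objectness = rng.uniform(size=6)
feats = rng.standard_normal((6, 4))

k = 2
anchors = np.argsort(objectness)[-k:]     # objectness-aware sampling, simplified to top-k

# Displacement-aware weights: each proposal attends to anchors, closer anchors weigh more.
disp = centers[:, None, :] - centers[anchors][None, :, :]   # (6, k, 3) displacements
w = np.exp(-np.linalg.norm(disp, axis=-1))                  # (6, k) raw weights
w = w / w.sum(axis=1, keepdims=True)                        # normalize per proposal

context = w @ feats[anchors]              # (6, 4) anchor context per proposal
enhanced = feats + context                # residual enhancement, as a plug-in module might do
print(enhanced.shape)  # (6, 4)
```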
arXiv Detail & Related papers (2022-03-02T14:49:55Z)
- Synthesizing the Unseen for Zero-shot Object Detection [72.38031440014463]
We propose to synthesize visual features for unseen classes, so that the model learns both seen and unseen objects in the visual domain.
We use a novel generative model that uses class-semantics to not only generate the features but also to discriminatively separate them.
arXiv Detail & Related papers (2020-10-19T12:36:11Z)
- Attention-based Joint Detection of Object and Semantic Part [4.389917490809522]
Our model is built on top of two Faster-RCNN models that share features to produce enhanced representations of both objects and parts.
Experiments on the PASCAL-Part 2010 dataset show that joint detection can simultaneously improve both object detection and part detection.
arXiv Detail & Related papers (2020-07-05T18:54:10Z)
- Spatial Priming for Detecting Human-Object Interactions [89.22921959224396]
We present a method for exploiting spatial layout information for detecting human-object interactions (HOIs) in images.
The proposed method consists of a layout module which primes a visual module to predict the type of interaction between a human and an object.
The proposed model reaches an mAP of 24.79% on the HICO-Det dataset, about 2.8 absolute points higher than the current state-of-the-art.
arXiv Detail & Related papers (2020-04-09T23:20:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.