Learning to recognize occluded and small objects with partial inputs
- URL: http://arxiv.org/abs/2310.18517v1
- Date: Fri, 27 Oct 2023 22:29:27 GMT
- Title: Learning to recognize occluded and small objects with partial inputs
- Authors: Hasib Zunair and A. Ben Hamza
- Abstract summary: Masked Supervised Learning is a single-stage, model-agnostic learning paradigm for multi-label image recognition.
We show that MSL is robust to random masking and demonstrate its effectiveness in recognizing non-masked objects.
- Score: 8.460351690226817
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recognizing multiple objects in an image is challenging due to occlusions,
and becomes even more so when the objects are small. While promising, existing
multi-label image recognition models do not explicitly learn context-based
representations, and hence struggle to correctly recognize small and occluded
objects. Intuitively, recognizing occluded objects requires knowledge of
partial input, and hence context. Motivated by this intuition, we propose
Masked Supervised Learning (MSL), a single-stage, model-agnostic learning
paradigm for multi-label image recognition. The key idea is to learn
context-based representations using a masked branch and to model label
co-occurrence using label consistency. Experimental results demonstrate the
simplicity, applicability, and, more importantly, the competitive performance of
MSL against previous state-of-the-art methods on standard multi-label image
recognition benchmarks. In addition, we show that MSL is robust to random
masking and demonstrate its effectiveness in recognizing non-masked objects.
Code and pretrained models are available on GitHub.
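The abstract names two components: a masked branch that learns context-based representations, and a label-consistency term that models label co-occurrence. The exact formulation is not given here, so the following is only a minimal PyTorch sketch under assumed details (patch-wise random masking to simulate occlusion, a BCE classification loss on both branches, and a consistency term between the branches' predictions); `random_mask`, `MSLWrapper`, `msl_loss`, and the weight `lam` are illustrative names, not the paper's API:

```python
import torch
import torch.nn as nn

def random_mask(images, patch=32, drop_prob=0.3):
    """Zero out random square patches to simulate partial/occluded input."""
    b, c, h, w = images.shape
    masked = images.clone()
    for i in range(b):
        for y in range(0, h, patch):
            for x in range(0, w, patch):
                if torch.rand(1).item() < drop_prob:
                    masked[i, :, y:y + patch, x:x + patch] = 0.0
    return masked

class MSLWrapper(nn.Module):
    """Model-agnostic wrapper: one backbone, two branches (full and masked)."""
    def __init__(self, backbone):
        super().__init__()
        self.backbone = backbone

    def forward(self, images):
        logits_full = self.backbone(images)            # original input
        logits_masked = self.backbone(random_mask(images))  # masked branch
        return logits_full, logits_masked

def msl_loss(logits_full, logits_masked, targets, lam=1.0):
    """Multi-label BCE on both branches plus a label-consistency term."""
    bce = nn.BCEWithLogitsLoss()
    cls = bce(logits_full, targets) + bce(logits_masked, targets)
    # Consistency: masked-branch predictions should match the full branch.
    consistency = nn.functional.mse_loss(
        torch.sigmoid(logits_masked), torch.sigmoid(logits_full).detach()
    )
    return cls + lam * consistency
```

Because the wrapper only calls the backbone twice, any multi-label classifier can be dropped in unchanged, which matches the single-stage, model-agnostic claim.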
Related papers
- A Generative Approach for Wikipedia-Scale Visual Entity Recognition [56.55633052479446]
We address the task of mapping a given query image to one of the 6 million existing entities in Wikipedia.
We introduce a novel Generative Entity Recognition framework, which learns to auto-regressively decode a semantic and discriminative code identifying the target entity.
arXiv Detail & Related papers (2024-03-04T13:47:30Z) - Self-Supervised Learning for Visual Relationship Detection through Masked Bounding Box Reconstruction [6.798515070856465]
We present a novel self-supervised approach for representation learning, particularly for the task of Visual Relationship Detection (VRD).
Motivated by the effectiveness of Masked Image Modeling (MIM), we propose Masked Bounding Box Reconstruction (MBBR).
arXiv Detail & Related papers (2023-11-08T16:59:26Z) - CtxMIM: Context-Enhanced Masked Image Modeling for Remote Sensing Image Understanding [38.53988682814626]
We propose a context-enhanced masked image modeling method (CtxMIM) for remote sensing image understanding.
CtxMIM formulates original image patches as a reconstructive template and employs a Siamese framework to operate on two sets of image patches.
With the simple and elegant design, CtxMIM encourages the pre-training model to learn object-level or pixel-level features on a large-scale dataset.
arXiv Detail & Related papers (2023-09-28T18:04:43Z) - Multi-Label Self-Supervised Learning with Scene Images [21.549234013998255]
This paper shows that quality image representations can be learned by treating scene/multi-label image SSL simply as a multi-label classification problem.
The proposed method is named Multi-Label Self-supervised learning (MLS).
arXiv Detail & Related papers (2023-08-07T04:04:22Z) - Improving Human-Object Interaction Detection via Virtual Image Learning [68.56682347374422]
Human-Object Interaction (HOI) detection aims to understand the interactions between humans and objects.
In this paper, we propose to alleviate the impact of such an unbalanced distribution via Virtual Image Learning (VIL).
A novel label-to-image approach, Multiple Steps Image Creation (MUSIC), is proposed to create a high-quality dataset that has a consistent distribution with real images.
arXiv Detail & Related papers (2023-08-04T10:28:48Z) - Understanding Self-Supervised Pretraining with Part-Aware Representation Learning [88.45460880824376]
We study the capability that self-supervised representation pretraining methods learn part-aware representations.
Results show that the fully-supervised model outperforms self-supervised models for object-level recognition.
arXiv Detail & Related papers (2023-01-27T18:58:42Z) - Learning Hierarchical Image Segmentation For Recognition and By Recognition [39.712584686731574]
We propose to integrate a hierarchical segmenter into the recognition process, train and adapt the entire model solely on image-level recognition objectives.
We learn hierarchical segmentation for free alongside recognition, automatically uncovering part-to-whole relationships that not only underpin but also enhance recognition.
Notably, our model (trained on 1M unlabeled ImageNet images) outperforms SAM (trained on 11M images with masks) by an absolute 8% in mIoU on PartImageNet object segmentation.
arXiv Detail & Related papers (2022-10-01T16:31:44Z) - Multi-Modal Few-Shot Object Detection with Meta-Learning-Based Cross-Modal Prompting [77.69172089359606]
We study multi-modal few-shot object detection (FSOD) in this paper, using both few-shot visual examples and class semantic information for detection.
Our approach is motivated by the high-level conceptual similarity of (metric-based) meta-learning and prompt-based learning.
We comprehensively evaluate the proposed multi-modal FSOD models on multiple few-shot object detection benchmarks, achieving promising results.
arXiv Detail & Related papers (2022-04-16T16:45:06Z) - Simpler Does It: Generating Semantic Labels with Objectness Guidance [32.81128493853064]
We present a novel framework that generates pseudo-labels for training images, which are then used to train a segmentation model.
To generate pseudo-labels, we combine information from: (i) a class agnostic objectness network that learns to recognize object-like regions, and (ii) either image-level or bounding box annotations.
We show the efficacy of our approach by demonstrating how the objectness network can naturally be leveraged to generate object-like regions for unseen categories.
arXiv Detail & Related papers (2021-10-20T01:52:05Z) - Weakly-Supervised Saliency Detection via Salient Object Subitizing [57.17613373230722]
We introduce saliency subitizing as the weak supervision since it is class-agnostic.
This allows the supervision to be aligned with the property of saliency detection.
We conduct extensive experiments on five benchmark datasets.
arXiv Detail & Related papers (2021-01-04T12:51:45Z) - Exploit Clues from Views: Self-Supervised and Regularized Learning for Multiview Object Recognition [66.87417785210772]
This work investigates the problem of multiview self-supervised learning (MV-SSL).
A novel surrogate task for self-supervised learning is proposed by pursuing "object invariant" representation.
Experiments show that the recognition and retrieval results using view invariant prototype embedding (VISPE) outperform other self-supervised learning methods.
arXiv Detail & Related papers (2020-03-28T07:06:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.