Compositional Convolutional Neural Networks: A Robust and Interpretable
Model for Object Recognition under Occlusion
- URL: http://arxiv.org/abs/2006.15538v1
- Date: Sun, 28 Jun 2020 08:18:19 GMT
- Title: Compositional Convolutional Neural Networks: A Robust and Interpretable
Model for Object Recognition under Occlusion
- Authors: Adam Kortylewski and Qing Liu and Angtian Wang and Yihong Sun and Alan
Yuille
- Abstract summary: We show that black-box deep convolutional neural networks (DCNNs) have only limited robustness to partial occlusion.
We overcome these limitations by unifying DCNNs with part-based models into Compositional Convolutional Neural Networks (CompositionalNets).
Our experiments show that CompositionalNets improve by a large margin over their non-compositional counterparts at classifying and detecting partially occluded objects.
- Score: 21.737411464598797
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Computer vision systems in real-world applications need to be robust to
partial occlusion while also being explainable. In this work, we show that
black-box deep convolutional neural networks (DCNNs) have only limited
robustness to partial occlusion. We overcome these limitations by unifying
DCNNs with part-based models into Compositional Convolutional Neural Networks
(CompositionalNets) - an interpretable deep architecture with innate robustness
to partial occlusion. Specifically, we propose to replace the fully connected
classification head of DCNNs with a differentiable compositional model that can
be trained end-to-end. The structure of the compositional model enables
CompositionalNets to decompose images into objects and context, as well as to
further decompose object representations in terms of individual parts and the
objects' pose. The generative nature of our compositional model enables it to
localize occluders and to recognize objects based on their non-occluded parts.
We conduct extensive image classification and object detection experiments on
images of artificially occluded objects from the PASCAL3D+ and ImageNet
datasets, and on real images of partially occluded vehicles from the
MS-COCO dataset. Our experiments show that CompositionalNets built from several
popular DCNN backbones (VGG-16, ResNet50, ResNeXt) improve by a large margin
over their non-compositional counterparts at classifying and detecting
partially occluded objects. Furthermore, they can localize occluders accurately
despite being trained with class-level supervision only. Finally, we
demonstrate that CompositionalNets provide human interpretable predictions as
their individual components can be understood as detecting parts and estimating
an object's viewpoint.
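The following is a minimal, hypothetical PyTorch sketch of the core architectural idea described above: a DCNN's fully connected classification head is replaced with a differentiable compositional head built from normalized part kernels and per-class spatial mixture weights. Class names, hyperparameters, and the pooling scheme are illustrative assumptions, not the authors' reference implementation.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models


class CompositionalHead(nn.Module):
    """Scores classes by how well learned part kernels explain each position of
    the backbone feature map (vMF-like likelihoods), pooled with per-class,
    per-position mixture weights over parts. Illustrative sketch only."""

    def __init__(self, in_channels=512, num_parts=64, num_classes=12, fmap_size=7):
        super().__init__()
        # Part kernels act like cluster centers for normalized feature vectors.
        self.parts = nn.Parameter(torch.randn(num_parts, in_channels))
        # One spatial "template" of part weights per class.
        self.mixture = nn.Parameter(torch.randn(num_classes, num_parts, fmap_size, fmap_size))
        self.kappa = 20.0  # assumed vMF-style concentration

    def forward(self, fmap):                        # fmap: (B, C, H, W)
        f = F.normalize(fmap, dim=1)                # unit-length feature vectors
        p = F.normalize(self.parts, dim=1)          # unit-length part kernels
        sim = torch.einsum('bchw,kc->bkhw', f, p)   # cosine similarity per part
        part_ll = self.kappa * sim                  # vMF-like log-likelihood (up to a constant)
        mix = F.log_softmax(self.mixture, dim=1)    # normalize part weights per class/position
        joint = part_ll.unsqueeze(1) + mix.unsqueeze(0)   # (B, classes, parts, H, W)
        pos_ll = torch.logsumexp(joint, dim=2)            # marginalize over parts
        return pos_ll.mean(dim=(2, 3))                    # pool positions into class scores


backbone = models.vgg16().features                    # convolutional layers only
head = CompositionalHead()                            # replaces vgg16's fully connected classifier
logits = head(backbone(torch.randn(2, 3, 224, 224)))  # shape (2, 12)
```
Because the head is just another differentiable module on top of the backbone, the whole model can be trained end-to-end with a standard cross-entropy loss, matching the end-to-end training claim in the abstract.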
Related papers
- Are Deep Learning Models Robust to Partial Object Occlusion in Visual Recognition Tasks? [4.9260675787714]
Image classification models, including convolutional neural networks (CNNs), perform well on a variety of classification tasks but struggle under partial occlusion.
We contribute the Image Recognition Under Occlusion (IRUO) dataset, based on the recently developed Occluded Video Instance Segmentation (OVIS) dataset (arXiv:2102.01558).
We find that modern CNN-based models show improved recognition accuracy on occluded images compared to earlier CNN-based models, and ViT-based models are more accurate than CNN-based models on occluded images.
arXiv Detail & Related papers (2024-09-16T23:21:22Z)
- Unsupervised Part Discovery from Contrastive Reconstruction [90.88501867321573]
The goal of self-supervised visual representation learning is to learn strong, transferable image representations.
We propose an unsupervised approach to object part discovery and segmentation.
Our method yields semantic parts consistent across fine-grained but visually distinct categories.
arXiv Detail & Related papers (2021-11-11T17:59:42Z)
- Unsupervised Image Decomposition with Phase-Correlation Networks [28.502280038100167]
Phase-Correlation Decomposition Network (PCDNet) is a novel model that decomposes a scene into its object components.
In our experiments, we show how PCDNet outperforms state-of-the-art methods for unsupervised object discovery and segmentation.
arXiv Detail & Related papers (2021-10-07T13:57:33Z)
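As an aside on the mechanism named in the entry above, the sketch below shows textbook phase correlation, the FFT-based operation that phase-correlation decomposition models build on to localize object prototypes: the cross-power spectrum of an image and a prototype is phase-normalized, and its inverse FFT peaks at the prototype's translation. This is a generic illustration, not PCDNet's implementation.
```python
# Classic phase correlation: estimates the translation that best aligns a
# prototype with an image. Generic illustration, not the PCDNet code.
import numpy as np

def phase_correlation(image, template):
    """Return the (row, col) shift of `template` within `image` (same shape)."""
    F_img = np.fft.fft2(image)
    F_tpl = np.fft.fft2(template)
    cross_power = F_img * np.conj(F_tpl)
    cross_power /= np.abs(cross_power) + 1e-8        # keep phase only
    corr = np.fft.ifft2(cross_power).real            # peaks at the relative shift
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    return dy, dx

image = np.zeros((64, 64)); image[20:28, 30:38] = 1.0    # object at (20, 30)
template = np.zeros((64, 64)); template[0:8, 0:8] = 1.0  # prototype at the origin
print(phase_correlation(image, template))                # (20, 30)
```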
- Compositional Sketch Search [91.84489055347585]
We present an algorithm for searching image collections using free-hand sketches.
We exploit drawings as a concise and intuitive representation for specifying entire scene compositions.
arXiv Detail & Related papers (2021-06-15T09:38:09Z)
- Neural Parts: Learning Expressive 3D Shape Abstractions with Invertible Neural Networks [118.20778308823779]
We present a novel 3D primitive representation that defines primitives using an Invertible Neural Network (INN).
Our model learns to parse 3D objects into semantically consistent part arrangements without any part-level supervision.
arXiv Detail & Related papers (2021-03-18T17:59:31Z)
- Robust Instance Segmentation through Reasoning about Multi-Object Occlusion [9.536947328412198]
We propose a deep network for multi-object instance segmentation that is robust to occlusion.
Our work builds on Compositional Networks, which learn a generative model of neural feature activations to locate occluders.
In particular, we obtain feed-forward predictions of the object classes and their instance and occluder segmentations.
arXiv Detail & Related papers (2020-12-03T17:41:55Z)
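The occluder-localization idea shared by the entry above and the main paper can be illustrated with a small, hypothetical sketch: each spatial position of the feature map is assigned to whichever model explains it better, the object's part kernels or a generic occluder model. The constant occluder log-likelihood and the vMF-style scoring below are assumptions for illustration, not the published formulation.
```python
# Hypothetical occluder localization from a generative model of feature
# activations: positions better explained by a generic occluder model than by
# the best-matching part kernel are flagged as occluded.
import torch
import torch.nn.functional as F

def occlusion_map(fmap, part_kernels, kappa=20.0, occluder_loglik=0.0):
    """fmap: (C, H, W) backbone features; part_kernels: (K, C) learned centers.
    Returns a boolean (H, W) map that is True where the occluder model wins."""
    f = F.normalize(fmap, dim=0)                     # unit feature vectors
    p = F.normalize(part_kernels, dim=1)             # unit part kernels
    sim = torch.einsum('chw,kc->khw', f, p)          # cosine similarity per part
    object_loglik = kappa * sim.max(dim=0).values    # best-matching part per position
    return object_loglik < occluder_loglik           # True => likely occluded

fmap = torch.randn(512, 7, 7)
kernels = torch.randn(64, 512)
print(occlusion_map(fmap, kernels).float().mean())   # fraction flagged as occluded
```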
- Understanding the Role of Individual Units in a Deep Neural Network [85.23117441162772]
We present an analytic framework to systematically identify hidden units within image classification and image generation networks.
First, we analyze a convolutional neural network (CNN) trained on scene classification and discover units that match a diverse set of object concepts.
Second, we use a similar analytic method to analyze a generative adversarial network (GAN) model trained to generate scenes.
arXiv Detail & Related papers (2020-09-10T17:59:10Z)
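A common way such unit-level analyses match a hidden unit to a visual concept is an IoU-style agreement between the unit's thresholded activation map and a concept segmentation mask; the sketch below illustrates that idea with made-up data and an assumed thresholding rule, not the paper's exact procedure.
```python
# Score how well one convolutional unit's strongest responses overlap a
# concept segmentation mask. Threshold rule and data are illustrative.
import numpy as np

def unit_concept_iou(activation, concept_mask, quantile=0.99):
    """activation: (H, W) unit response upsampled to image size;
    concept_mask: (H, W) boolean mask of a visual concept."""
    thresh = np.quantile(activation, quantile)       # keep the top responses
    unit_mask = activation > thresh
    inter = np.logical_and(unit_mask, concept_mask).sum()
    union = np.logical_or(unit_mask, concept_mask).sum()
    return inter / max(union, 1)

act = np.random.rand(224, 224)
mask = np.zeros((224, 224), dtype=bool); mask[50:120, 60:140] = True
print(unit_concept_iou(act, mask))   # higher values suggest the unit detects the concept
```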
- Category Level Object Pose Estimation via Neural Analysis-by-Synthesis [64.14028598360741]
In this paper we combine a gradient-based fitting procedure with a parametric neural image synthesis module.
The image synthesis network is designed to efficiently span the pose configuration space.
We experimentally show that the method can recover orientation of objects with high accuracy from 2D images alone.
arXiv Detail & Related papers (2020-08-18T20:30:47Z)
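The gradient-based analysis-by-synthesis loop described in the entry above can be sketched generically: a differentiable synthesis module maps a pose estimate to features, and the pose is refined by minimizing the distance to the observed features. The tiny MLP below is a stand-in for the paper's parametric synthesis network, so the example only demonstrates the optimization pattern.
```python
# Render-and-compare pose refinement with a placeholder synthesis network.
import torch
import torch.nn as nn

synth = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 128))  # pose -> features
with torch.no_grad():
    target = synth(torch.tensor([[1.2, 0.3]]))        # features of the (unknown) true pose

pose = torch.zeros(1, 2, requires_grad=True)          # initial azimuth/elevation guess
opt = torch.optim.Adam([pose], lr=0.05)
for _ in range(200):
    opt.zero_grad()
    loss = (synth(pose) - target).pow(2).mean()       # render-and-compare objective
    loss.backward()
    opt.step()
print(pose.detach(), float(loss))  # refined pose; with a real synthesis module this approaches the true pose
```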
- Robust Object Detection under Occlusion with Context-Aware CompositionalNets [21.303976151518125]
Compositional convolutional neural networks (CompositionalNets) have been shown to be robust at classifying occluded objects.
We propose to overcome two limitations of CompositionalNets which will enable them to detect partially occluded objects.
arXiv Detail & Related papers (2020-05-24T02:57:34Z)
- Learning Unsupervised Hierarchical Part Decomposition of 3D Objects from a Single RGB Image [102.44347847154867]
We propose a novel formulation that allows us to jointly recover the geometry of a 3D object as a set of primitives.
Our model recovers the higher level structural decomposition of various objects in the form of a binary tree of primitives.
Our experiments on the ShapeNet and D-FAUST datasets demonstrate that considering the organization of parts indeed facilitates reasoning about 3D geometry.
arXiv Detail & Related papers (2020-04-02T17:58:05Z)
- Compositional Convolutional Neural Networks: A Deep Architecture with Innate Robustness to Partial Occlusion [18.276428975330813]
Recent findings show that deep convolutional neural networks (DCNNs) do not generalize well under partial occlusion.
Inspired by the success of compositional models at classifying partially occluded objects, we propose to integrate compositional models and DCNNs into a unified deep model.
We conduct classification experiments on artificially occluded images as well as real images of partially occluded objects from the MS-COCO dataset.
Our proposed model outperforms standard DCNNs by a large margin at classifying partially occluded objects, even when it has not been exposed to occluded objects during training.
arXiv Detail & Related papers (2020-03-10T01:45:38Z)
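For readers who want to reproduce the flavor of the artificial-occlusion evaluations mentioned in these abstracts, the sketch below pastes white or noise patches over a controllable fraction of an image. Patch sizes, fill types, and placement are assumptions, not the exact benchmark protocol.
```python
# Generate an artificially occluded copy of an image by pasting random patches
# until roughly `area_fraction` of the pixels are covered. Illustrative only.
import numpy as np

def occlude(image, area_fraction=0.4, patch=32, rng=np.random.default_rng(0)):
    """image: (H, W, 3) uint8 array. Returns an occluded copy."""
    out = image.copy()
    h, w, _ = out.shape
    covered = np.zeros((h, w), dtype=bool)
    while covered.mean() < area_fraction:
        y = rng.integers(0, h - patch)
        x = rng.integers(0, w - patch)
        fill = rng.choice(["white", "noise"])
        block = (np.full((patch, patch, 3), 255, dtype=np.uint8) if fill == "white"
                 else rng.integers(0, 256, (patch, patch, 3), dtype=np.uint8))
        out[y:y + patch, x:x + patch] = block
        covered[y:y + patch, x:x + patch] = True
    return out

img = np.zeros((224, 224, 3), dtype=np.uint8)
print(occlude(img).mean())   # occluded copy; the original `img` is left untouched
```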