Unsupervised Image Decomposition with Phase-Correlation Networks
- URL: http://arxiv.org/abs/2110.03473v2
- Date: Fri, 8 Oct 2021 08:33:30 GMT
- Title: Unsupervised Image Decomposition with Phase-Correlation Networks
- Authors: Angel Villar-Corrales and Sven Behnke
- Abstract summary: Phase-Correlation Decomposition Network (PCDNet) is a novel model that decomposes a scene into its object components.
In our experiments, we show how PCDNet outperforms state-of-the-art methods for unsupervised object discovery and segmentation.
- Score: 28.502280038100167
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The ability to decompose scenes into their object components is a desired
property for autonomous agents, allowing them to reason and act in their
surroundings. Recently, different methods have been proposed to learn
object-centric representations from data in an unsupervised manner. These
methods often rely on latent representations learned by deep neural networks,
hence requiring high computational costs and large amounts of curated data.
Such models are also difficult to interpret. To address these challenges, we
propose the Phase-Correlation Decomposition Network (PCDNet), a novel model
that decomposes a scene into its object components, which are represented as
transformed versions of a set of learned object prototypes. The core building
block in PCDNet is the Phase-Correlation Cell (PC Cell), which exploits the
frequency-domain representation of the images in order to estimate the
transformation between an object prototype and its transformed version in the
image. In our experiments, we show how PCDNet outperforms state-of-the-art
methods for unsupervised object discovery and segmentation on simple benchmark
datasets and on more challenging data, while using a small number of learnable
parameters and being fully interpretable.
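Phase correlation is a classical frequency-domain technique for estimating the shift between two signals, and the PC Cell builds on this idea. The NumPy sketch below illustrates the basic mechanism for pure translations; it is an illustration of the principle only, not the paper's implementation, which handles learned prototypes and more general transformations.

```python
import numpy as np

def phase_correlation(prototype: np.ndarray, image: np.ndarray):
    """Estimate the (dy, dx) translation aligning `prototype` to `image`.

    Classical phase correlation: the normalized cross-power spectrum of
    two translated signals is a pure phase ramp whose inverse FFT is a
    delta at the displacement. Illustrative sketch only, not PCDNet's
    actual PC Cell.
    """
    F_p = np.fft.fft2(prototype)
    F_i = np.fft.fft2(image)
    # Normalized cross-power spectrum (eps avoids division by zero).
    cross = F_i * np.conj(F_p)
    cross /= np.abs(cross) + 1e-8
    corr = np.real(np.fft.ifft2(cross))
    # The peak location gives the circular shift between the two signals.
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Map shifts beyond half the image size to negative displacements.
    h, w = image.shape
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)

# Toy check: a prototype shifted by (3, 5) should be recovered.
proto = np.zeros((32, 32)); proto[8:12, 8:12] = 1.0
shifted = np.roll(proto, shift=(3, 5), axis=(0, 1))
print(phase_correlation(proto, shifted))  # -> (3, 5)
```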
Related papers
- PEEKABOO: Hiding parts of an image for unsupervised object localization [7.161489957025654]
Localizing objects in an unsupervised manner poses significant challenges due to the absence of key visual information.
We propose a single-stage learning framework, dubbed PEEKABOO, for unsupervised object localization.
The key idea is to selectively hide parts of an image and leverage the remaining image information to infer the location of objects without explicit supervision.
arXiv Detail & Related papers (2024-07-24T20:35:20Z)
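The hide-and-infer idea above can be illustrated in a few lines of NumPy. The patch size and hiding probability below are illustrative assumptions, not PEEKABOO's actual settings; a downstream network would be trained to infer object locations from the unmasked remainder.

```python
import numpy as np

def hide_patches(image: np.ndarray, patch: int = 16, hide_prob: float = 0.5,
                 rng=None) -> np.ndarray:
    """Randomly zero out square patches of an (H, W, C) image.

    Sketch of the masking step only; PEEKABOO's actual strategy,
    patch size, and probability may differ.
    """
    rng = rng or np.random.default_rng()
    out = image.copy()
    h, w = image.shape[:2]
    for y in range(0, h, patch):
        for x in range(0, w, patch):
            if rng.random() < hide_prob:
                out[y:y + patch, x:x + patch] = 0.0
    return out
```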
- Analyzing Local Representations of Self-supervised Vision Transformers [34.56680159632432]
We present a comparative analysis of various self-supervised Vision Transformers (ViTs).
Inspired by large language models, we examine the abilities of ViTs to perform various computer vision tasks with little to no fine-tuning.
arXiv Detail & Related papers (2023-12-31T11:38:50Z)
- High-Resolution Vision Transformers for Pixel-Level Identification of Structural Components and Damage [1.8923948104852863]
We develop a semantic segmentation network based on vision transformers and Laplacian pyramid scaling networks.
The proposed framework has been evaluated through comprehensive experiments on a dataset of bridge inspection report images.
arXiv Detail & Related papers (2023-08-06T03:34:25Z)
- Unsupervised Part Discovery from Contrastive Reconstruction [90.88501867321573]
The goal of self-supervised visual representation learning is to learn strong, transferable image representations.
We propose an unsupervised approach to object part discovery and segmentation.
Our method yields semantic parts consistent across fine-grained but visually distinct categories.
arXiv Detail & Related papers (2021-11-11T17:59:42Z)
- PnP-DETR: Towards Efficient Visual Analysis with Transformers [146.55679348493587]
Recently, DETR pioneered the solution of vision tasks with transformers: it directly translates the image feature map into the object detection result.
Applied to the recent transformer-based image recognition model ViT, the approach shows consistent efficiency gains.
arXiv Detail & Related papers (2021-09-15T01:10:30Z)
- CoSformer: Detecting Co-Salient Object with Transformers [2.3148470932285665]
Co-Salient Object Detection (CoSOD) aims at simulating the human visual system to discover the common and salient objects from a group of relevant images.
We propose the Co-Salient Object Detection Transformer (CoSformer) network to capture both salient and common visual patterns from multiple images.
arXiv Detail & Related papers (2021-04-30T02:39:12Z)
- Self-supervised Segmentation via Background Inpainting [96.10971980098196]
We introduce a self-supervised detection and segmentation approach that can work with single images captured by a potentially moving camera.
We design a self-supervised loss function that we use to train a proposal-based segmentation network.
We apply our method to human detection and segmentation in images that visually depart from those of standard benchmarks and outperform existing self-supervised methods.
arXiv Detail & Related papers (2020-11-11T08:34:40Z)
- DyStaB: Unsupervised Object Segmentation via Dynamic-Static Bootstrapping [72.84991726271024]
We describe an unsupervised method to detect and segment portions of images of live scenes that are seen moving as a coherent whole.
Our method first partitions the motion field by minimizing the mutual information between segments.
It uses the segments to learn object models that can be used for detection in a static image.
arXiv Detail & Related papers (2020-08-16T22:05:13Z)
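DyStaB's partitioning criterion is information-theoretic. As a rough, hedged proxy for that idea (not the paper's exact objective), the sketch below computes the mutual information between a binary segmentation and a quantized motion field; a mask that cleanly separates independently moving regions makes the segment label highly informative about the motion. The binning scheme is an assumption.

```python
import numpy as np

def motion_segment_mi(flow: np.ndarray, mask: np.ndarray, bins: int = 16) -> float:
    """Mutual information between a binary mask and quantized flow magnitudes.

    Illustrative proxy only: DyStaB minimizes the information shared
    *between* segments; this related quantity measures how informative
    the segment label is about the motion.
    """
    # Quantize flow magnitudes into discrete symbols.
    mag = np.linalg.norm(flow, axis=-1).ravel()
    sym = np.digitize(mag, np.linspace(mag.min(), mag.max() + 1e-8, bins))
    lab = mask.ravel().astype(int)
    # Joint histogram of (segment label, motion symbol).
    joint = np.zeros((2, bins + 2))
    np.add.at(joint, (lab, sym), 1)
    joint /= joint.sum()
    pl, ps = joint.sum(1, keepdims=True), joint.sum(0, keepdims=True)
    nz = joint > 0
    # I(label; motion) = sum p(l,s) log( p(l,s) / (p(l) p(s)) ).
    return float((joint[nz] * np.log(joint[nz] / (pl @ ps)[nz])).sum())
```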
- Compositional Convolutional Neural Networks: A Robust and Interpretable Model for Object Recognition under Occlusion [21.737411464598797]
We show that black-box deep convolutional neural networks (DCNNs) have only limited robustness to partial occlusion.
We overcome these limitations by unifying DCNNs with part-based models into Compositional Convolutional Neural Networks (CompositionalNets).
Our experiments show that CompositionalNets improve by a large margin over their non-compositional counterparts at classifying and detecting partially occluded objects.
arXiv Detail & Related papers (2020-06-28T08:18:19Z)
- End-to-End Object Detection with Transformers [88.06357745922716]
We present a new method that views object detection as a direct set prediction problem.
Our approach streamlines the detection pipeline, effectively removing the need for many hand-designed components.
The main ingredients of the new framework, called DEtection TRansformer or DETR, are a set-based global loss that forces unique predictions via bipartite matching, and a transformer encoder-decoder architecture.
arXiv Detail & Related papers (2020-05-26T17:06:38Z)
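DETR's set-based loss hinges on a one-to-one bipartite matching between predictions and ground-truth objects. The sketch below shows such a matching step with the Hungarian algorithm; the cost here is a plain L1 box distance, a simplification of DETR's actual matching cost, which also includes class probabilities and generalized IoU.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_predictions(pred_boxes: np.ndarray, gt_boxes: np.ndarray):
    """One-to-one matching of predicted boxes to ground-truth boxes.

    Simplified sketch of DETR-style bipartite matching: each prediction
    is scored against each ground-truth object, and the Hungarian
    algorithm assigns every object to exactly one prediction.
    """
    # cost[i, j] = L1 distance between prediction i and ground truth j.
    cost = np.abs(pred_boxes[:, None, :] - gt_boxes[None, :, :]).sum(-1)
    pred_idx, gt_idx = linear_sum_assignment(cost)
    return list(zip(pred_idx, gt_idx))

# Toy example: 3 predictions, 2 objects (boxes as cx, cy, w, h).
preds = np.array([[0.5, 0.5, 0.2, 0.2], [0.1, 0.1, 0.3, 0.3], [0.9, 0.9, 0.1, 0.1]])
gts = np.array([[0.12, 0.1, 0.3, 0.3], [0.88, 0.9, 0.1, 0.1]])
print(match_predictions(preds, gts))  # -> [(1, 0), (2, 1)]
```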
- Object-Centric Image Generation from Layouts [93.10217725729468]
We develop a layout-to-image generation method to generate complex scenes with multiple objects.
Our method learns representations of the spatial relationships between objects in the scene, which lead to our model's improved layout fidelity.
We introduce SceneFID, an object-centric adaptation of the popular Fréchet Inception Distance metric, which is better suited for multi-object images.
arXiv Detail & Related papers (2020-03-16T21:40:09Z)
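SceneFID applies the standard FID formula to object-level features rather than whole-image features. The sketch below computes that Fréchet distance given two sets of features; the assumption (not spelled out here) is that a feature extractor such as Inception has already been applied to per-object crops.

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(feats_real: np.ndarray, feats_fake: np.ndarray) -> float:
    """Frechet distance between Gaussians fit to two feature sets.

    For SceneFID-style evaluation, `feats_*` would be features of
    per-object crops rather than of whole images (an assumption about
    the pipeline; the metric itself is the standard FID formula).
    """
    mu_r, mu_f = feats_real.mean(0), feats_fake.mean(0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_f = np.cov(feats_fake, rowvar=False)
    covmean = sqrtm(cov_r @ cov_f)
    if np.iscomplexobj(covmean):   # numerical noise can introduce tiny
        covmean = covmean.real     # imaginary parts; drop them
    return float(((mu_r - mu_f) ** 2).sum()
                 + np.trace(cov_r + cov_f - 2.0 * covmean))
```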
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.