Halluci-Net: Scene Completion by Exploiting Object Co-occurrence
Relationships
- URL: http://arxiv.org/abs/2004.08614v2
- Date: Fri, 21 May 2021 03:04:53 GMT
- Title: Halluci-Net: Scene Completion by Exploiting Object Co-occurrence
Relationships
- Authors: Kuldeep Kulkarni, Tejas Gokhale, Rajhans Singh, Pavan Turaga, Aswin
Sankaranarayanan
- Abstract summary: We propose a two-stage deep network based method, called 'Halluci-Net', that learns co-occurrence relationships between objects in scenes, and then exploits these relationships to produce a dense and complete labelmap.
The generated dense labelmap can then be used as input by state-of-the-art image synthesis techniques like pix2pixHD to obtain the final image.
- Score: 10.321117790264198
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, there has been substantial progress in image synthesis from
semantic labelmaps. However, methods used for this task assume the availability
of complete and unambiguous labelmaps, with instance boundaries of objects, and
class labels for each pixel. This reliance on heavily annotated inputs
restricts the application of image synthesis techniques to real-world
applications, especially under uncertainty due to weather, occlusion, or noise.
On the other hand, algorithms that can synthesize images from sparse labelmaps
or sketches are highly desirable as tools that can guide content creators and
artists to quickly generate scenes by simply specifying locations of a few
objects. In this paper, we address the problem of complex scene completion from
sparse labelmaps. Under this setting, very few details about the scene (30% of
object instances) are available as input for image synthesis. We propose a
two-stage deep network based method, called 'Halluci-Net', that learns
co-occurrence relationships between objects in scenes, and then exploits these
relationships to produce a dense and complete labelmap. The generated dense
labelmap can then be used as input by state-of-the-art image synthesis
techniques like pix2pixHD to obtain the final image. The proposed method is
evaluated on the Cityscapes dataset and it outperforms two baseline methods on
performance metrics like Fréchet Inception Distance (FID), semantic
segmentation accuracy, and similarity in object co-occurrences. We also show
qualitative results on a subset of ADE20K dataset that contains bedroom images.
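The abstract's two-stage idea — infer which classes are likely to co-occur with the ones given, then fill in the missing regions — can be illustrated with a toy, non-learned surrogate. The class ids, co-occurrence counts, and fill rule below are invented for illustration; the actual Halluci-Net stages are learned networks operating on full labelmaps.

```python
import numpy as np

# Hypothetical class ids for a street scene (Cityscapes-style).
ROAD, SKY, CAR, BUILDING, UNKNOWN = 0, 1, 2, 3, 255

# Toy co-occurrence counts, as if tallied from training labelmaps:
# cooc[i, j] = how often class j appears in scenes containing class i.
cooc = np.array([
    [0, 50, 40, 45],   # road
    [50, 0, 10, 48],   # sky
    [40, 10, 0, 30],   # car
    [45, 48, 30, 0],   # building
], dtype=float)

def complete_labelmap(sparse):
    """Fill UNKNOWN pixels with the class that co-occurs most strongly
    with the classes already present (a crude stage-1 stand-in)."""
    present = [c for c in np.unique(sparse) if c != UNKNOWN]
    scores = cooc[present].sum(axis=0)   # one vote per candidate class
    scores[present] = -np.inf            # prefer hallucinating a new class
    best = int(np.argmax(scores))
    dense = sparse.copy()
    dense[dense == UNKNOWN] = best
    return dense

sparse = np.full((4, 4), UNKNOWN)
sparse[3, :] = ROAD                      # only the road is given as input
dense = complete_labelmap(sparse)        # dense labelmap, ready for pix2pixHD
```

In the paper the resulting dense labelmap is then handed to an off-the-shelf synthesis network such as pix2pixHD; the toy rule above only conveys why co-occurrence statistics make completion possible at all.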
Related papers
- ReFit: A Framework for Refinement of Weakly Supervised Semantic
Segmentation using Object Border Fitting for Medical Images [4.945138408504987]
Weakly Supervised Semantic Segmentation (WSSS), relying only on image-level supervision, is a promising approach to avoiding the need for dense pixel-level annotations.
We propose our novel ReFit framework, which deploys state-of-the-art class activation maps combined with various post-processing techniques.
By applying our method to WSSS predictions, we achieved up to 10% improvement over the current state-of-the-art WSSS methods for medical imaging.
arXiv Detail & Related papers (2023-03-14T12:46:52Z) - SceneComposer: Any-Level Semantic Image Synthesis [80.55876413285587]
We propose a new framework for conditional image synthesis from semantic layouts of any precision levels.
The framework naturally reduces to text-to-image (T2I) at the lowest level with no shape information, and it becomes segmentation-to-image (S2I) at the highest level.
We introduce several novel techniques to address the challenges coming with this new setup.
arXiv Detail & Related papers (2022-11-21T18:59:05Z) - Scrape, Cut, Paste and Learn: Automated Dataset Generation Applied to
Parcel Logistics [58.720142291102135]
We present a fully automated pipeline to generate a synthetic dataset for instance segmentation in four steps.
We first scrape images for the objects of interest from popular image search engines.
We compare three different methods for image selection: Object-agnostic pre-processing, manual image selection and CNN-based image selection.
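The cut-and-paste compositing step described above can be sketched as follows. This is a minimal NumPy version with invented array shapes; the paper's pipeline additionally covers scraping, image selection, and blending, which are omitted here.

```python
import numpy as np

def cut_and_paste(background, obj, obj_mask, top, left):
    """Paste a masked object crop onto a background and return the
    composite image plus a free instance segmentation mask (label 1)."""
    img = background.copy()
    h, w = obj.shape[:2]
    region = img[top:top + h, left:left + w]   # view into the composite
    region[obj_mask] = obj[obj_mask]           # copy only object pixels
    seg = np.zeros(background.shape[:2], dtype=np.uint8)
    seg[top:top + h, left:left + w][obj_mask] = 1
    return img, seg

bg = np.zeros((8, 8, 3), dtype=np.uint8)            # blank background
obj = np.full((3, 3, 3), 200, dtype=np.uint8)       # a flat gray "object"
mask = np.ones((3, 3), dtype=bool)                  # its cut-out mask
img, seg = cut_and_paste(bg, obj, mask, top=2, left=2)
```

The appeal of the approach is visible even in the toy: the segmentation label comes for free from the paste location, with no manual annotation.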
arXiv Detail & Related papers (2022-10-18T12:49:04Z) - Automatic dataset generation for specific object detection [6.346581421948067]
We present a method to synthesize object-in-scene images, which can preserve the objects' detailed features without bringing irrelevant information.
Our result shows that in the synthesized image, the boundaries of objects blend very well with the background.
arXiv Detail & Related papers (2022-07-16T07:44:33Z) - Knowledge Mining with Scene Text for Fine-Grained Recognition [53.74297368412834]
We propose an end-to-end trainable network that mines implicit contextual knowledge behind scene text image.
We employ KnowBert to retrieve relevant knowledge for semantic representation and combine it with image features for fine-grained classification.
Our method outperforms the state-of-the-art by 3.72% mAP and 5.39% mAP, respectively.
arXiv Detail & Related papers (2022-03-27T05:54:00Z) - Language-driven Semantic Segmentation [88.21498323896475]
We present LSeg, a novel model for language-driven semantic image segmentation.
We use a text encoder to compute embeddings of descriptive input labels.
The encoder is trained with a contrastive objective to align pixel embeddings to the text embedding of the corresponding semantic class.
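The decision rule described for LSeg — label each pixel by the text embedding it is most similar to — can be sketched with stand-in embeddings. The random vectors below replace the trained image and text encoders, which are the actual substance of the model.

```python
import numpy as np

def classify_pixels(pixel_emb, text_emb):
    """Assign each pixel the label whose (unit-normalized) text
    embedding has the highest cosine similarity with the pixel
    embedding — the per-pixel decision rule LSeg is trained toward."""
    p = pixel_emb / np.linalg.norm(pixel_emb, axis=-1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=-1, keepdims=True)
    logits = p @ t.T                      # (H, W, num_labels)
    return logits.argmax(axis=-1)

rng = np.random.default_rng(0)
text_emb = rng.normal(size=(3, 8))        # e.g. "road", "sky", "car"
pixel_emb = np.tile(text_emb[1], (4, 4, 1))  # pixels aligned to label 1
labels = classify_pixels(pixel_emb, text_emb)
```

Because labels enter only through the text encoder, the same rule works for label sets never seen during training — the property that makes the model "language-driven".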
arXiv Detail & Related papers (2022-01-10T18:59:10Z) - Semantic Segmentation with Generative Models: Semi-Supervised Learning
and Strong Out-of-Domain Generalization [112.68171734288237]
We propose a novel framework for discriminative pixel-level tasks using a generative model of both images and labels.
We learn a generative adversarial network that captures the joint image-label distribution and is trained efficiently using a large set of unlabeled images.
We demonstrate strong in-domain performance compared to several baselines, and are the first to showcase extreme out-of-domain generalization.
arXiv Detail & Related papers (2021-04-12T21:41:25Z) - Deriving Visual Semantics from Spatial Context: An Adaptation of LSA and
Word2Vec to generate Object and Scene Embeddings from Images [0.0]
We develop two approaches for learning object and scene embeddings from annotated images.
In the first approach, we generate embeddings from object co-occurrences in whole images, one for objects and one for scenes.
In the second approach, rather than analyzing whole images of scenes, we focus on co-occurrences of objects within subregions of an image.
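The first approach — object embeddings from co-occurrences in whole images, in the spirit of LSA — can be sketched with a toy object-by-image matrix and a truncated SVD. The annotations and the embedding dimensionality below are invented for illustration.

```python
import numpy as np

# Toy annotations: which objects appear in each image.
images = [
    {"bed", "lamp", "pillow"},
    {"bed", "pillow", "nightstand"},
    {"car", "road", "traffic_light"},
    {"car", "road", "pedestrian"},
]
objects = sorted(set().union(*images))

# Object-by-image occurrence matrix, the LSA term-document analogue.
M = np.array([[obj in img for img in images] for obj in objects], float)

# A truncated SVD yields low-dimensional object embeddings.
U, S, _ = np.linalg.svd(M, full_matrices=False)
emb = U[:, :2] * S[:2]                    # 2-D embedding per object

def sim(a, b):
    """Cosine similarity between two object embeddings."""
    va, vb = emb[objects.index(a)], emb[objects.index(b)]
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))
```

Even on this toy corpus, objects from the same scene type (bed and pillow) end up closer than objects from different scene types (bed and road), which is the signal both papers in this thread exploit.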
arXiv Detail & Related papers (2020-09-20T08:26:38Z) - Improving Semantic Segmentation via Decoupled Body and Edge Supervision [89.57847958016981]
Existing semantic segmentation approaches either aim to improve the object's inner consistency by modeling the global context, or refine objects detail along their boundaries by multi-scale feature fusion.
In this paper, a new paradigm for semantic segmentation is proposed.
Our insight is that the appealing performance of semantic segmentation requires explicitly modeling the object body and edge, which correspond to the low- and high-frequency components of the image, respectively.
We show that the proposed framework with various baselines or backbone networks leads to better object inner consistency and object boundaries.
arXiv Detail & Related papers (2020-07-20T12:11:22Z)
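The body/edge frequency decomposition motivating the last paper can be illustrated with a simple low-pass/high-pass split: a box blur stands in here for the paper's learned, flow-based body generation, so this is only a frequency-domain intuition, not the method itself.

```python
import numpy as np

def box_blur(img, k=3):
    """Naive k-by-k box filter with edge padding (a low-pass filter)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

img = np.zeros((8, 8))
img[:, 4:] = 1.0                 # a vertical step edge
body = box_blur(img)             # low frequency: smooth object interior
edge = img - body                # high frequency: mass near the boundary
```

The residual `edge` is zero inside homogeneous regions and concentrated at the step, which is exactly the split the paper supervises the two branches on.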
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.