Halluci-Net: Scene Completion by Exploiting Object Co-occurrence
Relationships
- URL: http://arxiv.org/abs/2004.08614v2
- Date: Fri, 21 May 2021 03:04:53 GMT
- Title: Halluci-Net: Scene Completion by Exploiting Object Co-occurrence
Relationships
- Authors: Kuldeep Kulkarni, Tejas Gokhale, Rajhans Singh, Pavan Turaga, Aswin
Sankaranarayanan
- Abstract summary: We propose a two-stage deep network based method, called 'Halluci-Net', that learns co-occurrence relationships between objects in scenes, and then exploits these relationships to produce a dense and complete labelmap.
The generated dense labelmap can then be used as input by state-of-the-art image synthesis techniques like pix2pixHD to obtain the final image.
- Score: 10.321117790264198
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, there has been substantial progress in image synthesis from
semantic labelmaps. However, methods used for this task assume the availability
of complete and unambiguous labelmaps, with instance boundaries of objects, and
class labels for each pixel. This reliance on heavily annotated inputs
restricts the application of image synthesis techniques to real-world
applications, especially under uncertainty due to weather, occlusion, or noise.
On the other hand, algorithms that can synthesize images from sparse labelmaps
or sketches are highly desirable as tools that can guide content creators and
artists to quickly generate scenes by simply specifying locations of a few
objects. In this paper, we address the problem of complex scene completion from
sparse labelmaps. Under this setting, very few details about the scene (30% of
object instances) are available as input for image synthesis. We propose a
two-stage deep network based method, called 'Halluci-Net', that learns
co-occurrence relationships between objects in scenes, and then exploits these
relationships to produce a dense and complete labelmap. The generated dense
labelmap can then be used as input by state-of-the-art image synthesis
techniques like pix2pixHD to obtain the final image. The proposed method is
evaluated on the Cityscapes dataset and it outperforms two baseline methods on
performance metrics like Fréchet Inception Distance (FID), semantic
segmentation accuracy, and similarity in object co-occurrences. We also show
qualitative results on a subset of ADE20K dataset that contains bedroom images.
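The abstract's two-stage idea — infer which classes are likely to co-occur with the ones given, then fill in the missing regions — can be illustrated with a toy, non-learned surrogate. The class ids, co-occurrence counts, and fill rule below are invented for illustration; the actual Halluci-Net stages are learned networks operating on full labelmaps.

```python
import numpy as np

# Hypothetical class ids for a street scene (Cityscapes-style).
ROAD, SKY, CAR, BUILDING, UNKNOWN = 0, 1, 2, 3, 255

# Toy co-occurrence counts, as if tallied from training labelmaps:
# cooc[i, j] = how often class j appears in scenes containing class i.
cooc = np.array([
    [0, 50, 40, 45],   # road
    [50, 0, 10, 48],   # sky
    [40, 10, 0, 30],   # car
    [45, 48, 30, 0],   # building
], dtype=float)

def complete_labelmap(sparse):
    """Fill UNKNOWN pixels with the class that co-occurs most strongly
    with the classes already present (a crude stage-1 stand-in)."""
    present = [c for c in np.unique(sparse) if c != UNKNOWN]
    scores = cooc[present].sum(axis=0)   # one vote per candidate class
    scores[present] = -np.inf            # prefer hallucinating a new class
    best = int(np.argmax(scores))
    dense = sparse.copy()
    dense[dense == UNKNOWN] = best
    return dense

sparse = np.full((4, 4), UNKNOWN)
sparse[3, :] = ROAD                      # only the road is given as input
dense = complete_labelmap(sparse)        # dense labelmap, ready for pix2pixHD
```

In the paper the resulting dense labelmap is then handed to an off-the-shelf synthesis network such as pix2pixHD; the toy rule above only conveys why co-occurrence statistics make completion possible at all.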
Related papers
- ReFit: A Framework for Refinement of Weakly Supervised Semantic
Segmentation using Object Border Fitting for Medical Images [4.945138408504987]
Weakly Supervised Semantic Segmentation (WSSS), relying only on image-level supervision, is a promising approach to avoiding the need for dense pixel-level annotations.
We propose our novel ReFit framework, which deploys state-of-the-art class activation maps combined with various post-processing techniques.
By applying our method to WSSS predictions, we achieved up to 10% improvement over the current state-of-the-art WSSS methods for medical imaging.
arXiv Detail & Related papers (2023-03-14T12:46:52Z) - SceneComposer: Any-Level Semantic Image Synthesis [80.55876413285587]
We propose a new framework for conditional image synthesis from semantic layouts of any precision levels.
The framework naturally reduces to text-to-image (T2I) at the lowest level with no shape information, and it becomes segmentation-to-image (S2I) at the highest level.
We introduce several novel techniques to address the challenges coming with this new setup.
arXiv Detail & Related papers (2022-11-21T18:59:05Z) - Scrape, Cut, Paste and Learn: Automated Dataset Generation Applied to
Parcel Logistics [58.720142291102135]
We present a fully automated pipeline to generate a synthetic dataset for instance segmentation in four steps.
We first scrape images for the objects of interest from popular image search engines.
We compare three different methods for image selection: Object-agnostic pre-processing, manual image selection and CNN-based image selection.
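The cut-and-paste compositing step described above can be sketched as follows. This is a minimal NumPy version with invented array shapes; the paper's pipeline additionally covers scraping, image selection, and blending, which are omitted here.

```python
import numpy as np

def cut_and_paste(background, obj, obj_mask, top, left):
    """Paste a masked object crop onto a background and return the
    composite image plus a free instance segmentation mask (label 1)."""
    img = background.copy()
    h, w = obj.shape[:2]
    region = img[top:top + h, left:left + w]   # view into the composite
    region[obj_mask] = obj[obj_mask]           # copy only object pixels
    seg = np.zeros(background.shape[:2], dtype=np.uint8)
    seg[top:top + h, left:left + w][obj_mask] = 1
    return img, seg

bg = np.zeros((8, 8, 3), dtype=np.uint8)            # blank background
obj = np.full((3, 3, 3), 200, dtype=np.uint8)       # a flat gray "object"
mask = np.ones((3, 3), dtype=bool)                  # its cut-out mask
img, seg = cut_and_paste(bg, obj, mask, top=2, left=2)
```

The appeal of the approach is visible even in the toy: the segmentation label comes for free from the paste location, with no manual annotation.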
arXiv Detail & Related papers (2022-10-18T12:49:04Z) - Automatic dataset generation for specific object detection [6.346581421948067]
We present a method to synthesize object-in-scene images, which can preserve the objects' detailed features without bringing irrelevant information.
Our result shows that in the synthesized image, the boundaries of objects blend very well with the background.
arXiv Detail & Related papers (2022-07-16T07:44:33Z) - Knowledge Mining with Scene Text for Fine-Grained Recognition [53.74297368412834]
We propose an end-to-end trainable network that mines implicit contextual knowledge behind scene text image.
We employ KnowBert to retrieve relevant knowledge for semantic representation and combine it with image features for fine-grained classification.
Our method outperforms the state-of-the-art by 3.72% mAP and 5.39% mAP, respectively.
arXiv Detail & Related papers (2022-03-27T05:54:00Z) - Language-driven Semantic Segmentation [88.21498323896475]
We present LSeg, a novel model for language-driven semantic image segmentation.
We use a text encoder to compute embeddings of descriptive input labels.
The encoder is trained with a contrastive objective to align pixel embeddings to the text embedding of the corresponding semantic class.
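The decision rule described for LSeg — label each pixel by the text embedding it is most similar to — can be sketched with stand-in embeddings. The random vectors below replace the trained image and text encoders, which are the actual substance of the model.

```python
import numpy as np

def classify_pixels(pixel_emb, text_emb):
    """Assign each pixel the label whose (unit-normalized) text
    embedding has the highest cosine similarity with the pixel
    embedding — the per-pixel decision rule LSeg is trained toward."""
    p = pixel_emb / np.linalg.norm(pixel_emb, axis=-1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=-1, keepdims=True)
    logits = p @ t.T                      # (H, W, num_labels)
    return logits.argmax(axis=-1)

rng = np.random.default_rng(0)
text_emb = rng.normal(size=(3, 8))        # e.g. "road", "sky", "car"
pixel_emb = np.tile(text_emb[1], (4, 4, 1))  # pixels aligned to label 1
labels = classify_pixels(pixel_emb, text_emb)
```

Because labels enter only through the text encoder, the same rule works for label sets never seen during training — the property that makes the model "language-driven".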
arXiv Detail & Related papers (2022-01-10T18:59:10Z) - Semantic Segmentation with Generative Models: Semi-Supervised Learning
and Strong Out-of-Domain Generalization [112.68171734288237]
We propose a novel framework for discriminative pixel-level tasks using a generative model of both images and labels.
We learn a generative adversarial network that captures the joint image-label distribution and is trained efficiently using a large set of unlabeled images.
We demonstrate strong in-domain performance compared to several baselines, and are the first to showcase extreme out-of-domain generalization.
arXiv Detail & Related papers (2021-04-12T21:41:25Z) - Deriving Visual Semantics from Spatial Context: An Adaptation of LSA and
Word2Vec to generate Object and Scene Embeddings from Images [0.0]
We develop two approaches for learning object and scene embeddings from annotated images.
In the first approach, we generate embeddings from object co-occurrences in whole images, one for objects and one for scenes.
In the second approach, rather than analyzing whole images of scenes, we focus on co-occurrences of objects within subregions of an image.
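The first approach — object embeddings from co-occurrences in whole images, in the spirit of LSA — can be sketched with a toy object-by-image matrix and a truncated SVD. The annotations and the embedding dimensionality below are invented for illustration.

```python
import numpy as np

# Toy annotations: which objects appear in each image.
images = [
    {"bed", "lamp", "pillow"},
    {"bed", "pillow", "nightstand"},
    {"car", "road", "traffic_light"},
    {"car", "road", "pedestrian"},
]
objects = sorted(set().union(*images))

# Object-by-image occurrence matrix, the LSA term-document analogue.
M = np.array([[obj in img for img in images] for obj in objects], float)

# A truncated SVD yields low-dimensional object embeddings.
U, S, _ = np.linalg.svd(M, full_matrices=False)
emb = U[:, :2] * S[:2]                    # 2-D embedding per object

def sim(a, b):
    """Cosine similarity between two object embeddings."""
    va, vb = emb[objects.index(a)], emb[objects.index(b)]
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))
```

Even on this toy corpus, objects from the same scene type (bed and pillow) end up closer than objects from different scene types (bed and road), which is the signal both papers in this thread exploit.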
arXiv Detail & Related papers (2020-09-20T08:26:38Z) - Improving Semantic Segmentation via Decoupled Body and Edge Supervision [89.57847958016981]
Existing semantic segmentation approaches either aim to improve the object's inner consistency by modeling the global context, or refine objects detail along their boundaries by multi-scale feature fusion.
In this paper, a new paradigm for semantic segmentation is proposed.
Our insight is that the appealing performance of semantic segmentation requires explicitly modeling the object body and edge, which correspond to the low- and high-frequency components of the image, respectively.
We show that the proposed framework with various baselines or backbone networks leads to better object inner consistency and object boundaries.
arXiv Detail & Related papers (2020-07-20T12:11:22Z)
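The body/edge frequency decomposition motivating the last paper can be illustrated with a simple low-pass/high-pass split: a box blur stands in here for the paper's learned, flow-based body generation, so this is only a frequency-domain intuition, not the method itself.

```python
import numpy as np

def box_blur(img, k=3):
    """Naive k-by-k box filter with edge padding (a low-pass filter)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

img = np.zeros((8, 8))
img[:, 4:] = 1.0                 # a vertical step edge
body = box_blur(img)             # low frequency: smooth object interior
edge = img - body                # high frequency: mass near the boundary
```

The residual `edge` is zero inside homogeneous regions and concentrated at the step, which is exactly the split the paper supervises the two branches on.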
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.