Context-based Image Segment Labeling (CBISL)
- URL: http://arxiv.org/abs/2011.00784v1
- Date: Mon, 2 Nov 2020 07:26:55 GMT
- Title: Context-based Image Segment Labeling (CBISL)
- Authors: Tobias Schlagenhauf, Yefeng Xia, Jürgen Fleischer
- Abstract summary: This paper aims to recover semantic image features (objects and positions) in images.
We demonstrate a new approach referred to as quadro-directional PixelCNN to recover missing objects.
The results suggest that our four-directional model outperforms one-directional models.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Working with images, one often faces problems with incomplete or unclear
information. Image inpainting can be used to restore missing image regions but
focuses on low-level image features such as pixel intensity, pixel
gradient orientation, and color. This paper aims to recover semantic image
features (objects and positions) in images. Based on published gated PixelCNNs,
we demonstrate a new approach referred to as quadro-directional PixelCNN to
recover missing objects and return probable positions for objects based on the
context. We call this approach context-based image segment labeling (CBISL).
The results suggest that our four-directional model outperforms one-directional
models (gated PixelCNN) and achieves human-comparable performance.
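To make the idea concrete, below is a minimal, hypothetical PyTorch sketch of a four-directional PixelCNN over one-hot segment-label maps: a single causal masked-convolution stream is run over four rotations of the input, so every pixel receives context from all four scan directions. Layer sizes, the averaging fusion, and the use of plain rather than gated convolutions are illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn as nn

class MaskedConv2d(nn.Conv2d):
    """Causal convolution: each pixel only sees already-scanned context.
    Mask type "A" (first layer) also hides the centre pixel."""
    def __init__(self, mask_type, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.register_buffer("mask", torch.ones_like(self.weight))
        _, _, kh, kw = self.weight.shape
        self.mask[:, :, kh // 2, kw // 2 + (mask_type == "B"):] = 0
        self.mask[:, :, kh // 2 + 1:] = 0

    def forward(self, x):
        self.weight.data *= self.mask
        return super().forward(x)

class QuadroDirectionalPixelCNN(nn.Module):
    """One directional stream applied to four rotations of the label map,
    then fused, so each pixel is conditioned on all four directions."""
    def __init__(self, num_classes, hidden=64, layers=5):
        super().__init__()
        blocks = [MaskedConv2d("A", num_classes, hidden, 7, padding=3)]
        for _ in range(layers - 1):
            blocks += [nn.ReLU(), MaskedConv2d("B", hidden, hidden, 3, padding=1)]
        blocks += [nn.ReLU(), nn.Conv2d(hidden, num_classes, 1)]
        self.stream = nn.Sequential(*blocks)

    def forward(self, onehot_labels):  # (B, num_classes, H, W)
        logits = 0
        for k in range(4):  # scan the label map in four orientations
            x = torch.rot90(onehot_labels, k, dims=(2, 3))
            logits = logits + torch.rot90(self.stream(x), -k, dims=(2, 3))
        return logits / 4  # fused per-pixel class logits
```

Missing segment labels would then be recovered by sampling pixel by pixel from the fused conditional distribution.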
Related papers
- Depth-aware Panoptic Segmentation [1.4170154234094008]
We present a novel CNN-based method for panoptic segmentation.
We propose a new depth-aware dice loss term which penalises the assignment of pixels with dissimilar depths to the same thing instance.
Experiments carried out on the Cityscapes dataset show that the proposed method reduces the number of objects that are erroneously merged into one thing instance.
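The exact loss is defined in the paper; as a purely hypothetical reading of the idea, a predicted "thing" mask could be penalised for claiming pixels whose depth deviates from the instance's mean depth:

```python
import torch

def depth_aware_penalty(mask_prob, depth, eps=1e-6):
    """Hypothetical sketch (not the paper's loss). mask_prob: (H, W) soft
    mask for one 'thing' instance; depth: (H, W) normalised depth map.
    Pixels far from the instance's mean depth add to the penalty."""
    mass = mask_prob.sum() + eps
    inst_depth = (mask_prob * depth).sum() / mass   # soft mean depth
    return (mask_prob * (depth - inst_depth).abs()).sum() / mass
```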
arXiv Detail & Related papers (2024-03-21T08:06:49Z) - Context Does Matter: End-to-end Panoptic Narrative Grounding with Deformable Attention Refined Matching Network [25.511804582983977]
Panoptic Narrative Grounding (PNG) aims to segment visual objects in images based on dense narrative captions.
We propose a novel learning framework called Deformable Attention Refined Matching Network (DRMN).
DRMN iteratively re-encodes pixels with the deformable attention network after updating the feature representation of the top-$k$ most similar pixels.
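As a rough, hypothetical sketch of the iterative top-$k$ matching loop, with DRMN's deformable-attention re-encoding replaced by a simple mean update for brevity:

```python
import torch
import torch.nn.functional as F

def topk_refine(pixel_feats, phrase_feat, k=64, iters=2):
    """Hypothetical sketch: re-estimate the phrase representation from its
    k most similar pixels, then rescore. pixel_feats: (N, D) flattened
    pixel features; phrase_feat: (D,) caption-phrase feature."""
    sim = F.cosine_similarity(pixel_feats, phrase_feat[None], dim=1)
    for _ in range(iters):
        topk = sim.topk(k).indices
        # DRMN re-encodes these pixels with deformable attention;
        # a mean update stands in for that step here.
        phrase_feat = pixel_feats[topk].mean(0)
        sim = F.cosine_similarity(pixel_feats, phrase_feat[None], dim=1)
    return sim  # per-pixel matching scores
```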
arXiv Detail & Related papers (2023-10-25T13:12:39Z) - Pixel-Inconsistency Modeling for Image Manipulation Localization [59.968362815126326]
Digital image forensics plays a crucial role in image authentication and manipulation localization.
This paper presents a generalized and robust manipulation localization model through the analysis of pixel inconsistency artifacts.
Experiments show that our method successfully extracts inherent pixel-inconsistency forgery fingerprints.
arXiv Detail & Related papers (2023-09-30T02:54:51Z) - SPColor: Semantic Prior Guided Exemplar-based Image Colorization [14.191819767895867]
We propose SPColor, a semantic prior guided exemplar-based image colorization framework.
SPColor first coarsely classifies pixels of the reference and target images to several pseudo-classes under the guidance of semantic prior.
Our model outperforms recent state-of-the-art methods both quantitatively and qualitatively on a public dataset.
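A minimal sketch of the pseudo-class idea, assuming per-pixel features and segmentation logits from a pretrained network as the semantic prior (all names and shapes are illustrative):

```python
import torch
import torch.nn.functional as F

def pseudo_class_match(tgt_feats, ref_feats, tgt_seg_logits, ref_seg_logits):
    """Hypothetical sketch: restrict exemplar matching to pixels in the
    same pseudo-class. feats: (N, D) target / (M, D) reference pixel
    features; seg logits: (N, C) / (M, C) from a segmentation network."""
    tgt_cls = tgt_seg_logits.argmax(1)  # pseudo-class per target pixel
    ref_cls = ref_seg_logits.argmax(1)
    sim = F.normalize(tgt_feats, dim=1) @ F.normalize(ref_feats, dim=1).T
    # Forbid correspondences across different pseudo-classes.
    sim = sim.masked_fill(tgt_cls[:, None] != ref_cls[None, :], float("-inf"))
    return sim.argmax(1)  # best same-class reference pixel per target pixel
```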
arXiv Detail & Related papers (2023-04-13T04:21:45Z) - Self-Supervised Video Object Segmentation via Cutout Prediction and Tagging [117.73967303377381]
We propose a novel self-supervised Video Object Segmentation (VOS) approach that strives to achieve better object-background discriminability.
Our approach is based on a discriminative learning loss formulation that takes into account both object and background information.
Our proposed approach, CT-VOS, achieves state-of-the-art results on two challenging benchmarks: DAVIS-2017 and YouTube-VOS.
arXiv Detail & Related papers (2022-04-22T17:53:27Z) - Aerial Scene Parsing: From Tile-level Scene Classification to Pixel-wise Semantic Labeling [48.30060717413166]
Given an aerial image, aerial scene parsing (ASP) aims to interpret the semantic structure of the image content by assigning a semantic label to every pixel of the image.
We present a large-scale scene classification dataset, termed Million-AID, that contains one million aerial images.
We also report benchmarking experiments using classical convolutional neural networks (CNNs) to achieve pixel-wise semantic labeling.
arXiv Detail & Related papers (2022-01-06T07:40:47Z) - Maximize the Exploration of Congeneric Semantics for Weakly Supervised Semantic Segmentation [27.155133686127474]
We construct a graph neural network (P-GNN) based on the self-detected patches from different images that contain the same class labels.
We conduct experiments on the popular PASCAL VOC 2012 benchmarks, and our model yields state-of-the-art performance.
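As an illustration of the construction (details differ in the paper), patches from different images can be linked whenever their source images share a class label, so that a GNN can propagate congeneric semantics along the edges:

```python
import itertools

def build_patch_graph(patches):
    """Hypothetical sketch. patches: list of (patch_id, image_id,
    set_of_image_level_labels). Returns edges between patches from
    different images whose images share at least one class label."""
    edges = []
    for (pa, ia, la), (pb, ib, lb) in itertools.combinations(patches, 2):
        if ia != ib and la & lb:  # different images, shared class label
            edges.append((pa, pb))
    return edges
```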
arXiv Detail & Related papers (2021-10-08T08:59:16Z) - Image Restoration by Deep Projected GSURE [115.57142046076164]
Ill-posed inverse problems appear in many image processing applications, such as deblurring and super-resolution.
We propose a new image restoration framework based on minimizing a loss function that includes a "projected version" of the Generalized Stein Unbiased Risk Estimator (GSURE) and a parameterization of the latent image by a CNN.
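For orientation, one common form of the projected GSURE objective is sketched below, up to an additive constant independent of the network parameters; see the paper for the exact variant it minimizes.

```latex
% Model: y = Hx + n, n ~ N(0, \sigma^2 I); h_\theta is the CNN-parameterized
% estimator, u = H^\top y / \sigma^2, \hat{x}_{\mathrm{ML}} = H^\dagger y,
% and P = H^\dagger H projects onto the range of H^\top.
\mathcal{L}_{\mathrm{GSURE}}(\theta) =
    \lVert P\, h_\theta(u) \rVert^2
    - 2\, h_\theta(u)^{\top} \hat{x}_{\mathrm{ML}}
    + 2\, \nabla_u \cdot h_\theta(u)
% In practice the divergence term is approximated by Monte-Carlo
% (Hutchinson-style) trace estimation.
```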
arXiv Detail & Related papers (2021-02-04T08:52:46Z) - The pursuit of beauty: Converting image labels to meaningful vectors [2.741266294612776]
This paper introduces a method, called Occlusion-based Latent Representations (OLR), for converting image labels to meaningful representations that capture a significant amount of data semantics.
Besides being rich in information, these representations compose a disentangled low-dimensional latent space where each image label is encoded into a separate vector.
We evaluate the quality of these representations in a series of experiments whose results suggest that the proposed model can capture data concepts and discover data interrelations.
arXiv Detail & Related papers (2020-08-03T06:33:11Z) - Steering Self-Supervised Feature Learning Beyond Local Pixel Statistics [60.92229707497999]
We introduce a novel principle for self-supervised feature learning based on the discrimination of specific transformations of an image.
We demonstrate experimentally that learning to discriminate transformations such as LCI, image warping and rotations yields features with state-of-the-art generalization capabilities.
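A minimal sketch of such a transformation-discrimination pretext task follows; the paper's LCI (limited context inpainting) branch is elided, and the affine warp is a simple stand-in (square images assumed so rotation keeps shapes consistent):

```python
import random
import torch
import torch.nn.functional as F

def random_warp(img):
    """Small random affine warp (stand-in for the paper's warping)."""
    theta = torch.eye(2, 3) + 0.1 * torch.randn(2, 3)
    grid = F.affine_grid(theta[None], img[None].shape, align_corners=False)
    return F.grid_sample(img[None], grid, align_corners=False)[0]

def pretext_batch(images):
    """Apply one of several transformations per image; a classifier is
    then trained to predict which one was applied."""
    xs, ys = [], []
    for img in images:                       # img: (C, H, W), H == W
        t = random.randrange(3)
        if t == 0:
            x = img                          # identity
        elif t == 1:
            x = torch.rot90(img, 1, (1, 2))  # rotation
        else:
            x = random_warp(img)             # warping
        xs.append(x)
        ys.append(t)
    return torch.stack(xs), torch.tensor(ys)
```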
arXiv Detail & Related papers (2020-04-05T22:09:08Z) - Expressing Objects just like Words: Recurrent Visual Embedding for Image-Text Matching [102.62343739435289]
Existing image-text matching approaches infer the similarity of an image-text pair by capturing and aggregating the affinities between the text and each independent object of the image.
We propose a Dual Path Recurrent Neural Network (DP-RNN) which processes images and sentences symmetrically by recurrent neural networks (RNNs).
Our model achieves state-of-the-art performance on the Flickr30K dataset and competitive performance on the MS-COCO dataset.
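As a hypothetical sketch of the symmetric design (DP-RNN additionally reorders the objects before the recurrent encoder, which is omitted here):

```python
import torch
import torch.nn as nn

class SymmetricMatcher(nn.Module):
    """Hypothetical sketch: encode the word sequence and the object
    sequence with RNNs of the same form, then score the image-text pair
    by the similarity of the two final hidden states."""
    def __init__(self, dim=512):
        super().__init__()
        self.txt_rnn = nn.GRU(dim, dim, batch_first=True)
        self.img_rnn = nn.GRU(dim, dim, batch_first=True)

    def forward(self, word_feats, object_feats):
        # word_feats: (B, T, D) word embeddings;
        # object_feats: (B, K, D) detected-object embeddings
        _, h_txt = self.txt_rnn(word_feats)
        _, h_img = self.img_rnn(object_feats)
        return torch.cosine_similarity(h_txt[-1], h_img[-1], dim=1)  # (B,)
```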
arXiv Detail & Related papers (2020-02-20T00:51:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.