Adaptive Graph Representation Learning and Reasoning for Face Parsing
- URL: http://arxiv.org/abs/2101.07034v1
- Date: Mon, 18 Jan 2021 12:17:40 GMT
- Title: Adaptive Graph Representation Learning and Reasoning for Face Parsing
- Authors: Gusi Te, Wei Hu, Yinglu Liu, Hailin Shi, Tao Mei
- Abstract summary: Face parsing infers a pixel-wise label for each facial component.
The component-wise relationship is a critical clue for discriminating ambiguous pixels in the facial area.
We propose adaptive graph representation learning and reasoning over facial components.
- Score: 55.086151726427104
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Face parsing infers a pixel-wise label for each facial component, which has
drawn much attention recently. Previous methods have shown success in face
parsing but overlook the correlation among facial components. In fact, the
component-wise relationship is a critical clue for discriminating ambiguous
pixels in the facial area. To address this issue, we
propose adaptive graph representation learning and reasoning over facial
components, aiming to learn representative vertices that describe each
component, exploit the component-wise relationship and thereby produce accurate
parsing results against ambiguity. In particular, we devise an adaptive and
differentiable graph abstraction method to represent the components on a graph
via pixel-to-vertex projection under the initial condition of a predicted
parsing map, where pixel features within a certain facial region are aggregated
onto a vertex. Further, we explicitly incorporate the image edge as a prior in
the model, which helps to discriminate edge and non-edge pixels during the
projection, thus leading to refined parsing results along the edges. Then, our
model learns and reasons over the relations among components by propagating
information across vertices on the graph. Finally, the refined vertex features
are projected back to pixel grids for the prediction of the final parsing map.
To train our model, we propose a discriminative loss to penalize small
distances between vertices in the feature space, which leads to distinct
vertices with strong semantics. Experimental results show the superior
performance of the proposed model on multiple face parsing datasets, and
validation on the human parsing task demonstrates the generalizability of our
model.
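To make the pipeline above concrete, the following is a minimal PyTorch sketch of the projection, graph reasoning, reprojection, and discriminative loss steps, written from the abstract alone. It is an illustrative reading, not the authors' implementation: the class and function names (PixelToVertexProjection, GraphReasoning, vertex_to_pixel, discriminative_loss), the soft pixel-to-vertex assignment, the similarity-based adjacency, and the hinge margin are all assumptions, and the edge prior described in the abstract is omitted.

```python
# Illustrative sketch only; names, shapes, and formulations are assumptions,
# not the authors' actual model. The edge prior from the paper is omitted.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PixelToVertexProjection(nn.Module):
    """Aggregate pixel features onto K component vertices, guided by an initial parsing map."""
    def __init__(self, channels: int, num_vertices: int):
        super().__init__()
        # 1x1 conv predicts per-pixel assignment logits to the K vertices
        self.assign = nn.Conv2d(channels, num_vertices, kernel_size=1)

    def forward(self, feats, init_parsing):
        # feats:        (B, C, H, W) pixel features
        # init_parsing: (B, K, H, W) initial parsing probabilities (one channel per component)
        logits = self.assign(feats) + torch.log(init_parsing.clamp_min(1e-6))
        assign = F.softmax(logits.flatten(2), dim=-1)                       # (B, K, H*W)
        vertices = torch.einsum('bkn,bcn->bkc', assign, feats.flatten(2))   # (B, K, C)
        return vertices, assign


class GraphReasoning(nn.Module):
    """Propagate information across vertices with a similarity-derived adjacency (one assumed variant)."""
    def __init__(self, channels: int):
        super().__init__()
        self.transform = nn.Linear(channels, channels)

    def forward(self, vertices):
        # Adjacency from vertex feature similarity, row-normalised with softmax
        adj = F.softmax(torch.bmm(vertices, vertices.transpose(1, 2)), dim=-1)   # (B, K, K)
        return F.relu(vertices + torch.bmm(adj, self.transform(vertices)))       # residual update


def vertex_to_pixel(vertices, assign, shape):
    """Project refined vertex features back onto the pixel grid."""
    B, K, C = vertices.shape
    H, W = shape
    pixel_feats = torch.einsum('bkn,bkc->bcn', assign, vertices)  # (B, C, H*W)
    return pixel_feats.view(B, C, H, W)


def discriminative_loss(vertices, margin: float = 1.0):
    """Penalise small pairwise distances between vertex features (keeps vertices distinct)."""
    dists = torch.cdist(vertices, vertices)                        # (B, K, K)
    K = vertices.shape[1]
    off_diag = ~torch.eye(K, dtype=torch.bool, device=vertices.device)
    return F.relu(margin - dists[:, off_diag]).mean()


if __name__ == "__main__":
    B, C, K, H, W = 2, 64, 11, 32, 32       # toy sizes; 11 components is an arbitrary assumption
    feats = torch.randn(B, C, H, W)
    init_parsing = F.softmax(torch.randn(B, K, H, W), dim=1)
    proj, reason = PixelToVertexProjection(C, K), GraphReasoning(C)
    vertices, assign = proj(feats, init_parsing)
    refined = reason(vertices)
    pixel_feats = vertex_to_pixel(refined, assign, (H, W))
    loss = discriminative_loss(refined)
    print(pixel_feats.shape, loss.item())
```

The hinge in discriminative_loss penalizes vertex pairs that sit closer than the margin in feature space, matching the abstract's goal of producing distinct vertices with strong semantics; the exact loss form and propagation rule used by the authors may differ.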
Related papers
- Entropy Neural Estimation for Graph Contrastive Learning [9.032721248598088]
Contrastive learning on graphs aims at extracting distinguishable high-level representations of nodes.
We propose a simple yet effective subset sampling strategy to contrast pairwise representations between views of a dataset.
We conduct extensive experiments on seven graph benchmarks, and the proposed approach achieves competitive performance.
arXiv Detail & Related papers (2023-07-26T03:55:08Z) - Pixel Relationships-based Regularizer for Retinal Vessel Image Segmentation [4.3251090426112695]
This study presents regularizers that supply pixel-neighbor relationship information to the learning process.
Experiments show that our scheme successfully captures pixel neighbor relations and improves the performance of the convolutional neural network.
arXiv Detail & Related papers (2022-12-28T07:35:20Z) - DisPositioNet: Disentangled Pose and Identity in Semantic Image Manipulation [83.51882381294357]
DisPositioNet is a model that learns a disentangled representation for each object for the task of image manipulation using scene graphs.
Our framework enables the disentanglement of the variational latent embeddings as well as the feature representation in the graph.
arXiv Detail & Related papers (2022-11-10T11:47:37Z) - Facial Geometric Detail Recovery via Implicit Representation [147.07961322377685]
We present a robust texture-guided geometric detail recovery approach using only a single in-the-wild facial image.
Our method combines high-quality texture completion with the powerful expressiveness of implicit surfaces.
Our method not only recovers accurate facial details but also decomposes normals, albedos, and shading parts in a self-supervised way.
arXiv Detail & Related papers (2022-03-18T01:42:59Z) - Learning to Generate Scene Graph from Natural Language Supervision [52.18175340725455]
We propose one of the first methods that learn from image-sentence pairs to extract a graphical representation of localized objects and their relationships within an image, known as a scene graph.
We leverage an off-the-shelf object detector to identify and localize object instances, match labels of detected regions to concepts parsed from captions, and thus create "pseudo" labels for learning scene graph.
arXiv Detail & Related papers (2021-09-06T03:38:52Z) - Learning to Disambiguate Strongly Interacting Hands via Probabilistic Per-pixel Part Segmentation [84.28064034301445]
Self-similarity, and the resulting ambiguities in assigning pixel observations to the respective hands, are a major cause of the final 3D pose error.
We propose DIGIT, a novel method for estimating the 3D poses of two interacting hands from a single monocular image.
We experimentally show that the proposed approach achieves new state-of-the-art performance on the InterHand2.6M dataset.
arXiv Detail & Related papers (2021-07-01T13:28:02Z) - Edge-aware Graph Representation Learning and Reasoning for Face Parsing [61.5045850197694]
Face parsing infers a pixel-wise label to each facial component, which has drawn much attention recently.
Previous methods have shown their efficiency in face parsing but overlook the correlation among different face regions.
We propose to model and reason the region-wise relations by learning graph representations.
arXiv Detail & Related papers (2020-07-22T07:46:34Z) - JGR-P2O: Joint Graph Reasoning based Pixel-to-Offset Prediction Network for 3D Hand Pose Estimation from a Single Depth Image [28.753759115780515]
State-of-the-art single depth image-based 3D hand pose estimation methods are based on dense predictions.
A novel pixel-wise prediction-based method is proposed to address the above issues.
The proposed model is implemented with an efficient 2D fully convolutional network backbone and has only about 1.4M parameters.
arXiv Detail & Related papers (2020-07-09T08:57:19Z)