Interpreting Face Inference Models using Hierarchical Network Dissection
- URL: http://arxiv.org/abs/2108.10360v1
- Date: Mon, 23 Aug 2021 18:52:47 GMT
- Title: Interpreting Face Inference Models using Hierarchical Network Dissection
- Authors: Divyang Teotia, Agata Lapedriza, Sarah Ostadabbas
- Abstract summary: Hierarchical Network Dissection is a pipeline to interpret the internal representation of face-centric inference models.
Our pipeline is inspired by Network Dissection, a popular interpretability method for object-centric and scene-centric models.
- Score: 10.852613235927958
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper presents Hierarchical Network Dissection, a general pipeline to
interpret the internal representation of face-centric inference models. Using a
probabilistic formulation, Hierarchical Network Dissection pairs units of the
model with concepts in our "Face Dictionary" (a collection of facial concepts
with corresponding sample images). Our pipeline is inspired by Network
Dissection, a popular interpretability method for object-centric and
scene-centric models. However, our formulation can handle two
important challenges of face-centric models that Network Dissection cannot
address: (1) spatial overlap of concepts: there are different facial concepts
that simultaneously occur in the same region of the image, like "nose" (facial
part) and "pointy nose" (facial attribute); and (2) global concepts: there are
units with affinity to concepts that do not refer to specific locations of the
face (e.g. apparent age). To validate the effectiveness of our unit-concept
pairing formulation, we first conduct controlled experiments on biased data.
These experiments illustrate how Hierarchical Network Dissection can be used to
discover bias in the training data. Then, we dissect different face-centric
inference models trained on widely-used facial datasets. The results show
that models trained for different tasks have different internal representations.
Furthermore, the interpretability results reveal some biases in the training
data and some interesting characteristics of the face-centric inference tasks.
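For intuition, the unit-concept pairing at the core of the pipeline can be sketched in the style of Network Dissection, which the paper builds on: binarize a unit's activation maps at a high quantile and pair the unit with the concept whose segmentation masks it overlaps most. The sketch below is a minimal illustration under those assumptions, not the paper's probabilistic hierarchical formulation; the function name and threshold values are ours (the quantile and IoU cutoff follow common Network Dissection conventions).

```python
import numpy as np

def dissect_unit(activations, concept_masks, quantile=0.995, iou_threshold=0.04):
    """Pair one convolutional unit with its best-matching local concept.

    activations   : (N, H, W) array of the unit's feature maps over N images.
    concept_masks : dict mapping a concept name to (N, H, W) binary masks.
    """
    # Keep only the unit's most strongly activated locations.
    threshold = np.quantile(activations, quantile)
    unit_mask = activations > threshold

    # Score every candidate concept by intersection-over-union.
    best_concept, best_iou = None, 0.0
    for name, masks in concept_masks.items():
        intersection = np.logical_and(unit_mask, masks).sum()
        union = np.logical_or(unit_mask, masks).sum()
        iou = intersection / union if union > 0 else 0.0
        if iou > best_iou:
            best_concept, best_iou = name, iou

    # A unit counts as interpretable only if its best IoU clears the cutoff.
    return (best_concept, best_iou) if best_iou >= iou_threshold else (None, best_iou)
```

Note that global concepts such as apparent age have no segmentation masks, which is exactly the case this IoU scheme cannot handle and the paper's probabilistic formulation is designed to address.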
Related papers
- Representational Similarity via Interpretable Visual Concepts [27.72186215265676]
We introduce an interpretable representational similarity method to compare two networks.
We show that some aspects of model differences can be attributed to unique concepts discovered by one model that are not well represented in the other.
arXiv Detail & Related papers (2025-03-19T21:21:45Z)
- Closely Interactive Human Reconstruction with Proxemics and Physics-Guided Adaption [64.07607726562841]
Existing multi-person human reconstruction approaches mainly focus on recovering accurate poses or avoiding penetration.
In this work, we tackle the task of reconstructing closely interactive humans from a monocular video.
We propose to leverage knowledge from proxemic behavior and physics to compensate for the lack of visual information.
arXiv Detail & Related papers (2024-04-17T11:55:45Z)
- Appearance Debiased Gaze Estimation via Stochastic Subject-Wise Adversarial Learning [33.55397868171977]
Appearance-based gaze estimation has been attracting attention in computer vision, and remarkable improvements have been achieved using various deep learning techniques.
We propose a novel framework, subject-wise gaZE learning (SAZE), which trains a network to generalize across the appearances of different subjects.
Our experimental results verify the robustness of the method in that it yields state-of-the-art performance, achieving angular errors of 3.89 and 4.42 degrees on the MPIIGaze and EyeDiap datasets, respectively.
arXiv Detail & Related papers (2024-01-25T00:23:21Z)
- Evaluating alignment between humans and neural network representations in image-based learning tasks [5.657101730705275]
We tested how well the representations of 86 pretrained neural network models mapped to human learning trajectories.
We found that while training dataset size was a core determinant of alignment with human choices, contrastive training with multi-modal data (text and imagery) was a common feature of currently publicly available models that predicted human generalisation.
In conclusion, pretrained neural networks can serve to extract representations for cognitive models, as they appear to capture some fundamental aspects of cognition that are transferable across tasks.
arXiv Detail & Related papers (2023-06-15T08:18:29Z)
- FaceTopoNet: Facial Expression Recognition using Face Topology Learning [23.139108533273777]
We propose an end-to-end deep model for facial expression recognition, which is capable of learning an effective tree topology of the face.
Our model then traverses the learned tree to generate a sequence, which is then used to form an embedding to feed a sequential learner.
We perform extensive experiments on four large-scale in-the-wild facial expression datasets to evaluate our approach.
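As a rough picture of the traversal step in the FaceTopoNet entry above, a learned face tree (e.g., over facial landmarks) can be linearized into a node sequence by a depth-first walk; that sequence is what a sequential learner then consumes. This is a generic sketch under our own naming, not the authors' implementation.

```python
def tree_to_sequence(adjacency, root=0):
    """Linearize a learned face tree into a node sequence via depth-first traversal.

    adjacency : dict mapping a node id to the list of its neighbor ids.
    root      : node at which the traversal starts (hypothetical choice).
    """
    sequence, stack, seen = [], [root], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        sequence.append(node)
        # Push neighbors in reverse-sorted order so lower ids are visited first.
        stack.extend(sorted(adjacency[node], reverse=True))
    return sequence
```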
arXiv Detail & Related papers (2022-09-13T22:02:54Z)
- Drawing out of Distribution with Neuro-Symbolic Generative Models [49.79371715591122]
Drawing out of Distribution (DooD) is a neuro-symbolic generative model of stroke-based drawing.
DooD operates directly on images and requires neither supervision nor expensive test-time inference.
We evaluate DooD on its ability to generalise across both data and tasks.
arXiv Detail & Related papers (2022-06-03T21:40:22Z)
- Unsupervised learning of features and object boundaries from local prediction [0.0]
We introduce a layer of feature maps with a pairwise Markov random field model in which each factor is paired with an additional binary variable, which switches the factor on or off.
We can learn both the features and the parameters of the Markov random field factors from images without further supervision signals.
We show that computing predictions across space aids both segmentation and feature learning, and models trained to optimize these predictions show similarities to the human visual system.
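To make the switch idea above concrete, the sketch below scores a feature map under a pairwise MRF whose smoothness factors can be individually turned off, paying a constant prior cost per disabled factor (the off-switches then trace boundaries). This is our hypothetical reading with fixed squared-difference factors; in the paper both the features and the factor parameters are learned.

```python
import numpy as np

def switched_pairwise_energy(features, h_on, v_on, coupling=1.0, switch_cost=0.1):
    """Energy of a pairwise MRF over a feature map with per-factor on/off switches.

    features : (H, W, C) feature map.
    h_on     : (H, W-1) boolean array; True keeps the horizontal factor active.
    v_on     : (H-1, W) boolean array; True keeps the vertical factor active.
    """
    # Squared differences between horizontally / vertically adjacent features.
    dh = ((features[:, 1:] - features[:, :-1]) ** 2).sum(axis=-1)
    dv = ((features[1:, :] - features[:-1, :]) ** 2).sum(axis=-1)

    # Smoothness cost applies only where the switch is on; each switched-off
    # factor instead pays a constant prior cost.
    energy = coupling * ((dh * h_on).sum() + (dv * v_on).sum())
    energy += switch_cost * ((~h_on).sum() + (~v_on).sum())
    return energy
```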
arXiv Detail & Related papers (2022-05-27T18:54:10Z)
- Expression-preserving face frontalization improves visually assisted speech processing [35.647888055229956]
The main contribution of this paper is a frontalization methodology that preserves non-rigid facial deformations.
We show that the method, when incorporated into deep learning pipelines, improves word recognition and speech intelligibility scores by a considerable margin.
arXiv Detail & Related papers (2022-04-06T13:22:24Z)
- Multi-Branch Deep Radial Basis Function Networks for Facial Emotion Recognition [80.35852245488043]
We propose a CNN based architecture enhanced with multiple branches formed by radial basis function (RBF) units.
RBF units capture local patterns shared by similar instances using an intermediate representation.
We show that it is the incorporation of local information that makes the proposed model competitive.
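The RBF units mentioned above can be pictured as Gaussian kernels around learned prototype centers in an intermediate feature space, so that similar instances light up the same units. The sketch below is the textbook formulation; the names and the single shared width `gamma` are our simplifications, not the paper's exact parameterization.

```python
import numpy as np

def rbf_units(x, centers, gamma=1.0):
    """Activations of a bank of radial basis function units:
    phi_k(x) = exp(-gamma * ||x - c_k||^2).

    x       : (D,) intermediate feature vector for one instance.
    centers : (K, D) learned prototype centers.
    """
    # Squared Euclidean distance from x to every center, then a Gaussian kernel.
    d2 = ((centers - x) ** 2).sum(axis=1)
    return np.exp(-gamma * d2)
```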
arXiv Detail & Related papers (2021-09-07T21:05:56Z)
- The FaceChannel: A Fast & Furious Deep Neural Network for Facial Expression Recognition [71.24825724518847]
Current state-of-the-art models for automatic Facial Expression Recognition (FER) are based on very deep neural networks that are effective but rather expensive to train.
We formalize the FaceChannel, a lightweight neural network that has far fewer parameters than common deep neural networks.
We demonstrate how our model achieves a comparable, if not better, performance to the current state-of-the-art in FER.
arXiv Detail & Related papers (2020-09-15T09:25:37Z)
- Understanding the Role of Individual Units in a Deep Neural Network [85.23117441162772]
We present an analytic framework to systematically identify hidden units within image classification and image generation networks.
First, we analyze a convolutional neural network (CNN) trained on scene classification and discover units that match a diverse set of object concepts.
Second, we use a similar analytic method to analyze a generative adversarial network (GAN) model trained to generate scenes.
arXiv Detail & Related papers (2020-09-10T17:59:10Z)
- InterFaceGAN: Interpreting the Disentangled Face Representation Learned by GANs [73.27299786083424]
We propose a framework called InterFaceGAN to interpret the disentangled face representation learned by state-of-the-art GAN models.
We first find that GANs learn various semantics in some linear subspaces of the latent space.
We then conduct a detailed study on the correlation between different semantics and manage to better disentangle them via subspace projection.
arXiv Detail & Related papers (2020-05-18T18:01:22Z)
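The linear-subspace and projection ideas in the InterFaceGAN entry above reduce to two small operations: moving a latent code along a semantic boundary's unit normal, and conditioning one boundary on another by projecting out their shared component. The sketch below captures that idea; variable names are ours and it should be read as an illustration rather than the authors' code.

```python
import numpy as np

def edit_latent(z, boundary, alpha):
    """Move latent code z along a semantic boundary's unit normal by step alpha."""
    n = boundary / np.linalg.norm(boundary)
    return z + alpha * n

def condition_boundary(primary, conditioned):
    """Disentangle two semantics via subspace projection: remove from `primary`
    its component along `conditioned`, so edits along the result leave the
    conditioned semantic (approximately) unchanged."""
    n1 = primary / np.linalg.norm(primary)
    n2 = conditioned / np.linalg.norm(conditioned)
    projected = n1 - (n1 @ n2) * n2
    return projected / np.linalg.norm(projected)
```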
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.