Unique Faces Recognition in Videos
- URL: http://arxiv.org/abs/2006.05713v1
- Date: Wed, 10 Jun 2020 08:08:26 GMT
- Title: Unique Faces Recognition in Videos
- Authors: Jiahao Huo and Terence L van Zyl
- Abstract summary: This paper tackles face recognition in videos employing metric learning methods and similarity ranking models.
The dataset used was the YouTube Face Database designed for investigating the problem of face recognition in videos.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper tackles face recognition in videos employing metric learning
methods and similarity ranking models. The paper compares the use of the
Siamese network with contrastive loss and Triplet Network with triplet loss
implementing the following architectures: Google/Inception architecture, 3D
Convolutional Network (C3D), and a 2-D Long short-term memory (LSTM) Recurrent
Neural Network. We make use of still images and sequences from videos for
training the networks and compare the performances implementing the above
architectures. The dataset used was the YouTube Face Database designed for
investigating the problem of face recognition in videos. The contribution of
this paper is two-fold: first, the experiments establish that 3-D
convolutional networks and 2-D LSTMs with contrastive loss on image
sequences do not outperform the Google/Inception architecture with
contrastive loss in top-$n$ rank face retrieval with still images. However,
the 3-D convolutional networks and 2-D LSTM with triplet loss outperform
Google/Inception with triplet loss in top-$n$ rank face retrieval on the
dataset. Second, a Support Vector Machine (SVM) was used in conjunction with
the CNNs' learned feature representations for facial identification. The
results show that feature
representation learned with triplet loss is significantly better for n-shot
facial identification compared to contrastive loss. The most useful feature
representations for facial identification are from the 2-D LSTM with triplet
loss. The experiments show that learning spatio-temporal features from video
sequences is beneficial for facial recognition in videos.
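The two metric-learning objectives the paper compares, and the top-$n$ retrieval step it evaluates, can be sketched in a few lines. The following is a minimal, framework-free illustration of the standard contrastive and triplet loss formulations; the margin values and the `top_n` helper are illustrative assumptions, not the paper's hyperparameters or code.

```python
from math import sqrt

def contrastive_loss(d, y, margin=1.0):
    # Contrastive loss on a pair's embedding distance d.
    # y = 1: same identity (pull together); y = 0: different
    # identities (push apart until d >= margin).
    return y * d ** 2 + (1 - y) * max(margin - d, 0.0) ** 2

def triplet_loss(d_ap, d_an, margin=0.2):
    # Triplet loss: the anchor-negative distance d_an must exceed
    # the anchor-positive distance d_ap by at least `margin`.
    return max(d_ap - d_an + margin, 0.0)

def euclidean(u, v):
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def top_n(query, gallery, n=5):
    # Top-n rank retrieval: sort gallery embeddings by distance to
    # the query and return the indices of the n closest.
    order = sorted(range(len(gallery)), key=lambda i: euclidean(query, gallery[i]))
    return order[:n]
```

In this setup, a Siamese network is trained on pairs with `contrastive_loss` and a triplet network on (anchor, positive, negative) triplets with `triplet_loss`; at test time the learned embeddings are ranked with something like `top_n` for face retrieval.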
Related papers
- SeMLaPS: Real-time Semantic Mapping with Latent Prior Networks and Quasi-Planar Segmentation [53.83313235792596]
We present a new methodology for real-time semantic mapping from RGB-D sequences.
It combines a 2D neural network and a 3D network based on a SLAM system with 3D occupancy mapping.
Our system achieves state-of-the-art semantic mapping quality among 2D-3D network-based systems.
arXiv Detail & Related papers (2023-06-28T22:36:44Z)
- Unlocking Masked Autoencoders as Loss Function for Image and Video Restoration [19.561055022474786]
We study the potential of loss functions and raise our belief that "a learned loss function empowers the learning capability of neural networks for image and video restoration".
We investigate the efficacy of our belief from three perspectives: 1) from task-customized MAE to native MAE, 2) from image task to video task, and 3) from transformer structure to convolution neural network structure.
arXiv Detail & Related papers (2023-03-29T02:41:08Z)
- Graphics Capsule: Learning Hierarchical 3D Face Representations from 2D Images [82.5266467869448]
We propose an Inverse Graphics Capsule Network (IGC-Net) to learn the hierarchical 3D face representations from large-scale unlabeled images.
IGC-Net first decomposes the objects into a set of semantic-consistent part-level descriptions and then assembles them into object-level descriptions to build the hierarchy.
arXiv Detail & Related papers (2023-03-20T06:32:55Z)
- RangeViT: Towards Vision Transformers for 3D Semantic Segmentation in Autonomous Driving [80.14669385741202]
Vision transformers (ViTs) have achieved state-of-the-art results in many image-based benchmarks.
ViTs are notoriously hard to train and require a lot of training data to learn powerful representations.
We show that our method, called RangeViT, outperforms existing projection-based methods on nuScenes and Semantic KITTI.
arXiv Detail & Related papers (2023-01-24T18:50:48Z)
- Face Recognition Using $Sf_{3}CNN$ With Higher Feature Discrimination [14.26473757011463]
We propose a framework called $Sf_3CNN$ for face recognition in videos.
The framework uses 3-dimensional Residual Network (3D Resnet) and A-Softmax loss for face recognition in videos.
It gives an increased accuracy of 99.10% on the CVBL video database, in comparison to the previous 97% on the same database using 3D ResNets.
arXiv Detail & Related papers (2021-02-02T09:47:31Z)
- Synthetic Expressions are Better Than Real for Learning to Detect Facial Actions [4.4532095214807965]
Our approach reconstructs the 3D shape of the face from each video frame, aligns the 3D mesh to a canonical view, and then trains a GAN-based network to synthesize novel images with facial action units of interest.
The network trained on synthesized facial expressions outperformed the one trained on actual facial expressions and surpassed current state-of-the-art approaches.
arXiv Detail & Related papers (2020-10-21T13:11:45Z)
- Multi-channel Deep 3D Face Recognition [4.726009758066045]
The accuracy of 2D face recognition is still challenged by changes in pose, illumination, make-up, and expression.
We propose a multi-Channel deep 3D face network for face recognition based on 3D face data.
The multi-channel deep 3D face network achieves a face recognition accuracy of 98.6%.
arXiv Detail & Related papers (2020-09-30T15:29:05Z)
- Making a Case for 3D Convolutions for Object Segmentation in Videos [16.167397418720483]
We show that 3D convolutional networks can be effectively applied to dense video prediction tasks such as salient object segmentation.
We propose a 3D decoder architecture, that comprises novel 3D Global Convolution layers and 3D Refinement modules.
Our approach outperforms the existing state of the art by a large margin on the DAVIS'16 Unsupervised, FBMS and ViSal benchmarks.
arXiv Detail & Related papers (2020-08-26T12:24:23Z)
- Attribute-aware Identity-hard Triplet Loss for Video-based Person Re-identification [51.110453988705395]
Video-based person re-identification (Re-ID) is an important computer vision task.
We introduce a new metric learning method called Attribute-aware Identity-hard Triplet Loss (AITL)
To achieve a complete model of video-based person Re-ID, a multi-task framework with Attribute-driven Spatio-Temporal Attention (ASTA) mechanism is also proposed.
arXiv Detail & Related papers (2020-06-13T09:15:38Z)
- DeepFaceFlow: In-the-wild Dense 3D Facial Motion Estimation [56.56575063461169]
DeepFaceFlow is a robust, fast, and highly-accurate framework for the estimation of 3D non-rigid facial flow.
Our framework was trained and tested on two very large-scale facial video datasets.
Given registered pairs of images, our framework generates 3D flow maps at 60 fps.
arXiv Detail & Related papers (2020-05-14T23:56:48Z)
- CAKES: Channel-wise Automatic KErnel Shrinking for Efficient 3D Networks [87.02416370081123]
3D Convolution Neural Networks (CNNs) have been widely applied to 3D scene understanding, such as video analysis and volumetric image recognition.
We propose Channel-wise Automatic KErnel Shrinking (CAKES), to enable efficient 3D learning by shrinking standard 3D convolutions into a set of economic operations.
arXiv Detail & Related papers (2020-03-28T14:21:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.