Dual-Triplet Metric Learning for Unsupervised Domain Adaptation in
Video-Based Face Recognition
- URL: http://arxiv.org/abs/2002.04206v1
- Date: Tue, 11 Feb 2020 05:06:30 GMT
- Title: Dual-Triplet Metric Learning for Unsupervised Domain Adaptation in
Video-Based Face Recognition
- Authors: George Ekladious, Hugo Lemoine, Eric Granger, Kaveh Kamali, Salim
Moudache
- Abstract summary: A new deep domain adaptation (DA) method is proposed to adapt the CNN embedding of a Siamese network using unlabeled tracklets captured with a new video camera.
The proposed metric learning technique is used to train deep Siamese networks under different training scenarios.
- Score: 8.220945563455848
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The scalability and complexity of deep learning models remain a key issue in
many visual recognition applications, e.g., video surveillance, where
fine-tuning with labeled image data from each new camera is required to reduce
the domain shift between videos captured in the source domain, e.g., a
laboratory setting, and the target domain, i.e., an operational environment. In
many video surveillance applications, like face recognition (FR) and person
re-identification, a pair-wise matcher is used to assign a query image captured
using a video camera to the corresponding reference images in a gallery. The
different configurations and operational conditions of video cameras can
introduce significant shifts in the pair-wise distance distributions, resulting
in degraded recognition performance for new cameras. In this paper, a new deep
domain adaptation (DA) method is proposed to adapt the CNN embedding of a
Siamese network using unlabeled tracklets captured with a new video camera. To
this end, a dual-triplet loss is introduced for metric learning, where two
triplets are constructed using video data from a source camera and a new
target camera. To form the dual triplets, a mutual-supervised learning
approach is introduced in which the source camera acts as a teacher, providing
the target camera (the student) with an initial embedding. The student then
relies on the teacher to iteratively label the positive and negative pairs
collected during, e.g., initial camera calibration. The source and target
embeddings continue to learn simultaneously so that their pair-wise distance
distributions become aligned. For validation, the proposed metric learning
technique is used to train deep Siamese networks under different training
scenarios, and is compared to state-of-the-art techniques for still-to-video FR
on the COX-S2V and a private video-based FR dataset.
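To make the training objective concrete, below is a minimal PyTorch sketch of a dual-triplet loss and the teacher-based pseudo-labeling step, based only on the abstract. The margin value, the Euclidean distance, the equal weighting of the two triplet terms, and the nearest-reference labeling rule in `teacher_pseudo_labels` are illustrative assumptions, not details confirmed by the paper.

```python
import torch
import torch.nn.functional as F


def triplet_term(anchor, positive, negative, margin=0.2):
    # Standard margin-based triplet loss over a batch of embeddings
    # (Euclidean distance and margin=0.2 are assumed choices).
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()


def dual_triplet_loss(src_triplet, tgt_triplet, margin=0.2):
    # One triplet built from source-camera data and one from
    # target-camera data, per the abstract; equal weighting of the
    # two terms is an assumption.
    return (triplet_term(*src_triplet, margin=margin) +
            triplet_term(*tgt_triplet, margin=margin))


@torch.no_grad()
def teacher_pseudo_labels(teacher, target_images, gallery_refs):
    # The source-camera model (teacher) labels unlabeled target images
    # by assigning each to its nearest gallery reference; this
    # nearest-neighbour rule is an assumption, not stated in the abstract.
    emb = teacher(target_images)                 # (N, D) embeddings
    refs = teacher(gallery_refs)                 # (K, D) reference embeddings
    return torch.cdist(emb, refs).argmin(dim=1)  # pseudo-identity per image
```

In a training loop, the teacher's pseudo-labels would be used to mine (anchor, positive, negative) frames from the target tracklets, while the source triplet is mined from labeled source data; both embeddings are then updated with the dual-triplet loss so the two distance distributions align.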
Related papers
- Neuromorphic Synergy for Video Binarization [54.195375576583864]
Bimodal objects serve as a visual form to embed information that can be easily recognized by vision systems.
Neuromorphic cameras offer new capabilities for alleviating motion blur, but it is non-trivial to first de-blur and then binarize the images in real time.
We propose an event-based binary reconstruction method that leverages the prior knowledge of the bimodal target's properties to perform inference independently in both event space and image space.
We also develop an efficient integration method to propagate this binary image to high frame rate binary video.
arXiv Detail & Related papers (2024-02-20T01:43:51Z)
- Video alignment using unsupervised learning of local and global features [0.0]
We introduce an unsupervised method for alignment that uses global and local features of the frames.
In particular, we introduce effective features for each video frame by means of three machine vision tools: person detection, pose estimation, and VGG network.
The main advantage of our approach is that no training is required, which makes it applicable for any new type of action without any need to collect training samples for it.
arXiv Detail & Related papers (2023-04-13T22:20:54Z)
- Multi-task Learning for Camera Calibration [3.274290296343038]
We present a unique method for predicting intrinsic (principal point offset and focal length) and extrinsic (baseline, pitch, and translation) properties from a pair of images.
By reconstructing the 3D points using a camera model neural network and then using the reconstruction loss to obtain the camera specifications, this innovative camera projection loss (CPL) method allows the desired parameters to be estimated.
arXiv Detail & Related papers (2022-11-22T17:39:31Z)
- Camera Alignment and Weighted Contrastive Learning for Domain Adaptation in Video Person ReID [17.90248359024435]
Systems for person re-identification (ReID) can achieve high accuracy when trained on large, fully-labeled image datasets.
The domain shift associated with diverse operational capture conditions (e.g., camera viewpoints and lighting) may translate to a significant decline in performance.
This paper focuses on unsupervised domain adaptation (UDA) for video-based ReID.
arXiv Detail & Related papers (2022-11-07T15:32:56Z)
- Dual Adversarial Adaptation for Cross-Device Real-World Image Super-Resolution [114.26933742226115]
Super-resolution (SR) models trained on images from different devices could exhibit distinct imaging patterns.
We propose an unsupervised domain adaptation mechanism for real-world SR, named Dual ADversarial Adaptation (DADA).
We empirically conduct experiments under six Real to Real adaptation settings among three different cameras, and achieve superior performance compared with existing state-of-the-art approaches.
arXiv Detail & Related papers (2022-05-07T02:55:39Z)
- CycDA: Unsupervised Cycle Domain Adaptation from Image to Video [26.30914383638721]
Domain Cycle Adaptation (CycDA) is a cycle-based approach for unsupervised image-to-video domain adaptation.
We evaluate our approach on benchmark datasets for image-to-video and for mixed-source domain adaptation.
arXiv Detail & Related papers (2022-03-30T12:22:26Z)
- Unsupervised Simultaneous Learning for Camera Re-Localization and Depth Estimation from Video [4.5307040147072275]
We present an unsupervised simultaneous learning framework for the task of monocular camera re-localization and depth estimation from unlabeled video sequences.
In our framework, we train two networks that estimate the scene coordinates using directions and the depth map from each image, which are then combined to estimate the camera pose.
Our method also outperforms state-of-the-art monocular depth estimation in a trained environment.
arXiv Detail & Related papers (2022-03-24T02:11:03Z)
- Unsupervised Domain Adaptation for Video Semantic Segmentation [91.30558794056054]
Unsupervised Domain Adaptation for semantic segmentation has gained immense popularity since it can transfer knowledge from simulation to real.
In this work, we present a new video extension of this task, namely Unsupervised Domain Adaptation for Video Semantic Segmentation.
We show that our proposals significantly outperform previous image-based UDA methods both on image-level (mIoU) and video-level (VPQ) evaluation metrics.
arXiv Detail & Related papers (2021-07-23T07:18:20Z)
- Learning Dynamic Alignment via Meta-filter for Few-shot Learning [94.41887992982986]
Few-shot learning aims to recognise new classes by adapting the learned knowledge with extremely limited few-shot (support) examples.
We learn a dynamic alignment, which can effectively highlight both query regions and channels according to different local support information.
The resulting framework establishes a new state of the art on major few-shot visual recognition benchmarks.
arXiv Detail & Related papers (2021-03-25T03:29:33Z)
- Adversarial Bipartite Graph Learning for Video Domain Adaptation [50.68420708387015]
Domain adaptation techniques, which focus on adapting models between distributionally different domains, are rarely explored in the video recognition area.
Recent works on visual domain adaptation that leverage adversarial learning to unify the source and target video representations are not highly effective on videos.
This paper proposes an Adversarial Bipartite Graph (ABG) learning framework which directly models the source-target interactions.
arXiv Detail & Related papers (2020-07-31T03:48:41Z)
- Unsupervised Learning of Video Representations via Dense Trajectory Clustering [86.45054867170795]
This paper addresses the task of unsupervised learning of representations for action recognition in videos.
We first propose to adapt two top-performing objectives in this class: instance recognition and local aggregation.
We observe promising performance, but qualitative analysis shows that the learned representations fail to capture motion patterns.
arXiv Detail & Related papers (2020-06-28T22:23:03Z)