Red Carpet to Fight Club: Partially-supervised Domain Transfer for Face
Recognition in Violent Videos
- URL: http://arxiv.org/abs/2009.07576v1
- Date: Wed, 16 Sep 2020 09:45:33 GMT
- Authors: Yunus Can Bilge, Mehmet Kerim Yucel, Ramazan Gokberk Cinbis, Nazli
Ikizler-Cinbis, Pinar Duygulu
- Abstract summary: We introduce the WildestFaces dataset to study cross-domain recognition under a variety of adverse conditions.
We establish a rigorous evaluation protocol for this clean-to-violent recognition task, and present a detailed analysis of the proposed dataset and the methods.
- Score: 12.534785814117065
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In many real-world problems, there is typically a large discrepancy between
the characteristics of data used in training versus deployment. A prime example
is the analysis of aggression videos: in a criminal incident, suspects
typically need to be identified from clean, portrait-like photos rather than
from prior video recordings. This poses three major challenges: a large domain
discrepancy between violent videos and ID photos, the lack of video examples
for most individuals, and limited training-data availability. To mimic such
scenarios, we formulate a realistic domain-transfer
problem, where the goal is to transfer the recognition model trained on clean
posed images to the target domain of violent videos, where training videos are
available only for a subset of subjects. To this end, we introduce the
WildestFaces dataset, tailored to study cross-domain recognition under a
variety of adverse conditions. We divide the task of transferring a recognition
model from the domain of clean images to the violent videos into two
sub-problems and tackle them using (i) stacked affine-transforms for
classifier-transfer, (ii) attention-driven pooling for temporal-adaptation. We
additionally formulate a self-attention based model for domain-transfer. We
establish a rigorous evaluation protocol for this clean-to-violent recognition
task, and present a detailed analysis of the proposed dataset and the methods.
Our experiments highlight the unique challenges introduced by the WildestFaces
dataset and the advantages of the proposed approach.
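To make the two sub-problems concrete, below is a minimal PyTorch sketch of (i) stacked affine transforms that adapt a frozen source-domain classifier and (ii) attention-driven pooling that aggregates per-frame face embeddings into a single clip descriptor. The module names, dimensions, and stacking depth of two are illustrative assumptions, not the authors' released implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class StackedAffineTransfer(nn.Module):
    # Adapts a frozen source classifier W (num_classes x dim) through a
    # stack of learned affine maps, initialized near the identity.
    def __init__(self, dim, depth=2):
        super().__init__()
        self.maps = nn.ModuleList(nn.Linear(dim, dim) for _ in range(depth))
        for m in self.maps:
            nn.init.eye_(m.weight)   # start as the identity map ...
            nn.init.zeros_(m.bias)   # ... with zero offset

    def forward(self, source_weights):
        w = source_weights
        for m in self.maps:
            w = m(w)                 # w <- w @ M.T + b
        return w                     # transferred classifier weights

class AttentionPooling(nn.Module):
    # Collapses frame embeddings (batch x frames x dim) into one clip
    # embedding via softmax attention over the temporal axis.
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, frames):
        alpha = F.softmax(self.score(frames), dim=1)  # per-frame weights
        return (alpha * frames).sum(dim=1)            # weighted average

# Usage sketch: classify attention-pooled clips with transferred weights.
dim, num_classes = 512, 100
frames = torch.randn(8, 16, dim)            # 8 clips x 16 frames each
source_w = torch.randn(num_classes, dim)    # frozen source classifier
transfer, pool = StackedAffineTransfer(dim), AttentionPooling(dim)
logits = pool(frames) @ transfer(source_w).t()   # shape: (8, num_classes)

Training such a near-identity stack only on the subset of subjects with violent-video footage, while keeping the source classifier frozen, is one plausible way to realize the partially-supervised transfer described above; the paper's self-attention based domain-transfer model is a further refinement not sketched here.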
Related papers
- Adversarial Attacks on Video Object Segmentation with Hard Region
Discovery [31.882369005280793]
Video object segmentation has been applied to various computer vision tasks, such as video editing, autonomous driving, and human-robot interaction.
Deep neural networks are vulnerable to adversarial examples: inputs altered by perturbations that are almost imperceptible to humans.
This raises security concerns in such high-stakes tasks, since small perturbations to the input video can translate into real attack risks.
arXiv Detail & Related papers (2023-09-25T03:52:15Z)
- CDFSL-V: Cross-Domain Few-Shot Learning for Videos [58.37446811360741]
Few-shot video action recognition is an effective approach to recognizing new categories with only a few labeled examples.
Existing methods in video action recognition rely on large labeled datasets from the same domain.
We propose a novel cross-domain few-shot video action recognition method that leverages self-supervised learning and curriculum learning.
arXiv Detail & Related papers (2023-09-07T19:44:27Z)
- Camera Alignment and Weighted Contrastive Learning for Domain Adaptation
in Video Person ReID [17.90248359024435]
Systems for person re-identification (ReID) can achieve high accuracy when trained on large, fully-labeled image datasets.
The domain shift associated with diverse operational capture conditions (e.g., camera viewpoints and lighting) may translate to a significant decline in performance.
This paper focuses on unsupervised domain adaptation (UDA) for video-based ReID.
arXiv Detail & Related papers (2022-11-07T15:32:56Z)
- Unsupervised Video Domain Adaptation for Action Recognition: A
Disentanglement Perspective [37.45565756522847]
We consider the generation of cross-domain videos from two sets of latent factors.
The TranSVAE framework is then developed to model such generation.
Experiments on the UCF-HMDB, Jester, and Epic-Kitchens datasets verify the effectiveness and superiority of TranSVAE.
arXiv Detail & Related papers (2022-08-15T17:59:31Z)
- Leveraging Real Talking Faces via Self-Supervision for Robust Forgery
Detection [112.96004727646115]
We develop a method to detect face-manipulated videos using real talking faces.
We show that our method achieves state-of-the-art performance on cross-manipulation generalisation and robustness experiments.
Our results suggest that leveraging natural and unlabelled videos is a promising direction for the development of more robust face forgery detectors.
arXiv Detail & Related papers (2022-01-18T17:14:54Z)
- Unsupervised Domain Adaptation for Video Semantic Segmentation [91.30558794056054]
Unsupervised Domain Adaptation for semantic segmentation has gained immense popularity since it can transfer knowledge from simulation to the real world.
In this work, we present a new video extension of this task, namely Unsupervised Domain Adaptation for Video Semantic Segmentation.
We show that our proposals significantly outperform previous image-based UDA methods both on image-level (mIoU) and video-level (VPQ) evaluation metrics.
arXiv Detail & Related papers (2021-07-23T07:18:20Z)
- JOKR: Joint Keypoint Representation for Unsupervised Cross-Domain Motion
Retargeting [53.28477676794658]
Unsupervised motion retargeting in videos has seen substantial advancements through the use of deep neural networks.
We introduce JOKR - a JOint Keypoint Representation that handles both the source and target videos, without requiring any object prior or data collection.
We evaluate our method both qualitatively and quantitatively, and demonstrate that our method handles various cross-domain scenarios, such as different animals, different flowers, and humans.
arXiv Detail & Related papers (2021-06-17T17:32:32Z)
- Adversarial Bipartite Graph Learning for Video Domain Adaptation [50.68420708387015]
Domain adaptation techniques, which focus on adapting models between distributionally different domains, are rarely explored in the video recognition area.
Recent works on visual domain adaptation that leverage adversarial learning to unify source and target video representations are not highly effective on videos.
This paper proposes an Adversarial Bipartite Graph (ABG) learning framework which directly models the source-target interactions.
arXiv Detail & Related papers (2020-07-31T03:48:41Z)
- Human Motion Transfer from Poses in the Wild [61.6016458288803]
We tackle the problem of human motion transfer, where we synthesize novel motion video for a target person that imitates the movement from a reference video.
It is a video-to-video translation task in which the estimated poses are used to bridge two domains.
We introduce a novel pose-to-video translation framework for generating high-quality videos that are temporally coherent even for in-the-wild pose sequences unseen during training.
arXiv Detail & Related papers (2020-04-07T05:59:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.