Team VI-I2R Technical Report on EPIC-KITCHENS-100 Unsupervised Domain
Adaptation Challenge for Action Recognition 2021
- URL: http://arxiv.org/abs/2206.02573v1
- Date: Fri, 3 Jun 2022 07:37:48 GMT
- Title: Team VI-I2R Technical Report on EPIC-KITCHENS-100 Unsupervised Domain
Adaptation Challenge for Action Recognition 2021
- Authors: Yi Cheng, Fen Fang, Ying Sun
- Abstract summary: The EPIC-KITCHENS-100 dataset consists of daily kitchen activities focusing on the interaction between human hands and their surrounding objects.
It is very challenging to accurately recognize these fine-grained activities, due to the presence of distracting objects and visually similar action classes.
We propose to learn hand-centric features by leveraging the hand bounding box information for UDA on fine-grained action recognition.
Our submission achieved the 1st place in terms of top-1 action recognition accuracy, using only RGB and optical flow modalities as input.
- Score: 6.614021153407064
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this report, we present the technical details of our approach to the
EPIC-KITCHENS-100 Unsupervised Domain Adaptation (UDA) Challenge for Action
Recognition. The EPIC-KITCHENS-100 dataset consists of daily kitchen activities
focusing on the interaction between human hands and their surrounding objects.
It is very challenging to accurately recognize these fine-grained activities,
due to the presence of distracting objects and visually similar action classes,
especially in the unlabelled target domain. Based on an existing method for
video domain adaptation, i.e., TA3N, we propose to learn hand-centric features
by leveraging the hand bounding box information for UDA on fine-grained action
recognition. This helps reduce the distraction from the background as well as
facilitate the learning of domain-invariant features. To achieve high-quality
hand localization, we adopt an uncertainty-aware domain adaptation network,
i.e., MEAA, to train a domain-adaptive hand detector, which only uses very
limited hand bounding box annotations in the source domain but can generalize
well to the unlabelled target domain. Our submission achieved the 1st place in
terms of top-1 action recognition accuracy, using only RGB and optical flow
modalities as input.
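The report describes the approach at a high level only. As a rough illustration of the hand-centric idea, the sketch below pools backbone features inside detected hand boxes and fuses them with a global clip feature; all names, shapes, and the fusion scheme are illustrative assumptions, not the authors' implementation.
```python
# Hypothetical sketch of hand-centric feature pooling (not the authors' code).
import torch
from torchvision.ops import roi_align

def hand_centric_features(frame_feats, hand_boxes, out_size=3):
    """frame_feats: (T, C, H, W) backbone feature maps for T frames.
    hand_boxes: list of T float tensors, each (K_t, 4) holding hand boxes
    (x1, y1, x2, y2) in feature-map coordinates. Returns a (C,) clip vector."""
    pooled = roi_align(frame_feats, hand_boxes, output_size=out_size)  # (sum K_t, C, s, s)
    global_vec = frame_feats.mean(dim=(0, 2, 3))   # global clip descriptor
    if pooled.numel() == 0:                        # no hands detected in this clip
        return global_vec
    hand_vec = pooled.mean(dim=(0, 2, 3))          # average over hands and space
    return 0.5 * (hand_vec + global_vec)           # simple fusion; the report's may differ

feats = torch.randn(4, 256, 14, 14)                # 4 frames of feature maps
boxes = [torch.tensor([[2.0, 2.0, 8.0, 8.0]])] * 4 # one detected hand box per frame
clip_vec = hand_centric_features(feats, boxes)     # (256,) hand-centric descriptor
```
In the actual submission these features feed a TA3N-style adversarial adaptation pipeline; a generic sketch of such an adversarial component appears under the "Adversarial Domain Adaptation for Action Recognition Around the Clock" entry below.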
Related papers
- GrabDAE: An Innovative Framework for Unsupervised Domain Adaptation Utilizing Grab-Mask and Denoise Auto-Encoder [16.244871317281614]
Unsupervised Domain Adaptation (UDA) aims to adapt a model trained on a labeled source domain to an unlabeled target domain by addressing the domain shift.
We introduce GrabDAE, an innovative UDA framework designed to tackle domain shift in visual classification tasks.
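The blurb names a denoise auto-encoder as one ingredient. A minimal denoising auto-encoder sketch follows, assuming it operates on pooled feature vectors; the architecture and noise model are illustrative guesses, not GrabDAE's actual design.
```python
# Minimal denoising auto-encoder sketch (architecture and noise level assumed).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenoiseAE(nn.Module):
    def __init__(self, dim=512, hidden=128):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())
        self.dec = nn.Linear(hidden, dim)

    def forward(self, feats):
        noisy = feats + 0.1 * torch.randn_like(feats)  # corrupt the input features
        return self.dec(self.enc(noisy))               # reconstruct the clean version

feats = torch.randn(8, 512)                        # a batch of target-domain features
recon_loss = F.mse_loss(DenoiseAE()(feats), feats) # train to denoise the features
```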
arXiv Detail & Related papers (2024-10-10T15:19:57Z)
- Team VI-I2R Technical Report on EPIC-KITCHENS-100 Unsupervised Domain Adaptation Challenge for Action Recognition 2022 [6.561596502471905]
We present our submission to the EPIC-KITCHENS-100 Unsupervised Domain Adaptation Challenge for Action Recognition 2022.
This task aims to adapt an action recognition model trained on a labeled source domain to an unlabeled target domain.
Our final submission achieved first place in terms of top-1 action recognition accuracy.
arXiv Detail & Related papers (2023-01-29T12:29:24Z) - Adversarial Domain Adaptation for Action Recognition Around the Clock [0.7614628596146599]
This paper presents a domain adaptation-based action recognition approach.
It uses adversarial learning in cross-domain settings to align action representations across domains.
It achieves state-of-the-art performance on the InFAR and XD145 action datasets.
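For reference, adversarial domain alignment of this kind is commonly implemented with a gradient reversal layer (DANN-style); the generic sketch below illustrates that mechanism, not this paper's exact model.
```python
# Generic DANN-style adversarial alignment via gradient reversal (illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None   # flip gradients flowing into the backbone

features = torch.randn(8, 256, requires_grad=True)  # source + target clip features
domain_head = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 2))
domain_labels = torch.tensor([0] * 4 + [1] * 4)     # 0 = source, 1 = target

logits = domain_head(GradReverse.apply(features, 1.0))
adv_loss = F.cross_entropy(logits, domain_labels)
adv_loss.backward()   # reversed gradients push features to fool the domain head
```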
arXiv Detail & Related papers (2022-10-25T01:08:27Z)
- Domain-Agnostic Prior for Transfer Semantic Segmentation [197.9378107222422]
Unsupervised domain adaptation (UDA) is an important topic in the computer vision community.
We present a mechanism that regularizes cross-domain representation learning with a domain-agnostic prior (DAP).
Our research reveals that UDA benefits greatly from better proxies, possibly from other data modalities.
arXiv Detail & Related papers (2022-04-06T09:13:25Z)
- Audio-Adaptive Activity Recognition Across Video Domains [112.46638682143065]
We leverage activity sounds for domain adaptation as they have less variance across domains and can reliably indicate which activities are not happening.
We propose an audio-adaptive encoder and associated learning methods that discriminatively adjust the visual feature representation.
We also introduce the new task of actor shift, with a corresponding audio-visual dataset, to challenge our method with situations where the activity appearance changes dramatically.
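One plausible way to "discriminatively adjust the visual feature representation" with audio is FiLM-style modulation; the sketch below is a hypothetical stand-in, and the paper's actual audio-adaptive encoder may differ substantially.
```python
# Hypothetical audio-conditioned (FiLM-style) modulation of visual features.
import torch
import torch.nn as nn

class AudioAdaptive(nn.Module):
    def __init__(self, vis_dim=512, aud_dim=128):
        super().__init__()
        self.scale = nn.Linear(aud_dim, vis_dim)  # audio predicts per-channel scales
        self.shift = nn.Linear(aud_dim, vis_dim)  # ...and per-channel shifts

    def forward(self, vis, aud):
        return vis * (1 + self.scale(aud)) + self.shift(aud)

vis, aud = torch.randn(4, 512), torch.randn(4, 128)
adapted = AudioAdaptive()(vis, aud)   # (4, 512) audio-adapted visual features
```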
arXiv Detail & Related papers (2022-03-27T08:15:20Z)
- Decompose to Adapt: Cross-domain Object Detection via Feature Disentanglement [79.2994130944482]
We design a Domain Disentanglement Faster-RCNN (DDF) to eliminate the source-specific information in the features for detection task learning.
Our DDF method facilitates the feature disentanglement at the global and local stages, with a Global Triplet Disentanglement (GTD) module and an Instance Similarity Disentanglement (ISD) module.
By outperforming state-of-the-art methods on four benchmark UDA object detection tasks, our DDF method is demonstrated to be effective with wide applicability.
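The blurb does not spell out the GTD objective; one plausible triplet-style form, keeping domain-shared features close across domains and away from domain-specific ones, is sketched below purely as an assumption.
```python
# Assumed triplet-style disentanglement objective (not the paper's exact GTD loss).
import torch
import torch.nn as nn

triplet = nn.TripletMarginLoss(margin=1.0)
shared_src = torch.randn(8, 256)     # domain-invariant features, source images
shared_tgt = torch.randn(8, 256)     # domain-invariant features, target images
specific_src = torch.randn(8, 256)   # domain-specific features, source images
loss = triplet(shared_src, shared_tgt, specific_src)  # pull shared together, push specific away
```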
arXiv Detail & Related papers (2022-01-06T05:43:01Z)
- Stagewise Unsupervised Domain Adaptation with Adversarial Self-Training for Road Segmentation of Remote Sensing Images [93.50240389540252]
Road segmentation from remote sensing images is a challenging task with a wide range of potential applications.
We propose a novel stagewise domain adaptation model called RoadDA to address the domain shift (DS) issue in this field.
Experiment results on two benchmarks demonstrate that RoadDA can efficiently reduce the domain gap and outperforms state-of-the-art methods.
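Self-training stages of this kind typically retrain on confidence-thresholded pseudo-labels; the generic sketch below illustrates that step only and omits RoadDA's adversarial and stagewise specifics.
```python
# Generic pseudo-label self-training step for segmentation (RoadDA specifics omitted).
import torch
import torch.nn.functional as F

def pseudo_label_loss(model, target_images, threshold=0.9):
    with torch.no_grad():
        probs = torch.softmax(model(target_images), dim=1)   # (N, classes, H, W)
        conf, labels = probs.max(dim=1)                      # per-pixel confidence, labels
    loss = F.cross_entropy(model(target_images), labels, reduction="none")  # (N, H, W)
    return (loss * (conf > threshold)).mean()                # keep only confident pixels
```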
arXiv Detail & Related papers (2021-08-28T09:29:14Z)
- Learning Cross-modal Contrastive Features for Video Domain Adaptation [138.75196499580804]
We propose a unified framework for video domain adaptation, which simultaneously regularizes cross-modal and cross-domain feature representations.
Specifically, we treat each modality in a domain as a view and leverage the contrastive learning technique with properly designed sampling strategies.
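A minimal cross-modal InfoNCE loss in this spirit is sketched below: RGB and flow features of the same clip act as positives and other clips in the batch as negatives; the paper's actual sampling strategies are more elaborate.
```python
# Minimal cross-modal InfoNCE sketch (illustrative sampling, not the paper's).
import torch
import torch.nn.functional as F

def cross_modal_nce(rgb, flow, tau=0.07):
    rgb = F.normalize(rgb, dim=1)           # (N, D) clip features, RGB view
    flow = F.normalize(flow, dim=1)         # (N, D) same clips, flow view
    logits = rgb @ flow.t() / tau           # (N, N) all-pairs similarities
    targets = torch.arange(rgb.size(0))     # diagonal entries are the positives
    return F.cross_entropy(logits, targets)

loss = cross_modal_nce(torch.randn(8, 512), torch.randn(8, 512))
```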
arXiv Detail & Related papers (2021-08-26T18:14:18Z)
- PoliTO-IIT Submission to the EPIC-KITCHENS-100 Unsupervised Domain Adaptation Challenge for Action Recognition [15.545769463854915]
This report describes our submission to the EPIC-Kitchens-100 Unsupervised Domain Adaptation (UDA) Challenge in Action Recognition.
We first exploited a recent Domain Generalization (DG) technique called Relative Norm Alignment (RNA).
In a second phase, we extended the approach to work on unlabelled target data, allowing the model to adapt to the target distribution in an unsupervised fashion.
Our submission (entry 'plnet') is visible on the leaderboard; it achieved the 1st position for 'verb' and the 3rd position for both 'noun' and 'action'.
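In the RNA papers, the loss balances the mean feature norms of two modalities; the sketch below follows that published form, though this submission's exact formulation is not shown here.
```python
# Relative Norm Alignment (RNA) style loss: balance mean feature norms of two
# modalities (form taken from the RNA papers; details here are assumptions).
import torch

def rna_loss(feat_a, feat_b, eps=1e-8):
    # feat_a, feat_b: (N, D) features from two modalities (e.g., RGB and flow)
    ratio = feat_a.norm(dim=1).mean() / (feat_b.norm(dim=1).mean() + eps)
    return (ratio - 1.0) ** 2   # penalize unbalanced modality norms

loss = rna_loss(torch.randn(8, 512), torch.randn(8, 512))
```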
arXiv Detail & Related papers (2021-07-01T10:02:44Z)
- Surprisingly Simple Semi-Supervised Domain Adaptation with Pretraining and Consistency [93.89773386634717]
Visual domain adaptation involves learning to classify images from a target visual domain using labels available in a different source domain.
We show that in the presence of a few target labels, simple techniques like self-supervision (via rotation prediction) and consistency regularization can be effective without any adversarial alignment to learn a good target classifier.
Our Pretraining and Consistency (PAC) approach can achieve state-of-the-art accuracy on this semi-supervised domain adaptation task, surpassing multiple adversarial domain alignment methods across multiple datasets.
arXiv Detail & Related papers (2021-01-29T18:40:17Z)
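The rotation-prediction pretext task mentioned in the PAC entry is a standard recipe: predict which of four rotations was applied to an image. The toy sketch below illustrates only that pretext task, with a deliberately trivial classifier head.
```python
# Standard four-way rotation-prediction pretext task (toy classifier head).
import torch
import torch.nn as nn
import torch.nn.functional as F

def rotation_batch(images):
    """images: (N, C, H, W), H == W. Returns rotated images and labels in 0..3."""
    ks = torch.randint(0, 4, (images.size(0),))
    rotated = torch.stack([torch.rot90(img, int(k), dims=(1, 2))
                           for img, k in zip(images, ks)])
    return rotated, ks

images = torch.randn(8, 3, 32, 32)
rotated, labels = rotation_batch(images)
head = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 4))  # toy rotation classifier
loss = F.cross_entropy(head(rotated), labels)
```
PAC pairs this self-supervision with consistency regularization on target data, as the summary above notes.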
This list is automatically generated from the titles and abstracts of the papers on this site.