Domain Adaptive Hand Keypoint and Pixel Localization in the Wild
- URL: http://arxiv.org/abs/2203.08344v2
- Date: Fri, 18 Mar 2022 13:35:04 GMT
- Title: Domain Adaptive Hand Keypoint and Pixel Localization in the Wild
- Authors: Takehiko Ohkawa, Yu-Jhe Li, Qichen Fu, Ryosuke Furuta, Kris M. Kitani and Yoichi Sato
- Abstract summary: We aim to improve the performance of regressing hand keypoints and segmenting pixel-level hand masks under new imaging conditions.
Our method improves the multi-task score on HO3D by 4% over the latest adversarial adaptation method.
- Score: 40.71379707068579
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We aim to improve the performance of regressing hand keypoints and segmenting
pixel-level hand masks under new imaging conditions (e.g., outdoors) when we
only have labeled images taken under very different conditions (e.g., indoors).
In the real world, it is important that the model trained for both tasks works
under various imaging conditions. However, the range of conditions covered by
existing labeled hand datasets is limited. Thus, it is necessary to adapt the model
trained on the labeled images (source) to unlabeled images (target) with unseen
imaging conditions. While self-training domain adaptation methods (i.e.,
learning from the unlabeled target images in a self-supervised manner) have
been developed for both tasks, their training may degrade performance when the
predictions on the target images are noisy. To avoid this, it is crucial to
assign a low importance (confidence) weight to the noisy predictions during
self-training. In this paper, we propose to utilize the divergence of two
predictions to estimate the confidence of the target image for both tasks.
These predictions are produced by two separate networks, and their divergence
helps identify the noisy predictions. To integrate our proposed confidence
estimation into self-training, we propose a teacher-student framework where the
two networks (teachers) provide supervision to a network (student) for
self-training, and the teachers are learned from the student by knowledge
distillation. Our experiments show its superiority over state-of-the-art
methods in adaptation settings with different lighting, grasping objects,
backgrounds, and camera viewpoints. Our method improves the multi-task score
on HO3D by 4% over the latest adversarial adaptation method. We also validate
our method on Ego4D, whose egocentric videos show rapid changes in imaging
conditions outdoors.
Related papers
- One Diffusion to Generate Them All [54.82732533013014]
OneDiffusion is a versatile, large-scale diffusion model that supports bidirectional image synthesis and understanding.
It enables conditional generation from inputs such as text, depth, pose, layout, and semantic maps.
OneDiffusion allows for multi-view generation, camera pose estimation, and instant personalization using sequential image inputs.
arXiv Detail & Related papers (2024-11-25T12:11:05Z) - Intra-task Mutual Attention based Vision Transformer for Few-Shot Learning [12.5354658533836]
Humans possess a remarkable ability to accurately classify new, unseen images after being exposed to only a few examples.
For artificial neural network models, determining the most relevant features for distinguishing between two images with limited samples presents a challenge.
We propose an intra-task mutual attention method for few-shot learning that splits the support and query samples into patches.
arXiv Detail & Related papers (2024-05-06T02:02:57Z) - Improving Human-Object Interaction Detection via Virtual Image Learning [68.56682347374422]
Human-Object Interaction (HOI) detection aims to understand the interactions between humans and objects.
In this paper, we propose to alleviate the impact of the unbalanced distribution of training data via Virtual Image Learning (VIL).
A novel label-to-image approach, Multiple Steps Image Creation (MUSIC), is proposed to create a high-quality dataset that has a consistent distribution with real images.
arXiv Detail & Related papers (2023-08-04T10:28:48Z) - MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments [72.6405488990753]
Self-supervised learning can be used to mitigate Vision Transformer networks' need for very large fully annotated datasets.
We propose a single-stage and standalone method, MOCA, which unifies both desired properties.
We achieve new state-of-the-art results on low-shot settings and strong experimental results in various evaluation protocols.
arXiv Detail & Related papers (2023-07-18T15:46:20Z) - Mix-up Self-Supervised Learning for Contrast-agnostic Applications [33.807005669824136]
We present the first mix-up self-supervised learning framework for contrast-agnostic applications.
We address the low variance across images using cross-domain mix-up and build the pretext task on image reconstruction and transparency prediction.
arXiv Detail & Related papers (2022-04-02T16:58:36Z) - Continual Active Learning for Efficient Adaptation of Machine Learning
Models to Changing Image Acquisition [3.205205037629335]
We propose a method for continual active learning on a data stream of medical images.
It recognizes shifts or additions of new imaging sources - domains - and adapts training accordingly.
Results demonstrate that the proposed method outperforms naive active learning while requiring less manual labelling.
arXiv Detail & Related papers (2021-06-07T05:39:06Z) - An Empirical Study of the Collapsing Problem in Semi-Supervised 2D Human
Pose Estimation [80.02124918255059]
Semi-supervised learning aims to boost the accuracy of a model by exploring unlabeled images.
We learn two networks to mutually teach each other.
The more reliable predictions on easy images in each network are used to teach the other network to learn about the corresponding hard images.
arXiv Detail & Related papers (2020-11-25T03:29:52Z) - Adversarial Transfer of Pose Estimation Regression [11.117357750374035]
We develop a deep adaptation network for learning scene-invariant image representations and use adversarial learning to generate representations for model transfer.
We evaluate our network on two public datasets, Cambridge Landmarks and 7Scene, demonstrating its superiority over several baselines and comparing it to state-of-the-art methods.
arXiv Detail & Related papers (2020-06-20T21:16:37Z) - Self-Supervised Linear Motion Deblurring [112.75317069916579]
Deep convolutional neural networks are state-of-the-art for image deblurring.
We present a differentiable reblur model for self-supervised motion deblurring.
Our experiments demonstrate that self-supervised single-image deblurring is indeed feasible.
arXiv Detail & Related papers (2020-02-10T20:15:21Z)