Multimodal Generation of Novel Action Appearances for Synthetic-to-Real Recognition of Activities of Daily Living
- URL: http://arxiv.org/abs/2208.01910v1
- Date: Wed, 3 Aug 2022 08:28:33 GMT
- Title: Multimodal Generation of Novel Action Appearances for Synthetic-to-Real Recognition of Activities of Daily Living
- Authors: Zdravko Marinov, David Schneider, Alina Roitberg, Rainer Stiefelhagen
- Abstract summary: Domain shifts, such as appearance changes, are a key challenge in real-world applications of activity recognition models.
We introduce an activity domain generation framework which creates novel ADL appearances from different existing activity modalities.
Our framework computes human poses, heatmaps of body joints, and optical flow maps and uses them alongside the original RGB videos to learn the essence of source domains.
- Score: 25.04517296731092
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Domain shifts, such as appearance changes, are a key challenge in real-world
applications of activity recognition models, which range from assistive
robotics and smart homes to driver observation in intelligent vehicles. For
example, while simulations are an excellent way of economical data collection,
a Synthetic-to-Real domain shift leads to a > 60% drop in accuracy when
recognizing Activities of Daily Living (ADLs). We tackle this challenge and
introduce an activity domain generation framework which creates novel ADL
appearances (novel domains) from different existing activity modalities (source
domains) inferred from video training data. Our framework computes human poses,
heatmaps of body joints, and optical flow maps and uses them alongside the
original RGB videos to learn the essence of source domains in order to generate
completely new ADL domains. The model is optimized by maximizing the distance
between the existing source appearances and the generated novel appearances
while ensuring that the semantics of an activity are preserved through an
additional classification loss. While source-data multimodality is an important
concept in this design, our setup does not rely on multi-sensor hardware (i.e.,
all source modalities are inferred from a single video). The newly created
activity domains are then integrated in the training of the ADL classification
networks, resulting in models far less susceptible to changes in data
distributions. Extensive experiments on the Synthetic-to-Real benchmark
Sims4Action demonstrate the potential of the domain generation paradigm for
cross-domain ADL recognition, setting new state-of-the-art results. Our code is
publicly available at https://github.com/Zrrr1997/syn2real_DG
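The two-term objective described in the abstract (push generated appearances away from the source appearances while a classifier keeps the activity label recoverable) can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: the function name, the choice of squared L2 distance for the novelty term, and the weighting `alpha` are all assumptions.

```python
import numpy as np

def domain_generation_loss(source_feats, novel_feats, logits, labels, alpha=1.0):
    """Hypothetical sketch of the domain-generation objective.

    source_feats / novel_feats: (B, D) embeddings of source and generated clips.
    logits: (B, C) activity-classifier predictions on the generated clips.
    labels: (B,) ground-truth activity labels.
    """
    # Novelty term: negative mean squared distance between source and generated
    # embeddings, so minimizing the total loss MAXIMIZES the distance.
    novelty = -np.mean((novel_feats - source_feats) ** 2)

    # Semantic term: cross-entropy on the generated clips, so the activity
    # class must remain recognizable in the novel appearance.
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    semantic = -np.mean(log_probs[np.arange(len(labels)), labels])

    return semantic + alpha * novelty
```

In this sketch, `alpha` trades off appearance novelty against semantic preservation: generated features that drift farther from the source lower the objective, while misclassified activities raise it.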
Related papers
- Learn from the Learnt: Source-Free Active Domain Adaptation via Contrastive Sampling and Visual Persistence [60.37934652213881]
Domain Adaptation (DA) facilitates knowledge transfer from a source domain to a related target domain.
This paper investigates a practical DA paradigm, namely Source data-Free Active Domain Adaptation (SFADA), where source data becomes inaccessible during adaptation.
We present Learn From the Learnt (LFTL), a novel paradigm for SFADA that leverages the learnt knowledge from the source-pretrained model and actively iterated models without extra overhead.
arXiv Detail & Related papers (2024-07-26T17:51:58Z) - Revisiting the Domain Shift and Sample Uncertainty in Multi-source Active Domain Transfer [69.82229895838577]
Active Domain Adaptation (ADA) aims to maximally boost model adaptation in a new target domain by actively selecting a limited number of target data to annotate.
This setting neglects the more practical scenario where training data are collected from multiple sources.
This motivates us to target a new and challenging setting of knowledge transfer that extends ADA from a single source domain to multiple source domains.
arXiv Detail & Related papers (2023-11-21T13:12:21Z) - SPADES: A Realistic Spacecraft Pose Estimation Dataset using Event Sensing [9.583223655096077]
Due to limited access to real target datasets, algorithms are often trained using synthetic data and applied in the real domain.
Event sensing has been explored in the past and shown to reduce the domain gap between simulations and real-world scenarios.
We introduce a novel dataset, SPADES, comprising real event data acquired in a controlled laboratory environment and simulated event data using the same camera intrinsics.
arXiv Detail & Related papers (2023-11-09T12:14:47Z) - Transferring Foundation Models for Generalizable Robotic Manipulation [82.12754319808197]
We propose a novel paradigm that effectively leverages language-reasoning segmentation mask generated by internet-scale foundation models.
Our approach can effectively and robustly perceive object pose and enable sample-efficient generalization learning.
Demos can be found in our submitted video, and more comprehensive ones can be found in link1 or link2.
arXiv Detail & Related papers (2023-06-09T07:22:12Z) - Domain-Adaptive Full-Face Gaze Estimation via Novel-View-Synthesis and Feature Disentanglement [12.857137513211866]
We propose an effective model training pipeline consisting of a training data synthesis and a gaze estimation model for unsupervised domain adaptation.
The proposed data synthesis leverages the single-image 3D reconstruction to expand the range of the head poses from the source domain without requiring a 3D facial shape dataset.
We propose a disentangling autoencoder network to separate gaze-related features and introduce background augmentation consistency loss to utilize the characteristics of the synthetic source domain.
arXiv Detail & Related papers (2023-05-25T15:15:03Z) - Exploring Few-Shot Adaptation for Activity Recognition on Diverse Domains [46.26074225989355]
Domain adaptation is essential for activity recognition to ensure accurate and robust performance across diverse environments.
In this work, we focus on FewShot Domain Adaptation for Activity Recognition (FSDA-AR), which leverages a very small amount of labeled target videos.
We propose a new FSDA-AR benchmark using five established datasets, considering adaptation to more diverse and challenging domains.
arXiv Detail & Related papers (2023-05-15T08:01:05Z) - Synthetic-to-Real Domain Adaptation for Action Recognition: A Dataset and Baseline Performances [76.34037366117234]
We introduce a new dataset called Robot Control Gestures (RoCoG-v2).
The dataset is composed of both real and synthetic videos from seven gesture classes.
We present results using state-of-the-art action recognition and domain adaptation algorithms.
arXiv Detail & Related papers (2023-03-17T23:23:55Z) - Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition [88.34182299496074]
Action labels are only available on a source dataset, but unavailable on a target dataset in the training stage.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z) - Towards Optimal Strategies for Training Self-Driving Perception Models in Simulation [98.51313127382937]
We focus on the use of labels in the synthetic domain alone.
Our approach introduces both a way to learn neural-invariant representations and a theoretically inspired view on how to sample the data from the simulator.
We showcase our approach on the bird's-eye-view vehicle segmentation task with multi-sensor data.
arXiv Detail & Related papers (2021-11-15T18:37:43Z) - Towards Adaptive Semantic Segmentation by Progressive Feature Refinement [16.40758125170239]
We propose an innovative progressive feature refinement framework, along with domain adversarial learning to boost the transferability of segmentation networks.
As a result, the segmentation models trained with source domain images can be transferred to a target domain without significant performance degradation.
arXiv Detail & Related papers (2020-09-30T04:17:48Z) - Style-transfer GANs for bridging the domain gap in synthetic pose estimator training [8.508403388002133]
We propose to adopt general-purpose GAN models for pixel-level image translation.
The obtained models are then used either during training or inference to bridge the domain gap.
Our evaluation shows a considerable improvement in model performance when compared to a model trained with the same degree of domain randomization.
arXiv Detail & Related papers (2020-04-28T17:35:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.