Feature-Supervised Action Modality Transfer
- URL: http://arxiv.org/abs/2108.03329v1
- Date: Fri, 6 Aug 2021 22:59:10 GMT
- Title: Feature-Supervised Action Modality Transfer
- Authors: Fida Mohammad Thoker, Cees G. M. Snoek
- Abstract summary: This paper strives for action recognition and detection in video modalities when only limited modality-specific labeled examples are available.
For the RGB modality, and the optical-flow modality derived from it, many large-scale labeled datasets are available.
- Score: 35.550525307238146
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper strives for action recognition and detection in video modalities
like RGB, depth maps or 3D-skeleton sequences when only limited
modality-specific labeled examples are available. For the RGB modality, and the
optical-flow modality derived from it, many large-scale labeled datasets are
available. They have become the de facto pre-training choice when recognizing
or detecting new actions from RGB datasets with only limited labeled
examples. Unfortunately, large-scale labeled action datasets for
other modalities are unavailable for pre-training. In this paper, our goal is
to recognize actions from limited examples in non-RGB video modalities, by
learning from large-scale labeled RGB data. To this end, we propose a two-step
training process: (i) we extract action representation knowledge from an
RGB-trained teacher network and adapt it to a non-RGB student network; (ii) we
then fine-tune the transferred model with the available labeled examples of the
target modality. For the knowledge transfer, we introduce feature-supervision
strategies, which rely on unlabeled pairs of two modalities (the RGB and the
target modality) to transfer feature level representations from the teacher to
the student network. Ablations and generalizations with two RGB source datasets
and two non-RGB target datasets demonstrate that an optical-flow teacher
provides better action transfer features than RGB for both depth maps and
3D-skeletons, even when evaluated on a different target domain, or for a
different task. Compared to alternative cross-modal action transfer methods, we
show a good improvement in performance, especially when labeled non-RGB
examples to learn from are scarce.
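The two-step recipe described in the abstract lends itself to a compact sketch. The snippet below is an illustrative PyTorch version, not the authors' implementation: the backbone, the simple feature-matching MSE loss, and the names FeatureEncoder, transfer_step, and finetune_step are all assumptions. Step (i) distills feature-level representations from a frozen RGB or optical-flow teacher to a target-modality student using unlabeled modality pairs; step (ii) fine-tunes the transferred student on the few labeled target-modality examples.

```python
import torch
import torch.nn as nn

class FeatureEncoder(nn.Module):
    """Toy 3D-conv backbone mapping a video clip of any modality to a feature vector."""
    def __init__(self, in_channels, feat_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
            nn.Flatten(),
            nn.Linear(32, feat_dim),
        )

    def forward(self, x):  # x: (batch, channels, frames, height, width)
        return self.net(x)


def transfer_step(teacher, student, paired_loader, epochs=1, lr=1e-3):
    """Step (i): feature supervision on unlabeled (RGB, target-modality) pairs.
    The frozen teacher, pre-trained on large-scale labeled RGB or optical-flow
    data, provides feature-level targets for the target-modality student."""
    teacher.eval()
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    mse = nn.MSELoss()  # assumed feature-matching loss; other choices are possible
    for _ in range(epochs):
        for rgb_clip, target_clip in paired_loader:  # no action labels required
            with torch.no_grad():
                t_feat = teacher(rgb_clip)           # teacher features
            s_feat = student(target_clip)            # student features
            loss = mse(s_feat, t_feat)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student


def finetune_step(student, head, labeled_loader, epochs=1, lr=1e-4):
    """Step (ii): fine-tune the transferred student plus a classification head
    on the limited labeled examples of the target modality."""
    opt = torch.optim.Adam(list(student.parameters()) + list(head.parameters()), lr=lr)
    ce = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for clip, label in labeled_loader:
            loss = ce(head(student(clip)), label)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student, head
```

In practice the teacher would be an encoder pre-trained on a large labeled RGB (or derived optical-flow) dataset, the student an encoder for depth maps or 3D skeletons, and head a simple linear classifier over the action classes; the abstract's finding is that an optical-flow teacher transfers better than an RGB one.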
Related papers
- TENet: Targetness Entanglement Incorporating with Multi-Scale Pooling and Mutually-Guided Fusion for RGB-E Object Tracking [30.89375068036783]
Existing approaches perform event feature extraction for RGB-E tracking using traditional appearance models.
We propose an Event backbone (Pooler) to obtain a high-quality feature representation that is cognisant of the intrinsic characteristics of the event data.
Our method significantly outperforms state-of-the-art trackers on two widely used RGB-E tracking datasets.
arXiv Detail & Related papers (2024-05-08T12:19:08Z) - DFormer: Rethinking RGBD Representation Learning for Semantic
Segmentation [76.81628995237058]
DFormer is a novel framework to learn transferable representations for RGB-D segmentation tasks.
It pretrains the backbone using image-depth pairs from ImageNet-1K.
DFormer achieves new state-of-the-art performance on two popular RGB-D tasks.
arXiv Detail & Related papers (2023-09-18T11:09:11Z) - PointMBF: A Multi-scale Bidirectional Fusion Network for Unsupervised
RGB-D Point Cloud Registration [6.030097207369754]
We propose a network implementing multi-scale bidirectional fusion between RGB images and point clouds generated from depth images.
Our method achieves new state-of-the-art performance.
arXiv Detail & Related papers (2023-08-09T08:13:46Z) - RGB-D Saliency Detection via Cascaded Mutual Information Minimization [122.8879596830581]
Existing RGB-D saliency detection models do not explicitly encourage RGB and depth to achieve effective multi-modal learning.
We introduce a novel multi-stage cascaded learning framework via mutual information minimization to "explicitly" model the multi-modal information between RGB image and depth data.
arXiv Detail & Related papers (2021-09-15T12:31:27Z) - Cross-modality Discrepant Interaction Network for RGB-D Salient Object
Detection [78.47767202232298]
We propose a novel Cross-modality Discrepant Interaction Network (CDINet) for RGB-D SOD.
Two components are designed to implement the effective cross-modality interaction.
Our network outperforms 15 state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2021-08-04T11:24:42Z) - Self-Supervised Representation Learning for RGB-D Salient Object
Detection [93.17479956795862]
We use Self-Supervised Representation Learning to design two pretext tasks: the cross-modal auto-encoder and the depth-contour estimation.
Our pretext tasks require only a few unlabeled RGB-D datasets for pre-training, which enables the network to capture rich semantic contexts.
For the inherent problem of cross-modal fusion in RGB-D SOD, we propose a multi-path fusion module.
arXiv Detail & Related papers (2021-01-29T09:16:06Z) - P4Contrast: Contrastive Learning with Pairs of Point-Pixel Pairs for
RGB-D Scene Understanding [24.93545970229774]
We propose contrasting "pairs of point-pixel pairs", where positives include pairs of RGB-D points in correspondence, and negatives include pairs where one of the two modalities has been disturbed.
This provides extra flexibility in making hard negatives and helps networks to learn features from both modalities, not just the more discriminating one of the two.
arXiv Detail & Related papers (2020-12-24T04:00:52Z) - Bi-directional Cross-Modality Feature Propagation with
Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGBD images for providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels and model the problem as one of cross-modal feature fusion.
In this paper, we propose a unified and efficient Cross-modality Guided Encoder to not only effectively recalibrate RGB feature responses, but also to distill accurate depth information via multiple stages and aggregate the two recalibrated representations alternately.
arXiv Detail & Related papers (2020-07-17T18:35:24Z) - Synergistic saliency and depth prediction for RGB-D saliency detection [76.27406945671379]
Existing RGB-D saliency datasets are small, which may lead to overfitting and limited generalization for diverse scenarios.
We propose a semi-supervised system for RGB-D saliency detection that can be trained on smaller RGB-D saliency datasets without saliency ground truth.
arXiv Detail & Related papers (2020-07-03T14:24:41Z)