Hierarchical Domain-Adapted Feature Learning for Video Saliency
Prediction
- URL: http://arxiv.org/abs/2010.01220v4
- Date: Thu, 6 May 2021 08:09:36 GMT
- Authors: Giovanni Bellitto, Federica Proietto Salanitri, Simone Palazzo,
Francesco Rundo, Daniela Giordano, Concetto Spampinato
- Abstract summary: We propose a 3D fully convolutional architecture for video saliency prediction.
We provide the base hierarchical learning mechanism with two techniques for domain adaptation and domain-specific learning.
The results of our experiments show that the proposed model yields state-of-the-art accuracy on supervised saliency prediction.
- Score: 15.270499225813841
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we propose a 3D fully convolutional architecture for video
saliency prediction that employs hierarchical supervision on intermediate maps
(referred to as conspicuity maps) generated using features extracted at
different abstraction levels. We provide the base hierarchical learning
mechanism with two techniques for domain adaptation and domain-specific
learning. For the former, we encourage the model to learn hierarchical general
features without supervision, using gradient reversal at multiple scales, to
enhance generalization capabilities on datasets for which no annotations are
provided during training. As for domain specialization, we employ
domain-specific operations (namely, priors, smoothing and batch normalization)
that specialize the learned features on individual datasets in order to
maximize performance. The results of our experiments show that the proposed
model yields state-of-the-art accuracy on supervised saliency prediction. When
the base hierarchical model is empowered with domain-specific modules,
performance improves, outperforming state-of-the-art models on three out of
five metrics on the DHF1K benchmark and reaching the second-best results on the
other two. When, instead, we test it in an unsupervised domain adaptation
setting, by enabling hierarchical gradient reversal layers, we obtain
performance comparable to the supervised state of the art.
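
The gradient reversal idea mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it is plain Python with no deep-learning framework, the names `GradientReversal` and `encoder_gradient` are hypothetical, and the reversal strength `lam` and the toy gradient values are assumptions chosen for illustration. It only shows the general mechanism: features pass through unchanged in the forward direction, while the gradient coming back from a domain classifier has its sign flipped before it reaches the feature extractor, once per abstraction level.

```python
# Illustrative sketch only: plain Python, no deep-learning framework.
# GradientReversal and encoder_gradient are hypothetical names, not the paper's code.

class GradientReversal:
    """Identity in the forward pass; multiplies the gradient by -lam going backward."""

    def __init__(self, lam=1.0):
        self.lam = lam  # reversal strength (assumed hyperparameter)

    def forward(self, features):
        # Features pass through unchanged to the domain classifier.
        return list(features)

    def backward(self, grad_from_domain_classifier):
        # Flip the sign so the encoder is pushed to confuse the domain classifier.
        return [-self.lam * g for g in grad_from_domain_classifier]


def encoder_gradient(saliency_grad, domain_grad, grl):
    """Gradient reaching the encoder at one scale: task term plus reversed domain term."""
    reversed_grad = grl.backward(domain_grad)
    return [s + r for s, r in zip(saliency_grad, reversed_grad)]


# One reversal layer per abstraction level ("multiple scales" in the abstract).
layers = {scale: GradientReversal(lam=0.5) for scale in ("low", "mid", "high")}

# Toy gradients, invented for this example.
saliency_grads = {"low": [1.0, 1.0], "mid": [0.5, -0.5], "high": [0.0, 2.0]}
domain_grads = {"low": [2.0, -4.0], "mid": [1.0, 1.0], "high": [-2.0, 0.0]}

per_scale = {
    scale: encoder_gradient(saliency_grads[scale], domain_grads[scale], layers[scale])
    for scale in layers
}
print(per_scale)
```

In a real training setup the same effect is usually obtained with a custom autograd operation in the chosen framework; the sketch above keeps the arithmetic explicit so that the sign flip at each scale is easy to follow.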
Related papers
- Universal Pooling Method of Multi-layer Features from Pretrained Models for Speaker Verification [7.005068872406135]
Recent advancements in automatic speaker verification (ASV) studies have been achieved by leveraging large-scale pretrained networks.
We present a novel approach for exploiting the multilayered nature of pretrained models for ASV.
We show how the proposed interlayer processing aids in maximizing the advantage of utilizing pretrained models.
(2024-09-12T05:55:32Z)
- RADA: Robust and Accurate Feature Learning with Domain Adaptation [7.905594146253435]
We introduce a multi-level feature aggregation network that incorporates two pivotal components to facilitate the learning of robust and accurate features.
Our method, RADA, achieves excellent results in image matching, camera pose estimation, and visual localization tasks.
(2024-07-22T16:49:58Z)
- Self-supervised Learning of Dense Hierarchical Representations for Medical Image Segmentation [2.2265038612930663]
This paper demonstrates a self-supervised framework for learning voxel-wise coarse-to-fine representations tailored for dense downstream tasks.
We devise a training strategy that balances the contributions of features from multiple scales, ensuring that the learned representations capture both coarse and fine-grained details.
(2024-01-12T09:47:17Z)
- Skeleton2vec: A Self-supervised Learning Framework with Contextualized Target Representations for Skeleton Sequence [56.092059713922744]
We show that using high-level contextualized features as prediction targets can achieve superior performance.
Specifically, we propose Skeleton2vec, a simple and efficient self-supervised 3D action representation learning framework.
Our proposed Skeleton2vec outperforms previous methods and achieves state-of-the-art results.
(2024-01-01T12:08:35Z)
- Learning to Augment via Implicit Differentiation for Domain Generalization [107.9666735637355]
Domain generalization (DG) aims to overcome the problem by leveraging multiple source domains to learn a domain-generalizable model.
In this paper, we propose a novel augmentation-based DG approach, dubbed AugLearn.
AugLearn shows effectiveness on three standard DG benchmarks, PACS, Office-Home and Digits-DG.
(2022-10-25T18:51:51Z)
- Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition [88.34182299496074]
Action labels are only available on a source dataset, but unavailable on a target dataset in the training stage.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
(2022-07-17T07:05:39Z)
- Domain Generalisation for Object Detection under Covariate and Concept Shift [10.32461766065764]
Domain generalisation aims to promote the learning of domain-invariant features while suppressing domain-specific features.
An approach to domain generalisation for object detection is proposed, the first such approach applicable to any object detection architecture.
(2022-03-10T11:14:18Z)
- Unsupervised Domain Adaptation for Semantic Segmentation via Low-level Edge Information Transfer [27.64947077788111]
Unsupervised domain adaptation for semantic segmentation aims to make models trained on synthetic data adapt to real images.
Previous feature-level adversarial learning methods only consider adapting models on the high-level semantic features.
We present the first attempt at explicitly using low-level edge information, which has a small inter-domain gap, to guide the transfer of semantic information.
(2021-09-18T11:51:31Z)
- Semi-Supervised Domain Generalization with Stochastic StyleMatch [90.98288822165482]
In real-world applications, we might have only a few labels available from each source domain due to high annotation cost.
In this work, we investigate semi-supervised domain generalization, a more realistic and practical setting.
Our proposed approach, StyleMatch, is inspired by FixMatch, a state-of-the-art semi-supervised learning method based on pseudo-labeling.
(2021-06-01T16:00:08Z)
- Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We create new state-of-the-art results on both few-shot and training-free settings.
(2020-12-29T23:43:16Z)
- Adversarial Bipartite Graph Learning for Video Domain Adaptation [50.68420708387015]
Domain adaptation techniques, which focus on adapting models between distributionally different domains, are rarely explored in the video recognition area.
Recent works on visual domain adaptation that leverage adversarial learning to unify the source and target video representations are not highly effective on videos.
This paper proposes an Adversarial Bipartite Graph (ABG) learning framework which directly models the source-target interactions.
(2020-07-31T03:48:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.