Dual Domain-Adversarial Learning for Audio-Visual Saliency Prediction
- URL: http://arxiv.org/abs/2208.05220v1
- Date: Wed, 10 Aug 2022 08:50:32 GMT
- Title: Dual Domain-Adversarial Learning for Audio-Visual Saliency Prediction
- Authors: Yingzi Fan, Longfei Han, Yue Zhang, Lechao Cheng, Chen Xia, Di Hu
- Abstract summary: Deep convolutional neural networks (CNNs) show strong capacity in coping with the audio-visual saliency prediction task.
Due to various factors such as shooting scenes and weather, a moderate distribution discrepancy often exists between source training data and target testing data.
We propose a dual domain-adversarial learning algorithm to mitigate the domain discrepancy between source and target data.
- Score: 17.691475370621
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Both visual and auditory information are valuable for determining the
salient regions in videos. Deep convolutional neural networks (CNNs) show strong
capacity in coping with the audio-visual saliency prediction task. Due to
various factors such as shooting scenes and weather, a moderate distribution
discrepancy often exists between source training data and target testing data.
This domain discrepancy induces performance degradation for CNN models on the
target testing data. This paper makes an early attempt to tackle
the unsupervised domain adaptation problem for audio-visual saliency
prediction. We propose a dual domain-adversarial learning algorithm to mitigate
the domain discrepancy between source and target data. First, a dedicated domain
discrimination branch is built to align the auditory feature
distributions. Then, those auditory features are fused into the visual features
through a cross-modal self-attention module. The other domain discrimination
branch is devised to reduce the domain discrepancy of visual features and
audio-visual correlations implied by the fused audio-visual features.
Experiments on public benchmarks demonstrate that our method can relieve the
performance degradation caused by domain discrepancy.
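As a minimal illustration of the adversarial mechanism that dual-branch schemes of this kind typically rely on, the sketch below implements a gradient reversal layer (GRL) with manual backpropagation: the forward pass is the identity, while the backward pass negates the gradient so the feature extractor is pushed toward domain-invariant representations even as the domain discriminator learns to separate source from target. The class name, the `lam` coefficient, and the manual-gradient interface are illustrative assumptions, not the paper's actual code.

```python
import numpy as np

class GradReverse:
    """Gradient reversal layer: identity forward, negated (scaled) gradient
    backward. Domain-adversarial methods commonly insert such a layer between
    the shared features and each domain discrimination branch."""

    def __init__(self, lam=1.0):
        self.lam = lam  # trade-off between task loss and domain confusion

    def forward(self, x):
        return x  # features pass through unchanged

    def backward(self, grad_out):
        return -self.lam * grad_out  # flip the gradient's sign on the way back
```

In a full pipeline, one GRL would sit before the auditory-feature discriminator and another before the fused audio-visual discriminator, with `lam` often annealed from 0 to 1 over training.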
Related papers
- Audio-based Kinship Verification Using Age Domain Conversion [39.4890403254022]
A key challenge in the task arises from differences in age across samples from different individuals.
We utilise the optimised CycleGAN-VC3 network to perform age-audio conversion to generate the in-domain audio.
The generated audio dataset is employed to extract a range of features, which are then fed into a metric learning architecture to verify kinship.
arXiv Detail & Related papers (2024-10-14T22:08:57Z) - From Denoising Training to Test-Time Adaptation: Enhancing Domain Generalization for Medical Image Segmentation [8.36463803956324]
We propose the Denoising Y-Net (DeY-Net), a novel approach incorporating an auxiliary denoising decoder into the basic U-Net architecture.
The auxiliary decoder aims to perform denoising training, augmenting the domain-invariant representation that facilitates domain generalization.
Building upon denoising training, we propose Denoising Test Time Adaptation (DeTTA) that further: (i) adapts the model to the target domain in a sample-wise manner, and (ii) adapts to the noise-corrupted input.
arXiv Detail & Related papers (2023-10-31T08:39:15Z) - Incorporating Pre-training Data Matters in Unsupervised Domain Adaptation [13.509286043322442]
Unsupervised domain adaptation (UDA) and source-free UDA (SFUDA) methods formulate the problem as involving two domains: source and target.
We investigate the correlation among ImageNet, the source, and the target domain.
We present a novel framework, TriDA, which preserves the semantic structure of the pre-training dataset during fine-tuning.
arXiv Detail & Related papers (2023-08-06T12:23:40Z) - Variational Counterfactual Prediction under Runtime Domain Corruption [50.89405221574912]
The co-occurrence of domain shift and inaccessible variables at inference time, termed runtime domain corruption, seriously impairs the generalizability of a trained counterfactual predictor.
We build an adversarially unified variational causal effect model, named VEGAN, with a novel two-stage adversarial domain adaptation scheme.
We demonstrate that VEGAN outperforms other state-of-the-art baselines on individual-level treatment effect estimation in the presence of runtime domain corruption.
arXiv Detail & Related papers (2023-06-23T02:54:34Z) - Adaptive Face Recognition Using Adversarial Information Network [57.29464116557734]
Face recognition models often degrade when training data differ from testing data.
We propose a novel adversarial information network (AIN) to address it.
arXiv Detail & Related papers (2023-05-23T02:14:11Z) - DecoupleNet: Decoupled Network for Domain Adaptive Semantic Segmentation [78.30720731968135]
Unsupervised domain adaptation in semantic segmentation has been proposed to alleviate the reliance on expensive pixel-wise annotations.
We propose DecoupleNet that alleviates source domain overfitting and enables the final model to focus more on the segmentation task.
We also put forward Self-Discrimination (SD) and introduce an auxiliary classifier to learn more discriminative target domain features with pseudo labels.
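The pseudo-labeling step that Self-Discrimination builds on can be sketched generically as confidence-thresholded label assignment on unlabeled target data; the function name and threshold value below are illustrative assumptions, not DecoupleNet's implementation.

```python
import numpy as np

def pseudo_labels(probs, threshold=0.9):
    """Assign pseudo labels to target samples whose maximum class probability
    exceeds the threshold. Returns (labels, mask), where mask selects the
    confident samples that would feed an auxiliary classifier's loss."""
    confidence = probs.max(axis=1)          # per-sample top probability
    labels = probs.argmax(axis=1)           # predicted class per sample
    mask = confidence >= threshold          # keep only confident predictions
    return labels, mask
```

Only the samples selected by `mask` would contribute to training, which limits the noise that incorrect pseudo labels inject.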
arXiv Detail & Related papers (2022-07-20T15:47:34Z) - Frequency Spectrum Augmentation Consistency for Domain Adaptive Object Detection [107.52026281057343]
We introduce a Frequency Spectrum Augmentation Consistency (FSAC) framework with four different low-frequency filter operations.
In the first stage, we utilize all the original and augmented source data to train an object detector.
In the second stage, augmented source and target data with pseudo labels are adopted to perform the self-training for prediction consistency.
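One plausible form of a low-frequency filter operation is a circular low-pass mask in the 2-D Fourier domain, sketched below; the exact four filters FSAC uses may differ, and the function name and radius parameter are assumptions for illustration.

```python
import numpy as np

def low_frequency_filter(img, radius):
    """Keep only the low-frequency components of a 2-D image by masking its
    centered Fourier spectrum with a disc of the given radius, then inverting
    the transform back to the spatial domain."""
    spectrum = np.fft.fftshift(np.fft.fft2(img))   # DC component at the center
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2
    return np.real(np.fft.ifft2(np.fft.ifftshift(spectrum * mask)))
```

Varying the radius yields a family of augmentations that preserve coarse structure while discarding high-frequency, domain-specific texture.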
arXiv Detail & Related papers (2021-12-16T04:07:01Z) - TASK3 DCASE2021 Challenge: Sound event localization and detection using squeeze-excitation residual CNNs [4.4973334555746]
This study builds on the one carried out by the same team last year.
We study how this technique improves performance on each of the datasets.
This modification improves the performance of the system over the baseline on the MIC dataset.
arXiv Detail & Related papers (2021-07-30T11:34:15Z) - Cross-domain Adaptation with Discrepancy Minimization for Text-independent Forensic Speaker Verification [61.54074498090374]
This study introduces a CRSS-Forensics audio dataset collected in multiple acoustic environments.
We pre-train a CNN-based network using the VoxCeleb data, followed by an approach which fine-tunes part of the high-level network layers with clean speech from CRSS-Forensics.
arXiv Detail & Related papers (2020-09-05T02:54:33Z) - Domain-aware Visual Bias Eliminating for Generalized Zero-Shot Learning [150.42959029611657]
Domain-aware Visual Bias Eliminating (DVBE) network constructs two complementary visual representations.
For unseen images, we automatically search an optimal semantic-visual alignment architecture.
arXiv Detail & Related papers (2020-03-30T08:17:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.