Fourier Test-time Adaptation with Multi-level Consistency for Robust
Classification
- URL: http://arxiv.org/abs/2306.02544v1
- Date: Mon, 5 Jun 2023 02:29:38 GMT
- Title: Fourier Test-time Adaptation with Multi-level Consistency for Robust
Classification
- Authors: Yuhao Huang, Xin Yang, Xiaoqiong Huang, Xinrui Zhou, Haozhe Chi,
Haoran Dou, Xindi Hu, Jian Wang, Xuedong Deng, Dong Ni
- Abstract summary: We propose a novel approach called Fourier Test-time Adaptation (FTTA) to integrate input and model tuning.
FTTA builds a reliable multi-level consistency measurement of paired inputs to achieve self-correction of predictions.
It was extensively validated on three large classification datasets with different modalities and organs.
- Score: 10.291631977766672
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep classifiers may encounter significant performance degradation when
processing unseen testing data from varying centers, vendors, and protocols.
Ensuring the robustness of deep models against these domain shifts is crucial
for their widespread clinical application. In this study, we propose a novel
approach called Fourier Test-time Adaptation (FTTA), which employs a
dual-adaptation design to integrate input and model tuning, thereby jointly
improving the model robustness. The main idea of FTTA is to build a reliable
multi-level consistency measurement of paired inputs for achieving
self-correction of prediction. Our contribution is two-fold. First, we
encourage consistency in global features and local attention maps between the
two transformed images of the same input. Here, the transformation refers to
Fourier-based input adaptation, which can transfer one unseen image into source
style to reduce the domain gap. Furthermore, we leverage style-interpolated
images to enhance the global and local features with learnable parameters,
which can smooth the consistency measurement and accelerate convergence.
Second, we introduce a regularization technique that utilizes style
interpolation consistency in the frequency space to encourage self-consistency
in the logit space of the model output. This regularization provides strong
self-supervised signals for robustness enhancement. FTTA was extensively
validated on three large classification datasets with different modalities and
organs. Experimental results show that FTTA is general and outperforms other
strong state-of-the-art methods.
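The abstract does not spell out how the Fourier-based input adaptation or the style interpolation is computed. As a minimal sketch, assuming a common formulation in which the low-frequency amplitude spectrum of the test image is blended with that of a source-domain image while the test image's phase is kept, the idea could look like the following. The function name `fourier_style_transfer` and the parameters `beta` (size of the low-frequency window) and `lam` (interpolation weight) are hypothetical and are not taken from the paper.

```python
import numpy as np


def fourier_style_transfer(target, source, beta=0.1, lam=1.0):
    """Blend the low-frequency amplitude of `source` into `target`.

    Assumption (not from the paper): "style" is carried by the
    low-frequency amplitude spectrum, and content by the phase.

    target, source: 2D float arrays of the same shape (grayscale images).
    beta: fraction of each axis treated as the low-frequency window.
    lam: interpolation weight; 0 keeps the target style, 1 uses the source style.
    """
    if target.shape != source.shape:
        raise ValueError("target and source must have the same shape")

    # 2D FFT, shifted so the zero frequency sits at the centre of the array.
    ft = np.fft.fftshift(np.fft.fft2(target))
    fs = np.fft.fftshift(np.fft.fft2(source))
    amp_t, pha_t = np.abs(ft), np.angle(ft)
    amp_s = np.abs(fs)

    # Window centred on the zero frequency; symmetric so the output stays real.
    h, w = target.shape
    ch, cw = h // 2, w // 2
    bh, bw = int(h * beta / 2), int(w * beta / 2)
    rows = slice(ch - bh, ch + bh + 1)
    cols = slice(cw - bw, cw + bw + 1)

    # Interpolate the amplitudes inside the window; keep the target's phase.
    amp_mix = amp_t.copy()
    amp_mix[rows, cols] = (1 - lam) * amp_t[rows, cols] + lam * amp_s[rows, cols]

    mixed = amp_mix * np.exp(1j * pha_t)
    return np.real(np.fft.ifft2(np.fft.ifftshift(mixed)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    test_image = rng.random((32, 32))            # stand-in for an unseen test image
    source_image = rng.random((32, 32)) + 0.5    # stand-in for a source-domain image
    # A few interpolation weights give style-interpolated versions of one input.
    for weight in (0.0, 0.5, 1.0):
        adapted = fourier_style_transfer(test_image, source_image, lam=weight)
        print(weight, adapted.shape)
```

Calling the function with several values of `lam` yields a family of images between the test style and the source style. In a setup like the one the abstract describes, such paired or interpolated inputs would be fed to the classifier and their features and logits compared; that consistency loss is not shown here.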
Related papers
- Visual Fourier Prompt Tuning [63.66866445034855]
We propose the Visual Fourier Prompt Tuning (VFPT) method as a general and effective solution for adapting large-scale transformer-based models.
Our approach incorporates the Fast Fourier Transform into prompt embeddings and harmoniously considers both spatial and frequency domain information.
Our results demonstrate that our approach outperforms current state-of-the-art baselines on two benchmarks.
arXiv Detail & Related papers (2024-11-02T18:18:35Z)
- Fully Differentiable Correlation-driven 2D/3D Registration for X-ray to CT Image Fusion [3.868072865207522]
Image-based rigid 2D/3D registration is a critical technique for fluoroscopic guided surgical interventions.
We propose a novel fully differentiable correlation-driven network using a dual-branch CNN-transformer encoder.
A correlation-driven loss is proposed for low-frequency feature and high-frequency feature decomposition based on embedded information.
arXiv Detail & Related papers (2024-02-04T14:12:51Z)
- Generalized Face Forgery Detection via Adaptive Learning for Pre-trained Vision Transformer [54.32283739486781]
We present a Forgery-aware Adaptive Vision Transformer (FA-ViT) under the adaptive learning paradigm.
FA-ViT achieves 93.83% and 78.32% AUC scores on Celeb-DF and DFDC datasets in the cross-dataset evaluation.
arXiv Detail & Related papers (2023-09-20T06:51:11Z)
- Consistency Regularization for Generalizable Source-free Domain Adaptation [62.654883736925456]
Source-free domain adaptation (SFDA) aims to adapt a well-trained source model to an unlabelled target domain without accessing the source dataset.
Existing SFDA methods only assess their adapted models on the target training set, neglecting the data from unseen but identically distributed testing sets.
We propose a consistency regularization framework to develop a more generalizable SFDA method.
arXiv Detail & Related papers (2023-08-03T07:45:53Z)
- Improving Transformer-based Image Matching by Cascaded Capturing Spatially Informative Keypoints [44.90917854990362]
We propose a transformer-based cascade matching model, the Cascade feature Matching TRansformer (CasMTR).
We use a simple yet effective Non-Maximum Suppression (NMS) post-process to filter keypoints through the confidence map.
CasMTR achieves state-of-the-art performance in indoor and outdoor pose estimation as well as visual localization.
arXiv Detail & Related papers (2023-03-06T04:32:34Z)
- Disentangled Federated Learning for Tackling Attributes Skew via Invariant Aggregation and Diversity Transferring [104.19414150171472]
Attribute skew keeps current federated learning (FL) frameworks from following consistent optimization directions among the clients.
We propose disentangled federated learning (DFL) to disentangle the domain-specific and cross-invariant attributes into two complementary branches.
Experiments verify that DFL facilitates FL with higher performance, better interpretability, and faster convergence rate, compared with SOTA FL methods.
arXiv Detail & Related papers (2022-06-14T13:12:12Z)
- FAMLP: A Frequency-Aware MLP-Like Architecture For Domain Generalization [73.41395947275473]
We propose a novel frequency-aware architecture, in which the domain-specific features are filtered out in the transformed frequency domain.
Experiments on three benchmarks demonstrate significant performance, outperforming the state-of-the-art methods by a margin of 3%, 4% and 9%, respectively.
arXiv Detail & Related papers (2022-03-24T07:26:29Z)
- CSformer: Bridging Convolution and Transformer for Compressive Sensing [65.22377493627687]
This paper proposes a hybrid framework that combines the detailed spatial information captured by CNNs with the global context provided by transformers for enhanced representation learning.
The proposed approach is an end-to-end compressive image sensing method, composed of adaptive sampling and recovery.
The experimental results demonstrate the effectiveness of the dedicated transformer-based architecture for compressive sensing.
arXiv Detail & Related papers (2021-12-31T04:37:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.