Test-time Adaptation with Slot-Centric Models
- URL: http://arxiv.org/abs/2203.11194v3
- Date: Tue, 27 Jun 2023 19:41:35 GMT
- Title: Test-time Adaptation with Slot-Centric Models
- Authors: Mihir Prabhudesai, Anirudh Goyal, Sujoy Paul, Sjoerd van Steenkiste,
Mehdi S. M. Sajjadi, Gaurav Aggarwal, Thomas Kipf, Deepak Pathak, Katerina
Fragkiadaki
- Abstract summary: Slot-TTA is a semi-supervised scene decomposition model that at test time is adapted per scene through gradient descent on reconstruction or cross-view synthesis objectives.
We show substantial out-of-distribution performance improvements over state-of-the-art supervised feed-forward detectors and alternative test-time adaptation methods.
- Score: 63.981055778098444
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current visual detectors, though impressive within their training
distribution, often fail to parse out-of-distribution scenes into their
constituent entities. Recent test-time adaptation methods use auxiliary
self-supervised losses to adapt the network parameters to each test example
independently and have shown promising results towards generalization outside
the training distribution for the task of image classification. In our work, we
find evidence that these losses are insufficient for the task of scene
decomposition, without also considering architectural inductive biases. Recent
slot-centric generative models attempt to decompose scenes into entities in a
self-supervised manner by reconstructing pixels. Drawing upon these two lines
of work, we propose Slot-TTA, a semi-supervised slot-centric scene
decomposition model that at test time is adapted per scene through gradient
descent on reconstruction or cross-view synthesis objectives. We evaluate
Slot-TTA across multiple input modalities (images or 3D point clouds) and show
substantial out-of-distribution performance improvements over state-of-the-art
supervised feed-forward detectors and alternative test-time adaptation methods.
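To make the adaptation procedure concrete, below is a minimal sketch of per-scene test-time adaptation through gradient descent on a reconstruction objective. The toy autoencoder, slot count, optimizer, and step count are illustrative assumptions; this is not the Slot-TTA architecture from the paper.

```python
# Minimal sketch of per-scene test-time adaptation on a reconstruction loss.
# The toy model below is NOT the paper's architecture; all sizes and optimizer
# settings are illustrative assumptions.
import copy
import torch
import torch.nn as nn

class ToySlotAutoencoder(nn.Module):
    """Tiny stand-in for a slot-centric encoder/decoder."""
    def __init__(self, num_slots=4, slot_dim=32, image_size=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 32, 5, stride=2, padding=2), nn.ReLU(),
        )
        self.to_slots = nn.Linear(32 * (image_size // 4) ** 2, num_slots * slot_dim)
        self.decoder = nn.Linear(num_slots * slot_dim, 3 * image_size * image_size)
        self.num_slots, self.slot_dim, self.image_size = num_slots, slot_dim, image_size

    def forward(self, x):
        b = x.shape[0]
        h = self.encoder(x).flatten(1)
        slots = self.to_slots(h).view(b, self.num_slots, self.slot_dim)
        recon = self.decoder(slots.flatten(1)).view(b, 3, self.image_size, self.image_size)
        return recon, slots

def adapt_per_scene(model, scene, steps=10, lr=1e-4):
    """Adapt a copy of the model to one test scene by minimizing reconstruction error."""
    adapted = copy.deepcopy(model)            # keep the trained weights untouched
    opt = torch.optim.Adam(adapted.parameters(), lr=lr)
    for _ in range(steps):
        recon, _ = adapted(scene)
        loss = ((recon - scene) ** 2).mean()  # self-supervised reconstruction loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    return adapted                            # the adapted model's slots give the decomposition

model = ToySlotAutoencoder()
scene = torch.rand(1, 3, 32, 32)              # one out-of-distribution test scene
adapted = adapt_per_scene(model, scene)
_, slots = adapted(scene)
print(slots.shape)                            # (1, num_slots, slot_dim)
```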
Related papers
- SINDER: Repairing the Singular Defects of DINOv2 [61.98878352956125]
Vision Transformer models trained on large-scale datasets often exhibit artifacts in the patch tokens they extract.
We propose a novel fine-tuning smooth regularization that rectifies structural deficiencies using only a small dataset.
arXiv Detail & Related papers (2024-07-23T20:34:23Z)
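As a rough illustration of what a smoothness regularizer on patch tokens could look like, here is a generic penalty on differences between neighbouring patch tokens; this is an assumption for illustration only, not SINDER's actual fine-tuning regularization.

```python
# Illustrative sketch only: a generic spatial-smoothness penalty on ViT patch
# tokens, to be added to a small fine-tuning objective. This is an assumption,
# not SINDER's actual loss.
import torch

def patch_smoothness_penalty(tokens, grid_h, grid_w):
    """tokens: (batch, grid_h * grid_w, dim) patch tokens from a ViT."""
    t = tokens.view(tokens.shape[0], grid_h, grid_w, -1)
    dh = (t[:, 1:, :, :] - t[:, :-1, :, :]).pow(2).mean()  # vertical neighbours
    dw = (t[:, :, 1:, :] - t[:, :, :-1, :]).pow(2).mean()  # horizontal neighbours
    return dh + dw

tokens = torch.randn(2, 14 * 14, 768)          # e.g. a 14x14 patch grid
print(patch_smoothness_penalty(tokens, 14, 14))
```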
- GeneralAD: Anomaly Detection Across Domains by Attending to Distorted Features [68.14842693208465]
GeneralAD is an anomaly detection framework designed to operate in semantic, near-distribution, and industrial settings.
We propose a novel self-supervised anomaly generation module that applies straightforward operations, such as noise addition and shuffling, to patch features.
We extensively evaluated our approach on ten datasets, achieving state-of-the-art results on six and on-par performance on the remaining four.
arXiv Detail & Related papers (2024-07-17T09:27:41Z)
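The anomaly-generation idea described above (noise addition and shuffling applied to patch features) can be sketched as follows; the noise scale and shuffle fraction are illustrative assumptions, not GeneralAD's settings.

```python
# Minimal sketch of pseudo-anomaly generation on patch features via noise
# addition and shuffling. Hyperparameters are illustrative assumptions.
import torch

def make_pseudo_anomalies(patch_feats, noise_std=0.5, shuffle_frac=0.25):
    """patch_feats: (batch, num_patches, dim) features from a frozen backbone."""
    b, n, d = patch_feats.shape
    distorted = patch_feats + noise_std * torch.randn_like(patch_feats)  # noise addition
    k = max(1, int(shuffle_frac * n))
    for i in range(b):
        idx = torch.randperm(n)[:k]                        # pick patches to shuffle
        distorted[i, idx] = distorted[i, idx[torch.randperm(k)]]
    return distorted                                        # labelled "anomalous" for training

feats = torch.randn(4, 196, 768)
fake = make_pseudo_anomalies(feats)
print(fake.shape)  # (4, 196, 768)
```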
- Test-time Distribution Learning Adapter for Cross-modal Visual Reasoning [16.998833621046117]
We propose the Test-Time Distribution LearNing Adapter (TT-DNA), which operates directly at test time.
Specifically, we estimate Gaussian distributions over the visual features of the few-shot support images to capture the knowledge in the support set.
Our extensive experimental results on visual reasoning for human object interaction demonstrate that our proposed TT-DNA outperforms existing state-of-the-art methods by large margins.
arXiv Detail & Related papers (2024-03-10T01:34:45Z)
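A minimal sketch of the Gaussian modelling step described above: fit a per-class Gaussian (here with diagonal covariance, a simplifying assumption) to support-set features and score a query by log-likelihood. Shapes and shot counts are toy values, not TT-DNA's exact formulation.

```python
# Sketch of per-class Gaussian modelling of support features with diagonal
# covariance; a simplification, not TT-DNA's exact formulation.
import torch

def fit_gaussians(support_feats, support_labels, num_classes, eps=1e-4):
    """support_feats: (num_support, dim); returns per-class means and variances."""
    means, variances = [], []
    for c in range(num_classes):
        feats_c = support_feats[support_labels == c]
        means.append(feats_c.mean(dim=0))
        variances.append(feats_c.var(dim=0, unbiased=False) + eps)
    return torch.stack(means), torch.stack(variances)

def class_log_likelihood(query, means, variances):
    """query: (dim,); returns a (scaled) log-likelihood per class."""
    diff2 = (query.unsqueeze(0) - means) ** 2
    return -0.5 * (diff2 / variances + variances.log()).sum(dim=1)

support = torch.randn(10, 512)                  # 5-shot, 2 classes (toy numbers)
labels = torch.tensor([0] * 5 + [1] * 5)
means, variances = fit_gaussians(support, labels, num_classes=2)
print(class_log_likelihood(torch.randn(512), means, variances))
```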
- Generalizable Industrial Visual Anomaly Detection with Self-Induction Vision Transformer [5.116033262865781]
We propose a self-induction vision Transformer (SIVT) for unsupervised, generalizable industrial visual anomaly detection and localization.
The proposed SIVT first extracts discriminative features from a pre-trained CNN as property descriptors, then reconstructs the extracted features in a self-supervised fashion.
The results show that the proposed method advances state-of-the-art detection performance, with improvements of 2.8-6.3 in AUROC and 3.3-7.6 in AP.
arXiv Detail & Related papers (2022-11-22T14:56:12Z)
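The extract-then-reconstruct scheme described above can be sketched generically as feature-reconstruction anomaly scoring; the stand-in frozen CNN and MLP reconstructor below are assumptions for illustration, and SIVT's self-induction Transformer is not reproduced here.

```python
# Sketch of feature-reconstruction anomaly scoring: freeze a feature extractor,
# train a small reconstructor on normal data, and flag inputs whose features
# reconstruct poorly. The stand-in CNN and MLP are illustrative assumptions.
import torch
import torch.nn as nn

frozen_cnn = nn.Sequential(                     # stand-in for a pre-trained CNN
    nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
).eval()
for p in frozen_cnn.parameters():
    p.requires_grad_(False)

reconstructor = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 64))
opt = torch.optim.Adam(reconstructor.parameters(), lr=1e-3)

normal_batch = torch.rand(16, 3, 64, 64)        # "normal" training images (toy data)
for _ in range(5):                              # self-supervised training on features
    feats = frozen_cnn(normal_batch)
    loss = ((reconstructor(feats) - feats) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

test = torch.rand(2, 3, 64, 64)
feats = frozen_cnn(test)
score = ((reconstructor(feats) - feats) ** 2).mean(dim=1)  # higher = more anomalous
print(score)
```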
- Self-Supervised Training with Autoencoders for Visual Anomaly Detection [61.62861063776813]
We focus on a specific use case in anomaly detection where the distribution of normal samples is supported by a lower-dimensional manifold.
We adapt a self-supervised learning regime that exploits discriminative information during training but focuses on the submanifold of normal examples.
We achieve a new state-of-the-art result on the MVTec AD dataset -- a challenging benchmark for visual anomaly detection in the manufacturing domain.
arXiv Detail & Related papers (2022-06-23T14:16:30Z)
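For reference, the generic idea behind autoencoder-based visual anomaly detection looks like the sketch below: train on normal images only and use per-pixel reconstruction error as an anomaly map. The architecture and training loop are toy assumptions, not the paper's training regime.

```python
# Sketch of the generic autoencoder baseline for visual anomaly detection:
# fit the manifold of normal images, score anomalies by reconstruction error.
# Toy architecture and data; not the paper's actual training regime.
import torch
import torch.nn as nn

autoencoder = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
    nn.Conv2d(16, 8, 3, stride=2, padding=1), nn.ReLU(),   # 32 -> 16 (bottleneck)
    nn.ConvTranspose2d(8, 16, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1), nn.Sigmoid(),
)
opt = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)

normal_images = torch.rand(8, 3, 64, 64)        # normal samples only (toy data)
for _ in range(5):
    recon = autoencoder(normal_images)
    loss = ((recon - normal_images) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

test_image = torch.rand(1, 3, 64, 64)
anomaly_map = ((autoencoder(test_image) - test_image) ** 2).mean(dim=1)  # (1, 64, 64)
print(anomaly_map.max())                        # peak value as an image-level score
```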
- TTAPS: Test-Time Adaption by Aligning Prototypes using Self-Supervision [70.05605071885914]
We propose a novel modification of the self-supervised training algorithm SwAV that adds the ability to adapt to single test samples.
We show the success of our method on the common benchmark dataset CIFAR10-C.
arXiv Detail & Related papers (2022-05-18T05:43:06Z)
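A heavily simplified sketch of single-sample adaptation with prototypes is shown below: two augmented views of one test sample are softly assigned to fixed prototypes, and the encoder is updated so the assignments agree. This illustrates the general idea only; TTAPS's actual SwAV modification (e.g. the Sinkhorn assignment step) is not shown.

```python
# Simplified sketch of test-time adaptation by aligning a single sample's
# prototype assignments across two views. Not TTAPS's exact SwAV modification.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
prototypes = F.normalize(torch.randn(10, 128), dim=1)      # assumed fixed after training

def assignments(x, enc, temperature=0.1):
    z = F.normalize(enc(x), dim=1)
    return F.softmax(z @ prototypes.T / temperature, dim=1)

def adapt_to_sample(enc, sample, steps=5, lr=1e-4):
    adapted = copy.deepcopy(enc)
    opt = torch.optim.SGD(adapted.parameters(), lr=lr)
    for _ in range(steps):
        view1 = sample + 0.05 * torch.randn_like(sample)    # toy "augmentations"
        view2 = sample + 0.05 * torch.randn_like(sample)
        p1, p2 = assignments(view1, adapted), assignments(view2, adapted)
        # symmetric swapped-prediction style consistency loss
        loss = -0.5 * ((p1.detach() * p2.log()).sum() + (p2.detach() * p1.log()).sum())
        opt.zero_grad(); loss.backward(); opt.step()
    return adapted

sample = torch.rand(1, 3, 32, 32)                           # a single test sample
adapted_encoder = adapt_to_sample(encoder, sample)
print(assignments(sample, adapted_encoder).argmax(dim=1))   # prototype index after adaptation
```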
- Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose Estimation [70.32536356351706]
We introduce MRP-Net, which consists of a common deep network backbone with two output heads subscribing to two diverse configurations.
We derive suitable measures to quantify prediction uncertainty at both pose and joint level.
We present a comprehensive evaluation of the proposed approach and demonstrate state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2022-03-29T07:14:58Z)
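As a rough illustration of a shared backbone with two pose heads, the sketch below uses the disagreement between the heads as a joint-level uncertainty signal; the architecture and the disagreement measure are assumptions, not MRP-Net's actual design or uncertainty derivation.

```python
# Sketch of a shared backbone with two pose heads, using head disagreement as
# a rough uncertainty signal. Illustrative assumptions only, not MRP-Net.
import torch
import torch.nn as nn

class TwoHeadPoseNet(nn.Module):
    def __init__(self, feat_dim=256, num_joints=17):
        super().__init__()
        self.backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, feat_dim), nn.ReLU())
        self.head_a = nn.Linear(feat_dim, num_joints * 3)    # two "diverse" output heads
        self.head_b = nn.Linear(feat_dim, num_joints * 3)
        self.num_joints = num_joints

    def forward(self, x):
        f = self.backbone(x)
        pose_a = self.head_a(f).view(-1, self.num_joints, 3)
        pose_b = self.head_b(f).view(-1, self.num_joints, 3)
        joint_uncertainty = (pose_a - pose_b).norm(dim=-1)   # (batch, num_joints)
        pose_uncertainty = joint_uncertainty.mean(dim=-1)    # (batch,)
        return (pose_a + pose_b) / 2, joint_uncertainty, pose_uncertainty

net = TwoHeadPoseNet()
pose, joint_unc, pose_unc = net(torch.rand(2, 3, 64, 64))
print(pose.shape, joint_unc.shape, pose_unc.shape)           # (2, 17, 3) (2, 17) (2,)
```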