Test-Time Training on Video Streams
- URL: http://arxiv.org/abs/2307.05014v2
- Date: Wed, 12 Jul 2023 04:19:48 GMT
- Title: Test-Time Training on Video Streams
- Authors: Renhao Wang, Yu Sun, Yossi Gandelsman, Xinlei Chen, Alexei A. Efros,
Xiaolong Wang
- Abstract summary: Prior work has established test-time training (TTT) as a general framework to further improve a trained model at test time.
We extend TTT to the streaming setting, where multiple test instances arrive in temporal order.
Online TTT significantly outperforms the fixed-model baseline for four tasks, on three real-world datasets.
- Score: 54.07009446207442
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prior work has established test-time training (TTT) as a general framework to
further improve a trained model at test time. Before making a prediction on
each test instance, the model is trained on the same instance using a
self-supervised task, such as image reconstruction with masked autoencoders. We
extend TTT to the streaming setting, where multiple test instances - video
frames in our case - arrive in temporal order. Our extension is online TTT: The
current model is initialized from the previous model, then trained on the
current frame and a small window of frames immediately before. Online TTT
significantly outperforms the fixed-model baseline for four tasks, on three
real-world datasets. The relative improvement is 45% and 66% for instance and
panoptic segmentation. Surprisingly, online TTT also outperforms its offline
variant that accesses more information, training on all frames from the entire
test video regardless of temporal order. This differs from previous findings
using synthetic videos. We conceptualize locality as the advantage of online
over offline TTT. We analyze the role of locality with ablations and a theory
based on bias-variance trade-off.
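As a concrete illustration of the online TTT loop described in the abstract, the sketch below carries one model across a video stream, adapts it on the current frame plus a small window of immediately preceding frames using a masked-reconstruction loss, and only then predicts. This is a minimal sketch under stated assumptions, not the authors' implementation: the toy model, helper names, coarse pixel masking, and hyperparameters are all illustrative.

```python
# Minimal sketch of online test-time training (TTT) on a video stream.
# Assumptions: a toy model with a shared encoder, a reconstruction decoder for the
# self-supervised task, and a per-pixel task head; hyperparameters are arbitrary.
from collections import deque
import torch
import torch.nn as nn

class ToyTTTModel(nn.Module):
    """Hypothetical model, not the paper's architecture."""
    def __init__(self, channels=3, num_classes=21):
        super().__init__()
        self.encoder = nn.Conv2d(channels, 16, 3, padding=1)
        self.decoder = nn.Conv2d(16, channels, 3, padding=1)   # reconstructs the frame
        self.head = nn.Conv2d(16, num_classes, 3, padding=1)   # main-task prediction

    def reconstruct(self, x):
        return self.decoder(torch.relu(self.encoder(x)))

    def predict(self, x):
        return self.head(torch.relu(self.encoder(x)))

def masked_reconstruction_loss(model, frames, mask_ratio=0.75):
    # Assumed self-supervised objective: hide most of each frame and reconstruct it.
    # A coarse pixel mask stands in for MAE-style patch masking.
    mask = (torch.rand_like(frames) < mask_ratio).float()
    recon = model.reconstruct(frames * (1.0 - mask))
    return ((recon - frames) ** 2 * mask).mean()

def online_ttt(stream, model, window_size=3, steps_per_frame=1, lr=1e-3):
    """Online TTT: the model is carried over from frame to frame and, before each
    prediction, trained on the current frame plus a small window of prior frames."""
    window = deque(maxlen=window_size)                 # locality: only recent frames kept
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for frame in stream:                               # frames arrive in temporal order
        window.append(frame)
        batch = torch.stack(list(window))              # current frame + short history
        model.train()
        for _ in range(steps_per_frame):               # a few self-supervised updates
            loss = masked_reconstruction_loss(model, batch)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        model.eval()
        with torch.no_grad():
            yield model.predict(frame.unsqueeze(0))    # predict only after adapting

# Usage on a dummy 10-frame stream of 3x64x64 frames:
if __name__ == "__main__":
    frames = [torch.rand(3, 64, 64) for _ in range(10)]
    for pred in online_ttt(iter(frames), ToyTTTModel()):
        print(pred.shape)                              # torch.Size([1, 21, 64, 64])
```

The small sliding window is what provides the locality that the abstract credits for online TTT beating both the fixed-model baseline and the offline variant; the particular window size and step count above are illustrative placeholders.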
Related papers
- NC-TTT: A Noise Contrastive Approach for Test-Time Training [19.0284321951354]
Noise-Contrastive Test-Time Training (NC-TTT) is a novel unsupervised TTT technique based on the discrimination of noisy feature maps.
By learning to classify noisy views of projected feature maps and then adapting the model on new domains, NC-TTT recovers classification performance by a significant margin.
arXiv Detail & Related papers (2024-04-12T10:54:11Z)
- Depth-aware Test-Time Training for Zero-shot Video Object Segmentation [48.2238806766877]
We introduce a test-time training (TTT) strategy to address the problem of generalization to unseen videos.
Our key insight is to enforce consistent depth prediction by the model during the TTT process.
Our proposed video TTT strategy significantly outperforms state-of-the-art TTT methods.
arXiv Detail & Related papers (2024-03-07T06:40:53Z)
- Technical Report for ICCV 2023 Visual Continual Learning Challenge: Continuous Test-time Adaptation for Semantic Segmentation [18.299549256484887]
The goal of the challenge is to develop a test-time adaptation (TTA) method that can adapt the model to gradually changing domains in video sequences for the semantic segmentation task.
The TTA methods are evaluated in each image sequence (video) separately, meaning the model is reset to the source model state before the next sequence.
The proposed solution secured 3rd place in the challenge and received an innovation award.
arXiv Detail & Related papers (2023-10-20T14:20:21Z)
- Transform-Equivariant Consistency Learning for Temporal Sentence Grounding [66.10949751429781]
We introduce a novel Equivariant Consistency Regulation Learning framework to learn more discriminative representations for each video.
Our motivation is that the temporal boundary of the query-guided activity should be predicted consistently.
In particular, we devise a self-supervised consistency loss module to enhance the completeness and smoothness of the augmented video.
arXiv Detail & Related papers (2023-05-06T19:29:28Z)
- Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models [107.05966685291067]
We propose test-time prompt tuning (TPT) to learn adaptive prompts on the fly with a single test sample.
TPT improves the zero-shot top-1 accuracy of CLIP by 3.6% on average.
In evaluating cross-dataset generalization with unseen categories, TPT performs on par with the state-of-the-art approaches that use additional training data.
arXiv Detail & Related papers (2022-09-15T17:55:11Z)
- Long-Short Temporal Contrastive Learning of Video Transformers [62.71874976426988]
Self-supervised pretraining of video transformers on video-only datasets can lead to action recognition results on par with or better than those obtained with supervised pretraining on large-scale image datasets.
Our approach, named Long-Short Temporal Contrastive Learning, enables video transformers to learn an effective clip-level representation by predicting temporal context captured from a longer temporal extent.
arXiv Detail & Related papers (2021-06-17T02:30:26Z)
- Dense Regression Network for Video Grounding [97.57178850020327]
We use the distances between each frame within the ground truth segment and the starting (ending) frame as dense supervision to improve the video grounding accuracy.
Specifically, we design a novel dense regression network (DRN) to regress the distances from each frame to the starting (ending) frame of the video segment.
We also propose a simple but effective IoU regression head module to explicitly consider the localization quality of the grounding results.
arXiv Detail & Related papers (2020-04-07T17:15:37Z)
- Temporally Coherent Embeddings for Self-Supervised Video Representation Learning [2.216657815393579]
This paper presents TCE: Temporally Coherent Embeddings for self-supervised video representation learning.
The proposed method exploits inherent structure of unlabeled video data to explicitly enforce temporal coherency in the embedding space.
With a simple but effective 2D-CNN backbone and only RGB stream inputs, TCE pre-trained representations outperform all previous self-supervised 2D-CNN and 3D-CNN pre-trained representations on UCF101.
arXiv Detail & Related papers (2020-03-21T12:25:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.