AuxAdapt: Stable and Efficient Test-Time Adaptation for Temporally
Consistent Video Semantic Segmentation
- URL: http://arxiv.org/abs/2110.12369v1
- Date: Sun, 24 Oct 2021 07:07:41 GMT
- Title: AuxAdapt: Stable and Efficient Test-Time Adaptation for Temporally
Consistent Video Semantic Segmentation
- Authors: Yizhe Zhang, Shubhankar Borse, Hong Cai, Fatih Porikli
- Abstract summary: In video segmentation, generating temporally consistent results across frames is as important as achieving frame-wise accuracy.
Existing methods rely on optical flow regularization or fine-tuning with test data to attain temporal consistency.
This paper presents an efficient, intuitive, and unsupervised online adaptation method, AuxAdapt, for improving the temporal consistency of most neural network models.
- Score: 81.87943324048756
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In video segmentation, generating temporally consistent results across frames
is as important as achieving frame-wise accuracy. Existing methods rely either
on optical flow regularization or fine-tuning with test data to attain temporal
consistency. However, optical flow is not always available or reliable, and it
is expensive to compute. Fine-tuning the original model at test time is
computationally costly.
This paper presents an efficient, intuitive, and unsupervised online
adaptation method, AuxAdapt, for improving the temporal consistency of most
neural network models. It does not require optical flow and only takes one pass
of the video. Since inconsistency mainly arises from the model's uncertainty in
its output, we propose an adaptation scheme where the model learns from its own
segmentation decisions as it streams a video, allowing it to produce more
confident and temporally consistent labels for similar-looking pixels
across frames. For stability and efficiency, we leverage a small auxiliary
segmentation network (AuxNet) to assist with this adaptation. More
specifically, AuxNet readjusts the decisions of the original segmentation
network (MainNet) by adding its own estimations to those of MainNet. At every
frame, only AuxNet is updated via back-propagation while keeping MainNet fixed.
We extensively evaluate our test-time adaptation approach on standard video
benchmarks, including Cityscapes, CamVid, and KITTI. The results demonstrate
that our approach provides label-wise accurate, temporally consistent, and
computationally efficient adaptation (over 5x reduction in overhead compared
to state-of-the-art test-time adaptation methods).
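The abstract fully specifies the adaptation loop, so a minimal PyTorch sketch is given below. The network definition, channel widths, and learning rate are illustrative assumptions, not the authors' implementation; any frozen pretrained segmentation model can play the role of MainNet.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical small segmentation net standing in for AuxNet; any frozen
# pretrained model with the same output shape can serve as MainNet.
class SmallSegNet(nn.Module):
    def __init__(self, num_classes=19):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, num_classes, 1),
        )

    def forward(self, x):
        return self.body(x)

def auxadapt_stream(main_net, aux_net, frames, lr=1e-4):
    """Single pass over a video: fuse logits, self-train AuxNet only."""
    main_net.eval()
    for p in main_net.parameters():
        p.requires_grad_(False)                # MainNet stays fixed
    opt = torch.optim.Adam(aux_net.parameters(), lr=lr)

    outputs = []
    for frame in frames:                       # frame: (1, 3, H, W)
        with torch.no_grad():
            main_logits = main_net(frame)
        fused = main_logits + aux_net(frame)   # AuxNet readjusts MainNet
        pseudo = fused.argmax(dim=1)           # the model's own decision
        loss = F.cross_entropy(fused, pseudo)  # learn from that decision
        opt.zero_grad()
        loss.backward()                        # updates AuxNet only
        opt.step()
        outputs.append(pseudo.detach())
    return outputs
```

Because only the small AuxNet receives gradients, each per-frame backward pass is far cheaper than fine-tuning MainNet itself, which is the source of the efficiency claim above.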
Related papers
- Online Adaptive Disparity Estimation for Dynamic Scenes in Structured
Light Systems [17.53719804060679]
Self-supervised online adaptation has been proposed as a solution to bridge the performance gap on unseen data.
We propose an unsupervised loss function based on long sequential inputs. It ensures better gradient directions and faster convergence.
Our proposed framework significantly improves the online adaptation speed and achieves superior performance on unseen data.
arXiv Detail & Related papers (2023-10-13T08:00:33Z)
- TIDE: Temporally Incremental Disparity Estimation via Pattern Flow in Structured Light System [17.53719804060679]
TIDE-Net is a learning-based technique for disparity computation in mono-camera structured light systems.
We exploit the deformation of projected patterns (named pattern flow) on captured image sequences to model temporal information.
For each incoming frame, our model fuses the correlation volume from the current frame with the disparity from the previous frame, warped by pattern flow (a warping sketch follows below).
arXiv Detail & Related papers (2023-10-13T07:55:33Z)
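For the TIDE entry above, the core operation is warping the previous frame's disparity by the pattern flow before fusing it with the current correlation volume. Below is a minimal sketch of a generic backward warp built on torch.nn.functional.grid_sample; the tensor layout is an assumption and the fusion step itself is omitted.

```python
import torch
import torch.nn.functional as F

def warp_by_flow(prev_disp, flow):
    """Backward-warp the previous disparity map by a flow field.

    prev_disp: (N, 1, H, W); flow: (N, 2, H, W) in pixel units.
    A generic stand-in for the paper's pattern-flow warping; the fusion
    with the current frame's correlation volume is not shown.
    """
    n, _, h, w = prev_disp.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=prev_disp.dtype, device=prev_disp.device),
        torch.arange(w, dtype=prev_disp.dtype, device=prev_disp.device),
        indexing="ij",
    )
    coords = torch.stack((xs, ys)).unsqueeze(0) + flow  # sampling positions
    # Normalize pixel coordinates to [-1, 1] as grid_sample expects.
    gx = 2.0 * coords[:, 0] / (w - 1) - 1.0
    gy = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)                # (N, H, W, 2)
    return F.grid_sample(prev_disp, grid, align_corners=True)
```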
- Temporal Coherent Test-Time Optimization for Robust Video Classification [55.432935503341064]
Deep neural networks are likely to fail when the test data is corrupted in real-world deployment.
Test-time optimization is an effective way to adapt models for robustness to corrupted data during testing.
We propose a framework that utilizes temporal information in test-time optimization for robust classification (a toy consistency objective is sketched below).
arXiv Detail & Related papers (2023-02-28T04:59:23Z)
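For the test-time optimization entry above, one plausible way to "utilize temporal information" is to penalize disagreement between the predictions of adjacent frames. The sketch below uses a symmetric KL objective as a stand-in; the paper's actual loss and update schedule may differ.

```python
import torch
import torch.nn.functional as F

def temporal_consistency_loss(model, clip):
    """clip: (T, 3, H, W), treated as a batch of frames.

    Stand-in objective: symmetric KL between the predicted class
    distributions of adjacent frames, encouraging temporally coherent
    outputs without any labels.
    """
    log_p = F.log_softmax(model(clip), dim=1)   # (T, num_classes)
    p = log_p.exp()
    kl_fwd = F.kl_div(log_p[1:], p[:-1].detach(), reduction="batchmean")
    kl_bwd = F.kl_div(log_p[:-1], p[1:].detach(), reduction="batchmean")
    return 0.5 * (kl_fwd + kl_bwd)

def tto_step(model, clip, optimizer):
    """One unsupervised test-time update on an incoming clip."""
    loss = temporal_consistency_loss(model, clip)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```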
- Adaptive Focus for Efficient Video Recognition [29.615394426035074]
We propose a reinforcement learning-based approach for efficient, spatially adaptive video recognition (AdaFocus).
A lightweight ConvNet is first adopted to quickly process the full video sequence, and its features are used by a recurrent policy network to localize the most task-relevant regions.
During offline inference, once the informative patch sequence has been generated, the bulk of the computation can be done in parallel and is efficient on modern GPU devices (a toy glance-and-focus model is sketched below).
arXiv Detail & Related papers (2021-05-07T13:24:47Z)
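For the AdaFocus entry above, the pipeline is: a cheap network glances at the full frame, a recurrent policy picks the most task-relevant patch, and a heavier network processes only that patch. The toy module below follows that structure with a deterministic (non-RL) patch choice; all layer sizes are hypothetical and the reinforcement learning training is omitted.

```python
import torch
import torch.nn as nn

class GlanceAndFocus(nn.Module):
    """Toy glance-then-focus model: cheap full-frame pass, recurrent
    policy picks a patch, heavier net sees only the patch."""
    def __init__(self, patch=64, num_classes=10):
        super().__init__()
        self.patch = patch
        self.glance = nn.Sequential(                 # cheap full-frame net
            nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten())
        self.policy = nn.GRUCell(8 * 16, 64)         # recurrent policy
        self.loc_head = nn.Linear(64, 2)             # patch centre in [0, 1]
        self.focus = nn.Sequential(                  # heavier patch net
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, num_classes))

    def forward(self, frames):                       # frames: (T, 3, H, W)
        t, _, h, w = frames.shape
        hidden = torch.zeros(1, 64)
        logits = []
        for i in range(t):
            feat = self.glance(frames[i:i + 1])      # quick full-frame scan
            hidden = self.policy(feat, hidden)
            cx, cy = torch.sigmoid(self.loc_head(hidden))[0]
            x0 = int(cx.item() * max(w - self.patch, 0))
            y0 = int(cy.item() * max(h - self.patch, 0))
            crop = frames[i:i + 1, :, y0:y0 + self.patch, x0:x0 + self.patch]
            logits.append(self.focus(crop))          # heavy pass, patch only
        return torch.stack(logits).mean(0)           # average over time
```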
- Towards Streaming Perception [70.68520310095155]
We present an approach that coherently integrates latency and accuracy into a single metric for real-time online perception.
The key insight behind this metric is to jointly evaluate the output of the entire perception stack at every time instant.
We focus on the illustrative tasks of object detection and instance segmentation in urban video streams, and contribute a novel dataset with high-quality, temporally dense annotations (a toy streaming evaluator is sketched below).
arXiv Detail & Related papers (2020-05-21T01:51:35Z)
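For the streaming perception entry above, the key idea is to score, at every ground-truth instant, whatever output the perception stack had actually finished producing by that time, so latency directly hurts accuracy. The toy evaluator below illustrates this; the paper's actual metric is more involved.

```python
from bisect import bisect_right

def streaming_eval(preds, gts, score_fn):
    """Toy streaming evaluation.

    preds: list of (finish_time, output), sorted by finish_time.
    gts:   list of (timestamp, ground_truth).
    score_fn(output, ground_truth) -> float, e.g. IoU or accuracy.
    At each ground-truth timestamp, only the latest prediction that had
    already finished counts; a slow model is scored on stale outputs.
    """
    finish_times = [t for t, _ in preds]
    scores = []
    for t, gt in gts:
        i = bisect_right(finish_times, t) - 1   # latest completed output
        scores.append(0.0 if i < 0 else score_fn(preds[i][1], gt))
    return sum(scores) / len(scores)
```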
- Network Adjustment: Channel Search Guided by FLOPs Utilization Ratio [101.84651388520584]
This paper presents a new framework named network adjustment, which considers network accuracy as a function of FLOPs.
Experiments on standard image classification datasets and a wide range of base networks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2020-04-06T15:51:00Z)
- Scene-Adaptive Video Frame Interpolation via Meta-Learning [54.87696619177496]
We propose to adapt the model to each video by making use of additional information that is readily available at test time.
We obtain significant performance gains with only a single gradient update and no additional parameters (a one-step adaptation sketch follows below).
arXiv Detail & Related papers (2020-04-02T02:46:44Z)
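For the meta-learning entry above, the "additional information readily available at test time" in frame interpolation is that any triplet of consecutive frames supervises itself: the middle frame is the target for interpolating between its neighbors. The sketch below performs the single gradient update on that signal; the two-input model signature and learning rate are assumptions, not the authors' meta-learned inner loop.

```python
import copy
import torch
import torch.nn.functional as F

def adapt_one_step(model, frames, lr=1e-5):
    """Single test-time gradient update for frame interpolation.

    frames: (T, 3, H, W) from the test video, T >= 3. The model is
    assumed to take two frames and predict the one in between.
    """
    adapted = copy.deepcopy(model)       # keep the original model intact
    f0, f1, f2 = frames[0:1], frames[1:2], frames[2:3]
    pred = adapted(f0, f2)               # interpolate the middle frame
    loss = F.l1_loss(pred, f1)           # self-supervision from the video
    grads = torch.autograd.grad(loss, adapted.parameters())
    with torch.no_grad():
        for p, g in zip(adapted.parameters(), grads):
            p -= lr * g                  # one SGD step, no new parameters
    return adapted
```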
- Efficient Semantic Video Segmentation with Per-frame Inference [117.97423110566963]
In this work, we perform efficient semantic video segmentation in a per-frame fashion during inference.
We employ compact models for real-time execution. To narrow the performance gap between compact and large models, new knowledge distillation methods are designed (a generic distillation loss is sketched below).
arXiv Detail & Related papers (2020-02-26T12:24:32Z)
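For the per-frame segmentation entry above, the compact student is trained to match a large teacher. The sketch below is the standard soft-target distillation loss applied per pixel; the paper's newly designed distillation terms (e.g., across frames) are not reproduced here.

```python
import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Generic per-pixel distillation for segmentation.

    student_logits, teacher_logits: (N, C, H, W); labels: (N, H, W).
    Standard soft-target KL (temperature-scaled) plus hard-label
    cross-entropy; the temperature and mixing weight are conventional
    defaults, not values from the paper.
    """
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)                                   # rescale soft gradients
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```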
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.