RT-GAN: Recurrent Temporal GAN for Adding Lightweight Temporal
Consistency to Frame-Based Domain Translation Approaches
- URL: http://arxiv.org/abs/2310.00868v1
- Date: Mon, 2 Oct 2023 03:13:26 GMT
- Title: RT-GAN: Recurrent Temporal GAN for Adding Lightweight Temporal
Consistency to Frame-Based Domain Translation Approaches
- Authors: Shawn Mathew, Saad Nadeem, Alvin C. Goh, and Arie Kaufman
- Abstract summary: We present a lightweight solution with a tunable temporal parameter, RT-GAN, for adding temporal consistency to individual frame-based approaches.
We demonstrate the effectiveness of our approach on two challenging use cases in colonoscopy.
- Score: 3.7873597471903944
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While developing new unsupervised domain translation methods for endoscopy
videos, it is typical to start with approaches that initially work for
individual frames without temporal consistency. Once an individual-frame model
has been finalized, additional contiguous frames are added with a modified deep
learning architecture to train a new model for temporal consistency. This
transition to temporally-consistent deep learning models, however, requires
significantly more computational and memory resources for training. In this
paper, we present a lightweight solution with a tunable temporal parameter,
RT-GAN (Recurrent Temporal GAN), for adding temporal consistency to individual
frame-based approaches, while reducing training requirements by a factor of 5. We
demonstrate the effectiveness of our approach on two challenging use cases in
colonoscopy: haustral fold segmentation (indicative of missed surface area) and
realistic colonoscopy simulator video generation. The datasets, accompanying
code, and pretrained models will be made available at
\url{https://github.com/nadeemlab/CEP}.
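The abstract above describes a recurrent setup in which a frame-based translator is conditioned on its own previously generated frame, with a tunable parameter weighting temporal consistency against per-frame fidelity. The PyTorch sketch below illustrates that general recipe only; the `RecurrentTranslator` module, the L1 temporal penalty, and the weight `lambda_t` are illustrative assumptions, not the authors' RT-GAN architecture or losses, and a full GAN setup would add adversarial discriminator terms on top.

```python
import torch
import torch.nn as nn

class RecurrentTranslator(nn.Module):
    """Toy frame translator that also sees the previously translated frame."""
    def __init__(self, channels: int = 3, hidden: int = 32):
        super().__init__()
        # Input is the current frame concatenated with the previous output (2 * channels).
        self.net = nn.Sequential(
            nn.Conv2d(2 * channels, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, channels, 3, padding=1), nn.Tanh(),
        )

    def forward(self, frame: torch.Tensor, prev_out: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([frame, prev_out], dim=1))

def temporal_consistency_loss(outputs):
    """Penalize large changes between consecutive translated frames (assumed L1 penalty)."""
    diffs = [torch.mean(torch.abs(a - b)) for a, b in zip(outputs[:-1], outputs[1:])]
    return torch.stack(diffs).mean()

model = RecurrentTranslator()
clip = torch.rand(4, 1, 3, 64, 64)   # (time, batch, C, H, W) dummy clip
prev = torch.zeros(1, 3, 64, 64)     # zero frame bootstraps the recurrence
outs = []
for t in range(clip.shape[0]):
    prev = model(clip[t], prev)      # each output is fed back as context for the next frame
    outs.append(prev)

lambda_t = 0.5                       # illustrative temporal-consistency weight
frame_loss = torch.mean((torch.stack(outs) - clip) ** 2)  # stand-in for the base per-frame loss
loss = frame_loss + lambda_t * temporal_consistency_loss(outs)
loss.backward()
```

In this sketch, `lambda_t` plays the role of the tunable temporal parameter: higher values favor smoother frame-to-frame outputs, lower values favor per-frame translation quality.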
Related papers
- WinTSR: A Windowed Temporal Saliency Rescaling Method for Interpreting Time Series Deep Learning Models [0.51795041186793]
We introduce a novel interpretation method called Windowed Temporal Saliency Rescaling (WinTSR).
We benchmark WinTSR against 10 recent interpretation techniques with 5 state-of-the-art deep-learning models of different architectures.
Our comprehensive analysis shows that WinTSR significantly outranks the other local interpretation methods in overall performance.
arXiv Detail & Related papers (2024-12-05T17:15:07Z) - STLight: a Fully Convolutional Approach for Efficient Predictive Learning by Spatio-Temporal joint Processing [6.872340834265972]
We propose STLight, a novel method for spatio-temporal learning that relies solely on channel-wise and depth-wise convolutions as learnable layers.
STLight overcomes the limitations of traditional convolutional approaches by rearranging spatial and temporal dimensions together.
Our architecture achieves state-of-the-art performance on STL benchmarks across datasets and settings, while significantly improving computational efficiency in terms of parameters and computational FLOPs.
arXiv Detail & Related papers (2024-11-15T13:53:19Z) - Cross Space and Time: A Spatio-Temporal Unitized Model for Traffic Flow Forecasting [16.782154479264126]
Predicting spatio-temporal traffic flow presents challenges due to complex interactions between spatial and temporal factors.
Existing approaches address these dimensions in isolation, neglecting their critical interdependencies.
In this paper, we introduce the Adaptive Spatio-Temporal Unitized Cell (ASTUC), a unified framework designed to capture both spatial and temporal dependencies.
arXiv Detail & Related papers (2024-11-14T07:34:31Z) - Hierarchical Temporal Context Learning for Camera-based Semantic Scene Completion [57.232688209606515]
We present HTCL, a novel Hierarchical Temporal Context Learning paradigm for improving camera-based semantic scene completion.
Our method ranks 1st on the SemanticKITTI benchmark and even surpasses LiDAR-based methods in terms of mIoU.
arXiv Detail & Related papers (2024-07-02T09:11:17Z) - Self-STORM: Deep Unrolled Self-Supervised Learning for Super-Resolution Microscopy [55.2480439325792]
We introduce deep unrolled self-supervised learning, which alleviates the need for ground-truth training data by training a sequence-specific, model-based autoencoder.
Our proposed method exceeds the performance of its supervised counterparts.
arXiv Detail & Related papers (2024-03-25T17:40:32Z) - Time-series Generation by Contrastive Imitation [87.51882102248395]
We study a generative framework that seeks to combine the strengths of both: Motivated by a moment-matching objective to mitigate compounding error, we optimize a local (but forward-looking) transition policy.
At inference, the learned policy serves as the generator for iterative sampling, and the learned energy serves as a trajectory-level measure for evaluating sample quality.
arXiv Detail & Related papers (2023-11-02T16:45:25Z) - Disentangling Spatial and Temporal Learning for Efficient Image-to-Video
Transfer Learning [59.26623999209235]
We present DiST, which disentangles the learning of spatial and temporal aspects of videos.
The disentangled learning in DiST is highly efficient because it avoids the back-propagation of massive pre-trained parameters.
Extensive experiments on five benchmarks show that DiST delivers better performance than existing state-of-the-art methods by a convincing margin.
arXiv Detail & Related papers (2023-09-14T17:58:33Z) - Transform-Equivariant Consistency Learning for Temporal Sentence
Grounding [66.10949751429781]
We introduce a novel Equivariant Consistency Regulation Learning framework to learn more discriminative representations for each video.
Our motivation is that the temporal boundary of the query-guided activity should be consistently predicted.
In particular, we devise a self-supervised consistency loss module to enhance the completeness and smoothness of the augmented video.
arXiv Detail & Related papers (2023-05-06T19:29:28Z) - StyleVideoGAN: A Temporal Generative Model using a Pretrained StyleGAN [70.31913835035206]
We present a novel approach to the video synthesis problem that helps to greatly improve visual quality.
We make use of a pre-trained StyleGAN network, the latent space of which allows control over the appearance of the objects it was trained for.
Our temporal architecture is then trained not on sequences of RGB frames, but on sequences of StyleGAN latent codes.
arXiv Detail & Related papers (2021-07-15T09:58:15Z)
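The StyleVideoGAN entry directly above trains its temporal model on sequences of StyleGAN latent codes rather than on RGB frames. A minimal sketch of that idea under assumed shapes, using a GRU as a stand-in temporal predictor (not the paper's actual architecture):

```python
import torch
import torch.nn as nn

# Model temporal dynamics in a pretrained StyleGAN's latent space (assumed 512-dim
# codes here) instead of in pixel space. A real pipeline would obtain the codes by
# inverting video frames and would decode the predicted codes back to frames with
# the frozen StyleGAN generator; both steps are omitted in this sketch.
latent_dim = 512
temporal_model = nn.GRU(input_size=latent_dim, hidden_size=latent_dim, batch_first=True)

codes = torch.randn(2, 16, latent_dim)     # (batch, time, latent) dummy latent trajectories
pred, _ = temporal_model(codes[:, :-1])    # predict each next code from the codes before it
loss = nn.functional.mse_loss(pred, codes[:, 1:])
loss.backward()
```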
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.