ReFrame: Layer Caching for Accelerated Inference in Real-Time Rendering
- URL: http://arxiv.org/abs/2506.13814v1
- Date: Sat, 14 Jun 2025 20:17:43 GMT
- Title: ReFrame: Layer Caching for Accelerated Inference in Real-Time Rendering
- Authors: Lufei Liu, Tor M. Aamodt
- Abstract summary: ReFrame explores different caching policies to optimize trade-offs between quality and performance in rendering workloads. We achieve 1.4x speedup on average with negligible quality loss in three real-time rendering tasks.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Graphics rendering applications increasingly leverage neural networks in tasks such as denoising, supersampling, and frame extrapolation to improve image quality while maintaining frame rates. The temporal coherence inherent in these tasks presents an opportunity to reuse intermediate results from previous frames and avoid redundant computations. Recent work has shown that caching intermediate features to be reused in subsequent inferences is an effective method to reduce latency in diffusion models. We extend this idea to real-time rendering and present ReFrame, which explores different caching policies to optimize trade-offs between quality and performance in rendering workloads. ReFrame can be applied to a variety of encoder-decoder style networks commonly found in rendering pipelines. Experimental results show that we achieve 1.4x speedup on average with negligible quality loss in three real-time rendering tasks. Code available: https://ubc-aamodt-group.github.io/reframe-layer-caching/
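As an illustration of the general mechanism (not ReFrame's actual caching policy, which the paper explores in several variants), the sketch below caches the deep features of a small encoder-decoder network and refreshes them only every few frames; the module sizes and the fixed `refresh_every` interval are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CachedUNet(nn.Module):
    """Tiny encoder-decoder whose deep layers are skipped on cache hits."""

    def __init__(self, refresh_every: int = 4):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.bottleneck = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())
        self.dec = nn.Conv2d(64 + 32, 3, 3, padding=1)
        self.refresh_every = refresh_every
        self._cached_deep = None  # bottleneck features from the last refresh
        self._frame_idx = 0

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        s1 = self.enc1(x)  # shallow features: cheap, recomputed every frame
        if self._cached_deep is None or self._frame_idx % self.refresh_every == 0:
            # Cache miss or scheduled refresh: run the expensive deep layers.
            self._cached_deep = self.bottleneck(self.enc2(s1))
        self._frame_idx += 1
        up = F.interpolate(self._cached_deep, scale_factor=2,
                           mode="bilinear", align_corners=False)
        return self.dec(torch.cat([up, s1], dim=1))

# Usage: deep layers run only on frames 0 and 4; the rest hit the cache.
net = CachedUNet().eval()
with torch.no_grad():
    outputs = [net(f) for f in torch.randn(8, 1, 3, 256, 256)]
```

A fixed interval is the simplest possible policy; the quality/performance trade-off ReFrame studies lies precisely in deciding when and what to refresh.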
Related papers
- FastCar: Cache Attentive Replay for Fast Auto-Regressive Video Generation on the Edge
Auto-regressive (AR) models have recently shown promise in visual generation tasks due to their superior sampling efficiency. Video generation requires a substantially larger number of tokens to produce coherent temporal frames, resulting in significant overhead during the decoding phase. We propose the FastCar framework to accelerate the decode phase of AR video generation by exploiting temporal redundancy (see the sketch after this entry).
arXiv Detail & Related papers (2025-05-17T05:00:39Z)
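The summary above names the mechanism but not its details. Purely as a hypothetical sketch of replaying cached computation across temporally redundant frames, the function below reuses a transformer MLP's cached outputs for tokens whose hidden states barely changed; the cosine-similarity test and threshold `tau` are assumptions, not FastCar's actual criterion.

```python
import torch

def replay_or_compute(mlp, hidden, cache, tau=0.99):
    """Reuse cached MLP outputs for tokens that are nearly unchanged.

    hidden: (num_tokens, dim) current hidden states
    cache:  dict with previous 'hidden' and 'out' tensors, or None
    tau:    cosine-similarity threshold for declaring a token redundant
    """
    if cache is None:
        out = mlp(hidden)
        return out, {"hidden": hidden, "out": out}
    sim = torch.nn.functional.cosine_similarity(hidden, cache["hidden"], dim=-1)
    stale = sim < tau                    # tokens that actually changed
    out = cache["out"].clone()
    if stale.any():
        out[stale] = mlp(hidden[stale])  # recompute only the changed tokens
    return out, {"hidden": hidden, "out": out}
```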
- QuantCache: Adaptive Importance-Guided Quantization with Hierarchical Latent and Layer Caching for Video Generation
Diffusion Transformers (DiTs) have emerged as a dominant architecture in video generation. However, DiTs come with significant drawbacks, including increased computational and memory costs. We propose QuantCache, a novel training-free inference acceleration framework.
arXiv Detail & Related papers (2025-03-09T10:31:51Z)
- MaskVD: Region Masking for Efficient Video Object Detection
Video tasks are compute-heavy and pose a challenge when deployed in real-time applications. This paper presents a strategy for masking regions in video frames. By leveraging features extracted from previous frames, ViT backbones benefit directly from region masking (a sketch follows this entry).
arXiv Detail & Related papers (2024-07-16T08:01:49Z)
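A minimal sketch of the region-masking idea, assuming a simple patch-difference test (the threshold and patch size below are illustrative, not MaskVD's actual selection rule): patches that changed little since the previous frame keep their cached tokens, and only the remaining patches are run through the ViT backbone.

```python
import torch

def select_changed_patches(prev: torch.Tensor, cur: torch.Tensor,
                           patch: int = 16, thresh: float = 0.05):
    """Return indices of patches whose mean absolute change exceeds thresh.

    prev, cur: (C, H, W) consecutive frames, H and W divisible by patch.
    """
    diff = (cur - prev).abs().mean(dim=0)                         # (H, W)
    # Average the per-pixel difference within each patch-sized cell.
    cells = diff.unfold(0, patch, patch).unfold(1, patch, patch)  # (H/p, W/p, p, p)
    score = cells.mean(dim=(-1, -2)).flatten()                    # one score per patch
    return torch.nonzero(score > thresh, as_tuple=True)[0]

# Usage: only these token indices are recomputed; the rest are filled
# from a feature cache kept alongside the ViT backbone.
prev, cur = torch.rand(3, 224, 224), torch.rand(3, 224, 224)
idx = select_changed_patches(prev, cur)
```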
- Frame Flexible Network
Existing video recognition algorithms use separate training pipelines for inputs with different frame numbers. Evaluating a model at frame numbers not used during training causes a significant performance drop. We propose a general framework, named Frame Flexible Network (FFN), which enables the model to be evaluated at different frame numbers and to adjust its computation accordingly.
arXiv Detail & Related papers (2023-03-26T20:51:35Z)
- ReBotNet: Fast Real-time Video Enhancement
Most restoration networks are slow, have high computational bottlenecks, and cannot be used for real-time video enhancement. In this work, we design an efficient and fast framework to perform real-time enhancement for practical use cases such as live video calls and video streams. To evaluate our method, we curate two new datasets that emulate real-world video call and streaming scenarios, and show extensive results on multiple datasets where ReBotNet outperforms existing approaches with lower computation, reduced memory requirements, and faster inference time.
arXiv Detail & Related papers (2023-03-23T17:58:05Z)
- Scaling Neural Face Synthesis to High FPS and Low Latency by Neural Caching
Recent neural rendering approaches greatly improve image quality, reaching near photorealism.
The underlying neural networks have high runtime, precluding telepresence and virtual reality applications that require high resolution at low latency.
We break this dependency by caching information from the previous frame to speed up the processing of the current one with an implicit warp (sketched after this entry).
We test the approach on view-dependent rendering of 3D portrait avatars, as needed for telepresence, on established benchmark sequences.
arXiv Detail & Related papers (2022-11-10T18:58:00Z)
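A minimal sketch of warping cached feature maps toward the current frame with a dense flow field, using PyTorch's grid_sample. How the flow is estimated, and whether the paper's implicit warp operates this way, is not specified in the summary, so treat this as a generic warping baseline.

```python
import torch
import torch.nn.functional as F

def warp_cached_features(feat: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp cached features from the previous frame toward the current one.

    feat: (N, C, H, W) cached feature maps
    flow: (N, 2, H, W) per-pixel displacement in pixels (x, y)
    """
    n, _, h, w = feat.shape
    # Base sampling grid in pixel coordinates.
    ys, xs = torch.meshgrid(torch.arange(h, dtype=feat.dtype),
                            torch.arange(w, dtype=feat.dtype), indexing="ij")
    x = xs.unsqueeze(0) + flow[:, 0]   # (N, H, W)
    y = ys.unsqueeze(0) + flow[:, 1]
    # Normalize to [-1, 1] as grid_sample expects (align_corners=True convention).
    grid = torch.stack([2 * x / (w - 1) - 1, 2 * y / (h - 1) - 1], dim=-1)
    return F.grid_sample(feat, grid, mode="bilinear",
                         padding_mode="border", align_corners=True)
```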
- Distortion-Aware Network Pruning and Feature Reuse for Real-time Video Segmentation
We propose a novel framework to speed up any architecture with skip connections for real-time vision tasks. Specifically, at the arrival of each frame, we transform the features from the previous frame to reuse them at specific spatial bins. We then perform partial computation of the backbone network on the regions of the current frame that capture temporal differences between the current and previous frames (a rough sketch follows this entry).
arXiv Detail & Related papers (2022-06-20T07:20:02Z)
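A rough sketch of the reuse-plus-partial-computation pattern, assuming a backbone that preserves spatial resolution. The bin size, change threshold, and the decision to ignore cross-bin receptive fields are simplifications; the paper's distortion-aware transform is not reproduced here.

```python
import torch

def update_bins(backbone, cur, prev_frame, prev_feat, bin_size=64, thresh=0.03):
    """Recompute backbone features only for spatial bins that changed.

    backbone:   maps (1, C, h, w) patches to (1, F, h, w) features
    cur, prev_frame: (1, C, H, W) consecutive frames
    prev_feat:  (1, F, H, W) features cached from the previous frame
    """
    feat = prev_feat.clone()
    _, _, h, w = cur.shape
    for y in range(0, h, bin_size):
        for x in range(0, w, bin_size):
            p_prev = prev_frame[..., y:y + bin_size, x:x + bin_size]
            p_cur = cur[..., y:y + bin_size, x:x + bin_size]
            if (p_cur - p_prev).abs().mean() > thresh:
                # Partial computation: only this bin runs through the backbone
                # (ignores cross-bin receptive fields; fine for a sketch).
                feat[..., y:y + bin_size, x:x + bin_size] = backbone(p_cur)
    return feat

# Usage with a toy resolution-preserving backbone:
backbone = torch.nn.Conv2d(3, 16, 3, padding=1)
```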
- Efficient Semantic Video Segmentation with Per-frame Inference
In this work, we perform efficient semantic video segmentation in a per-frame fashion at inference time. We employ compact models for real-time execution, and design new knowledge distillation methods to narrow the performance gap between compact and large models (a baseline distillation loss is sketched below).
arXiv Detail & Related papers (2020-02-26T12:24:32Z)
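The summary leaves the new distillation methods unspecified; as a baseline sketch, the standard per-pixel distillation loss below matches the student's class distribution to the teacher's at every pixel, with the temperature an assumed hyperparameter.

```python
import torch
import torch.nn.functional as F

def pixelwise_distillation_loss(student_logits: torch.Tensor,
                                teacher_logits: torch.Tensor,
                                temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between teacher and student class distributions,
    computed independently at every pixel.

    Both inputs: (N, num_classes, H, W) segmentation logits.
    """
    t = temperature
    log_p_s = F.log_softmax(student_logits / t, dim=1)
    p_t = F.softmax(teacher_logits / t, dim=1)
    kl = (p_t * (p_t.clamp_min(1e-8).log() - log_p_s)).sum(dim=1)  # (N, H, W)
    return kl.mean() * (t * t)  # t^2 keeps gradient scale comparable
```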