Scalable Generative Game Engine: Breaking the Resolution Wall via Hardware-Algorithm Co-Design
- URL: http://arxiv.org/abs/2602.00608v1
- Date: Sat, 31 Jan 2026 08:52:51 GMT
- Title: Scalable Generative Game Engine: Breaking the Resolution Wall via Hardware-Algorithm Co-Design
- Authors: Wei Zeng, Xuchen Li, Ruili Feng, Zhen Liu, Fengwei An, Jian Zhao,
- Abstract summary: We bridge the gap between generative models and high-resolution neural simulations by introducing a scalable Hardware-Algorithm Co-Design framework. Our system delivers fluid 26.4 FPS and 48.3 FPS on 3D racing and 2D platformer benchmarks, respectively, with an amortized effective latency of 2.7 ms.
- Score: 17.941176878609337
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Real-time generative game engines represent a paradigm shift in interactive simulation, promising to replace traditional graphics pipelines with neural world models. However, existing approaches are fundamentally constrained by the "Memory Wall," restricting practical deployments to low resolutions (e.g., $64 \times 64$). This paper bridges the gap between generative models and high-resolution neural simulations by introducing a scalable Hardware-Algorithm Co-Design framework. We identify that high-resolution generation suffers from a critical resource mismatch: the World Model is compute-bound while the Decoder is memory-bound. To address this, we propose a heterogeneous architecture that intelligently decouples these components across a cluster of AI accelerators. Our system features three core innovations: (1) an asymmetric resource allocation strategy that optimizes throughput under sequence parallelism constraints; (2) a memory-centric operator fusion scheme that minimizes off-chip bandwidth usage; and (3) a manifold-aware latent extrapolation mechanism that exploits temporal redundancy to mask latency. We validate our approach on a cluster of programmable AI accelerators, enabling real-time generation at $720 \times 480$ resolution -- a $50\times$ increase in pixel throughput over prior baselines. Evaluated on both continuous 3D racing and discrete 2D platformer benchmarks, our system delivers fluid 26.4 FPS and 48.3 FPS respectively, with an amortized effective latency of 2.7 ms. This work demonstrates that resolving the "Memory Wall" via architectural co-design is not merely an optimization, but a prerequisite for enabling high-fidelity, responsive neural gameplay.
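To put the reported frame rates in perspective, the short Python sketch below converts them into an average per-frame time budget and a raw pixel rate at $720 \times 480$. The helper names `frame_time_ms` and `pixel_throughput` are illustrative and do not come from the paper; the only inputs taken from the abstract are the resolution and the two FPS figures. The calculation is plain arithmetic and says nothing about how the authors measure their amortized latency.

```python
def frame_time_ms(fps: float) -> float:
    """Average time budget per frame, in milliseconds, at a given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def pixel_throughput(width: int, height: int, fps: float) -> float:
    """Pixels generated per second at a given resolution and frame rate."""
    return width * height * fps


if __name__ == "__main__":
    # Frame rates reported in the abstract for the two benchmarks.
    for name, fps in [("3D racing", 26.4), ("2D platformer", 48.3)]:
        print(f"{name}: {frame_time_ms(fps):.1f} ms/frame, "
              f"{pixel_throughput(720, 480, fps):,.0f} pixels/s")
```

Both per-frame budgets (roughly 37.9 ms and 20.7 ms) are far larger than the 2.7 ms amortized effective latency quoted above, which suggests that figure is a separate metric from the frame interval; the abstract does not define it further.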
Related papers
- Search Multilayer Perceptron-Based Fusion for Efficient and Accurate Siamese Tracking [3.7727834708902868]
A Multilayer Perceptron (MLP)-based fusion module enables pixel-level interaction with minimal structural overhead. Differentiable neural architecture search (DNAS) is used to decouple channel-width optimization from other architectural choices. The tracker ranks among the top performers on four general-purpose and three aerial benchmarks.
arXiv Detail & Related papers (2026-03-02T10:30:54Z) - AutoNeural: Co-Designing Vision-Language Models for NPU Inference [24.05617280495125]
AutoNeural is an NPU-native VLM architecture co-designed for integer-only inference. We replace the standard ViT encoder with a MobileNetV5-style backbone utilizing depthwise separable convolutions. Our approach delivers substantial efficiency gains, reducing the quantization error of the vision encoder by up to 7x and end-to-end latency by 14x compared to conventional baselines.
arXiv Detail & Related papers (2025-12-02T16:45:25Z) - BasicAVSR: Arbitrary-Scale Video Super-Resolution via Image Priors and Enhanced Motion Compensation [70.27358326228399]
We propose BasicAVSR for arbitrary-scale video super-resolution (AVSR). AVSR aims to enhance the resolution of video frames, potentially at various scaling factors. We show that BasicAVSR significantly outperforms existing methods in terms of super-resolution quality, generalization ability, and inference speed.
arXiv Detail & Related papers (2025-10-30T05:08:45Z) - MOBIUS: Big-to-Mobile Universal Instance Segmentation via Multi-modal Bottleneck Fusion and Calibrated Decoder Pruning [91.90342432541138]
Scaling up model size and training data has advanced foundation models for instance-level perception. High computational cost limits adoption on resource-constrained platforms. We introduce a new benchmark for efficient segmentation on both high-performance computing platforms and mobile devices.
arXiv Detail & Related papers (2025-10-16T18:00:00Z) - Accelerating 3D Gaussian Splatting with Neural Sorting and Axis-Oriented Rasterization [14.87046071090259]
3D Gaussian Splatting (3DGS) has recently gained significant attention for high-quality and efficient view synthesis. Despite its impressive algorithmic performance, real-time rendering on resource-constrained devices remains a major challenge due to tight power and area budgets.
arXiv Detail & Related papers (2025-06-08T10:14:54Z) - iFlame: Interleaving Full and Linear Attention for Efficient Mesh Generation [49.8026360054331]
iFlame is a novel transformer-based network architecture for mesh generation. We propose an interleaving autoregressive mesh generation framework that combines the efficiency of linear attention with the expressive power of full attention mechanisms. Our results indicate that the proposed interleaving framework effectively balances computational efficiency and generative performance.
arXiv Detail & Related papers (2025-03-20T19:10:37Z) - Accelerating Linear Recurrent Neural Networks for the Edge with Unstructured Sparsity [39.483346492111515]
Linear recurrent neural networks enable powerful long-range sequence modeling with constant memory usage and time-per-token during inference. Unstructured sparsity offers a compelling solution, enabling substantial reductions in compute and memory requirements when accelerated by compatible hardware platforms. We find that highly sparse linear RNNs consistently achieve better efficiency-performance trade-offs than dense baselines.
arXiv Detail & Related papers (2025-02-03T13:09:21Z) - FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency.
We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs).
We show that our system and method can achieve 1.45 - 9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z) - Resistive Memory-based Neural Differential Equation Solver for Score-based Diffusion Model [55.116403765330084]
Current AIGC methods, such as score-based diffusion, are still deficient in terms of rapidity and efficiency.
We propose a time-continuous and analog in-memory neural differential equation solver for score-based diffusion.
We experimentally validate our solution with 180 nm resistive memory in-memory computing macros.
arXiv Detail & Related papers (2024-04-08T16:34:35Z) - Neuromorphic quadratic programming for efficient and scalable model predictive control [0.31457219084519]
Event-based and memory-integrated neuromorphic architectures promise to solve large optimization problems.
We present a method to solve convex continuous optimization problems with quadratic cost functions and linear constraints on Intel's scalable neuromorphic research chip Loihi 2.
arXiv Detail & Related papers (2024-01-26T14:12:35Z) - RAVEN: Rethinking Adversarial Video Generation with Efficient Tri-plane Networks [93.18404922542702]
We present a novel video generative model designed to address long-term spatial and temporal dependencies.
Our approach incorporates a hybrid explicit-implicit tri-plane representation inspired by 3D-aware generative frameworks.
Our model synthesizes high-fidelity video clips at a resolution of $256 \times 256$ pixels, with durations extending to more than $5$ seconds at a frame rate of 30 fps.
arXiv Detail & Related papers (2024-01-11T16:48:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.