iFlame: Interleaving Full and Linear Attention for Efficient Mesh Generation
- URL: http://arxiv.org/abs/2503.16653v2
- Date: Mon, 24 Mar 2025 03:18:49 GMT
- Title: iFlame: Interleaving Full and Linear Attention for Efficient Mesh Generation
- Authors: Hanxiao Wang, Biao Zhang, Weize Quan, Dong-Ming Yan, Peter Wonka
- Abstract summary: iFlame is a novel transformer-based network architecture for mesh generation. We propose an interleaving autoregressive mesh generation framework that combines the efficiency of linear attention with the expressive power of full attention mechanisms. Our results indicate that the proposed interleaving framework effectively balances computational efficiency and generative performance.
- Score: 49.8026360054331
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper proposes iFlame, a novel transformer-based network architecture for mesh generation. While attention-based models have demonstrated remarkable performance in mesh generation, their quadratic computational complexity limits scalability, particularly for high-resolution 3D data. Conversely, linear attention mechanisms offer lower computational costs but often struggle to capture long-range dependencies, resulting in suboptimal outcomes. To address this trade-off, we propose an interleaving autoregressive mesh generation framework that combines the efficiency of linear attention with the expressive power of full attention mechanisms. To further leverage the inherent structure of mesh representations, we integrate this interleaving approach into an hourglass architecture, which significantly boosts efficiency. Our approach reduces training time while achieving performance comparable to pure attention-based models. To improve inference efficiency, we implement a caching algorithm that nearly doubles generation speed and reduces the KV cache size by seven-eighths compared to the original Transformer. We evaluate our framework on ShapeNet and Objaverse, demonstrating its ability to generate high-quality 3D meshes efficiently. Our results indicate that the proposed interleaving framework effectively balances computational efficiency and generative performance, making it a practical solution for mesh generation. Training takes only two days on 4 GPUs using 39k Objaverse meshes with at most 4k faces each.
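As a rough illustration of the interleaving idea described in the abstract, the sketch below alternates causal linear-attention blocks (linear in sequence length via prefix sums) with standard causal full-attention blocks. This is a minimal sketch, not the authors' implementation: the elu+1 feature map, block ordering, layer count, and dimensions are assumptions, and the hourglass down/up-sampling, feed-forward sublayers, and KV caching scheme from the abstract are omitted.

```python
# Minimal sketch: interleave O(N) causal linear attention with exact causal
# full attention in one decoder stack. Illustrative only; not the iFlame code.
import torch
import torch.nn as nn
import torch.nn.functional as F

def causal_linear_attention(q, k, v):
    # phi(x) = elu(x) + 1 feature map; running (prefix) sums give causal,
    # linear-time attention. Materializing the running sums is fine for a
    # sketch; efficient kernels fuse these steps.
    q, k = F.elu(q) + 1, F.elu(k) + 1
    kv = torch.einsum("bnd,bne->bnde", k, v).cumsum(dim=1)   # running sum of k v^T
    z = k.cumsum(dim=1)                                      # running sum of k
    num = torch.einsum("bnd,bnde->bne", q, kv)
    den = torch.einsum("bnd,bnd->bn", q, z).unsqueeze(-1).clamp(min=1e-6)
    return num / den

class LinearAttentionBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        q, k, v = self.qkv(self.norm(x)).chunk(3, dim=-1)
        return x + self.proj(causal_linear_attention(q, k, v))

class FullAttentionBlock(nn.Module):
    def __init__(self, dim, heads=8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        h = self.norm(x)
        n = x.size(1)
        # Boolean causal mask: True marks positions that may NOT be attended to.
        causal = torch.triu(torch.ones(n, n, dtype=torch.bool, device=x.device), 1)
        out, _ = self.attn(h, h, h, attn_mask=causal)
        return x + out

class InterleavedDecoder(nn.Module):
    # Alternate cheap linear-attention layers with full-attention layers.
    def __init__(self, dim=256, pairs=2):
        super().__init__()
        blocks = []
        for _ in range(pairs):
            blocks += [LinearAttentionBlock(dim), FullAttentionBlock(dim)]
        self.blocks = nn.ModuleList(blocks)

    def forward(self, x):
        for block in self.blocks:
            x = block(x)
        return x

if __name__ == "__main__":
    tokens = torch.randn(2, 128, 256)          # (batch, mesh-token sequence, embedding)
    print(InterleavedDecoder()(tokens).shape)  # torch.Size([2, 128, 256])
```

The intent mirrors the abstract: the linear blocks scale gracefully with long face-token sequences, while the interleaved full-attention blocks recover exact long-range interactions.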
Related papers
- Accelerating Linear Recurrent Neural Networks for the Edge with Unstructured Sparsity [39.483346492111515]
Linear recurrent neural networks enable powerful long-range sequence modeling with constant memory usage and time-per-token during inference. Unstructured sparsity offers a compelling solution, enabling substantial reductions in compute and memory requirements when accelerated by compatible hardware platforms. We find that highly sparse linear RNNs consistently achieve better efficiency-performance trade-offs than dense baselines. (A minimal sketch of a constant-memory linear recurrence appears after this list.)
arXiv Detail & Related papers (2025-02-03T13:09:21Z) - Numerical Pruning for Efficient Autoregressive Models [87.56342118369123]
This paper focuses on compressing decoder-only transformer-based autoregressive models through structural weight pruning.
Specifically, we propose a training-free pruning method that calculates a numerical score with Newton's method for the Attention and MLP modules, respectively.
To verify the effectiveness of our method, we provide both theoretical support and extensive experiments.
arXiv Detail & Related papers (2024-12-17T01:09:23Z) - Efficient Federated Learning Using Dynamic Update and Adaptive Pruning with Momentum on Shared Server Data [59.6985168241067]
Federated Learning (FL) encounters two important problems, i.e., low training efficiency and limited computational resources.
We propose a new FL framework, FedDUMAP, to leverage the shared insensitive data on the server and the distributed data in edge devices.
Our proposed FL model, FedDUMAP, combines the three original techniques and has a significantly better performance compared with baseline approaches.
arXiv Detail & Related papers (2024-08-11T02:59:11Z) - Faster Image2Video Generation: A Closer Look at CLIP Image Embedding's Impact on Spatio-Temporal Cross-Attentions [27.111140222002653]
This paper investigates the role of CLIP image embeddings within the Stable Video Diffusion (SVD) framework.
We introduce the VCUT, a training-free approach optimized for efficiency within the SVD architecture.
The implementation of VCUT leads to a reduction of up to 322T Multiply-Accumulate Operations (MACs) per video and a decrease in model parameters by up to 50M, achieving a 20% reduction in latency compared to the baseline.
arXiv Detail & Related papers (2024-07-27T08:21:14Z) - ReduceFormer: Attention with Tensor Reduction by Summation [4.985969607297595]
We introduce ReduceFormer, a family of models optimized for efficiency with the spirit of attention.
ReduceFormer leverages only simple operations such as reduction and element-wise multiplication, leading to greatly simplified architecture and improved inference performance.
The proposed model family is suitable for edge devices where compute resource and memory bandwidth are limited, as well as for cloud computing where high throughput is sought after.
arXiv Detail & Related papers (2024-06-11T17:28:09Z) - HyCubE: Efficient Knowledge Hypergraph 3D Circular Convolutional Embedding [21.479738859698344]
Reaching a trade-off between model effectiveness and efficiency is desirable yet challenging for knowledge hypergraph embedding.
We propose an end-to-end efficient knowledge hypergraph embedding model, HyCubE, which designs a novel 3D circular convolutional neural network.
Our proposed model consistently outperforms state-of-the-art baselines, with an average improvement of 8.22% and a maximum improvement of 33.82%.
arXiv Detail & Related papers (2024-02-14T06:05:37Z) - RAVEN: Rethinking Adversarial Video Generation with Efficient Tri-plane Networks [93.18404922542702]
We present a novel video generative model designed to address long-term spatial and temporal dependencies.
Our approach incorporates a hybrid explicit-implicit tri-plane representation inspired by 3D-aware generative frameworks.
Our model synthesizes high-fidelity video clips at a resolution of $256\times256$ pixels, with durations extending to more than $5$ seconds at a frame rate of 30 fps.
arXiv Detail & Related papers (2024-01-11T16:48:44Z) - Point Transformer V3: Simpler, Faster, Stronger [88.80496333515325]
This paper focuses on overcoming the existing trade-offs between accuracy and efficiency within the context of point cloud processing.
We present Point Transformer V3 (PTv3), which prioritizes simplicity and efficiency over the accuracy of certain mechanisms.
PTv3 attains state-of-the-art results on over 20 downstream tasks that span both indoor and outdoor scenarios.
arXiv Detail & Related papers (2023-12-15T18:59:59Z) - UNETR++: Delving into Efficient and Accurate 3D Medical Image Segmentation [93.88170217725805]
We propose a 3D medical image segmentation approach, named UNETR++, that offers both high-quality segmentation masks as well as efficiency in terms of parameters, compute cost, and inference speed.
The core of our design is the introduction of a novel efficient paired attention (EPA) block that efficiently learns spatial and channel-wise discriminative features.
Our evaluations on five benchmarks, Synapse, BTCV, ACDC, BRaTs, and Decathlon-Lung, reveal the effectiveness of our contributions in terms of both efficiency and accuracy.
arXiv Detail & Related papers (2022-12-08T18:59:57Z) - Model-Architecture Co-Design for High Performance Temporal GNN Inference on FPGA [5.575293536755127]
Real-world applications require high performance inference on real-time streaming dynamic graphs.
We present a novel model-architecture co-design for inference in memory-based TGNNs on FPGAs.
We train our simplified models using knowledge distillation to ensure similar accuracy vis-à-vis the original model.
arXiv Detail & Related papers (2022-03-10T00:24:47Z)
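To make the constant-memory claim of the linear-RNN entry above concrete, here is a minimal, generic sketch of stepping a diagonal linear recurrence one token at a time. It is not the cited paper's model; the state size, decay parameterization, and projections are illustrative assumptions, and unstructured sparsity would additionally zero out most entries of the weight matrices.

```python
# Generic diagonal linear-RNN step: the recurrent state has a fixed size, so
# per-token compute and memory stay constant no matter how long the sequence
# grows (no KV cache). Illustrative only; not the cited paper's architecture.
import torch

torch.manual_seed(0)
d_in, d_state = 16, 64

a = torch.rand(d_state) * 0.98        # element-wise decay, kept inside (0, 1)
B = torch.randn(d_state, d_in) * 0.1  # input projection (sparsifying B and C is
C = torch.randn(d_in, d_state) * 0.1  # where unstructured sparsity would help)

h = torch.zeros(d_state)              # constant-size recurrent state
for t in range(1000):                 # stream tokens one at a time
    x_t = torch.randn(d_in)
    h = a * h + B @ x_t               # O(d_state * d_in) work per token
    y_t = C @ h                       # per-token readout
print(h.shape, y_t.shape)             # state never grows with sequence length
```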