IIET: Efficient Numerical Transformer via Implicit Iterative Euler Method
- URL: http://arxiv.org/abs/2509.22463v2
- Date: Sat, 11 Oct 2025 04:25:18 GMT
- Title: IIET: Efficient Numerical Transformer via Implicit Iterative Euler Method
- Authors: Xinyu Liu, Bei Li, Jiahao Liu, Junhao Ruan, Kechen Jiao, Hongyin Tang, Jingang Wang, Xiao Tong, Jingbo Zhu
- Abstract summary: The Iterative Implicit Euler Transformer (IIET) simplifies high-order ODE-based Transformers via an iterative implicit Euler approach. Iteration Influence-Aware Distillation (IIAD) lets users balance the performance-efficiency trade-off through a flexible threshold, and the most efficient IIET variant achieves an average performance gain exceeding 1.6% over the vanilla Transformer at comparable speed.
- Score: 59.02943805284446
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: High-order numerical methods enhance Transformer performance in tasks like NLP and CV, but introduce a performance-efficiency trade-off due to increased computational overhead. Our analysis reveals that conventional efficiency techniques, such as distillation, can be detrimental to the performance of these models, exemplified by PCformer. To explore more optimizable ODE-based Transformer architectures, we propose the Iterative Implicit Euler Transformer (IIET), which simplifies high-order methods using an iterative implicit Euler approach. This simplification not only leads to superior performance but also facilitates model compression compared to PCformer. To enhance inference efficiency, we introduce Iteration Influence-Aware Distillation (IIAD). Through a flexible threshold, IIAD allows users to effectively balance the performance-efficiency trade-off. On lm-evaluation-harness, IIET boosts average accuracy by 2.65% over vanilla Transformers and 0.8% over PCformer. Its efficient variant, E-IIET, significantly cuts inference overhead by 55% while retaining 99.4% of the original task accuracy. Moreover, the most efficient IIET variant achieves an average performance gain exceeding 1.6% over vanilla Transformer with comparable speed.
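To make the core idea concrete, here is a minimal PyTorch sketch of a residual block updated with an iterative implicit Euler scheme: the implicit step x_{t+1} = x_t + F(x_{t+1}) has no closed form for a nonlinear F, so it is approximated by fixed-point iteration initialized with the explicit Euler step. The sublayer, class, and parameter names below are illustrative assumptions, not the IIET codebase.

```python
import torch
import torch.nn as nn

class IterativeImplicitEulerBlock(nn.Module):
    """Sketch: implicit-Euler residual block solved by fixed-point iteration.

    Iterates y^(k+1) = x + F(y^(k)), starting from the explicit Euler step.
    `num_iters` plays the role of the iteration budget that a method like
    IIAD would prune for efficiency.
    """

    def __init__(self, d_model: int, num_iters: int = 3):
        super().__init__()
        self.f = nn.Sequential(  # stand-in for an attention/FFN sublayer
            nn.LayerNorm(d_model),
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )
        self.num_iters = num_iters

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = x + self.f(x)                # explicit Euler initial guess
        for _ in range(self.num_iters):  # refine toward the implicit solution
            y = x + self.f(y)
        return y

x = torch.randn(2, 16, 64)
print(IterativeImplicitEulerBlock(64)(x).shape)  # torch.Size([2, 16, 64])
```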
Related papers
- ADEPT: Adaptive Dynamic Early-Exit Process for Transformers [12.23755727319088]
Early-exit strategies have proven effective in reducing computational demands by halting inference earlier. We introduce ADEPT, a novel approach that enables dynamic early exit in both the prefill and generation phases. We show that ADEPT improves efficiency by up to 25% in language generation tasks and achieves a 4x speed-up in downstream classification tasks, with up to a 45% improvement in performance.
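As a rough illustration of the generic early-exit mechanism (not ADEPT's specific prefill/generation policy, which the abstract does not detail), a per-layer confidence check might look like this; all names are hypothetical:

```python
import torch
import torch.nn as nn

def early_exit_forward(layers, heads, x, threshold=0.9):
    """Generic early-exit loop: after each layer, an auxiliary head
    classifies; if its max softmax probability clears `threshold`
    for the whole batch, inference halts at that depth."""
    for depth, (layer, head) in enumerate(zip(layers, heads)):
        x = layer(x)
        probs = head(x.mean(dim=1)).softmax(dim=-1)  # pool tokens, classify
        conf, pred = probs.max(dim=-1)
        if conf.min() >= threshold:                  # every sample is confident
            return pred, depth + 1
    return pred, len(layers)

d, n_classes = 32, 4
layers = nn.ModuleList([nn.Sequential(nn.Linear(d, d), nn.GELU()) for _ in range(6)])
heads = nn.ModuleList([nn.Linear(d, n_classes) for _ in range(6)])
pred, exit_depth = early_exit_forward(layers, heads, torch.randn(3, 10, d))
print(pred.shape, exit_depth)
```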
arXiv Detail & Related papers (2026-01-07T08:34:41Z)
- Neutralizing Token Aggregation via Information Augmentation for Efficient Test-Time Adaptation [59.1067331268383]
Test-Time Adaptation (TTA) has emerged as an effective solution for adapting Vision Transformers (ViT) to distribution shifts without additional training data. To reduce inference cost, plug-and-play token aggregation methods merge redundant tokens in ViTs, reducing the total number of processed tokens. We formalize this problem as Efficient Test-Time Adaptation (ETTA), seeking to preserve the adaptation capability of TTA while reducing inference latency.
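A minimal sketch of the kind of plug-and-play token aggregation the abstract refers to is greedy pairwise merging; the function below is illustrative only and not this paper's algorithm:

```python
import torch

def merge_most_similar_tokens(x: torch.Tensor, r: int) -> torch.Tensor:
    """Naive token aggregation: repeatedly average the pair of tokens with
    the highest cosine similarity, removing r tokens in total."""
    x = x.clone()
    for _ in range(r):
        xn = torch.nn.functional.normalize(x, dim=-1)
        sim = xn @ xn.T
        sim.fill_diagonal_(-float("inf"))            # ignore self-similarity
        i, j = divmod(int(sim.argmax()), sim.size(1))
        x[i] = (x[i] + x[j]) / 2                     # merge token j into i
        x = torch.cat([x[:j], x[j + 1:]], dim=0)     # drop token j
    return x

tokens = torch.randn(16, 64)                          # (num_tokens, dim)
print(merge_most_similar_tokens(tokens, r=4).shape)   # torch.Size([12, 64])
```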
arXiv Detail & Related papers (2025-08-05T12:40:55Z)
- Accelerating Diffusion Transformer via Increment-Calibrated Caching with Channel-Aware Singular Value Decomposition [4.0594792247165]
Diffusion transformer (DiT) models have achieved remarkable success in image generation. We propose increment-calibrated caching, a training-free method for DiT acceleration. Our method eliminates more than 45% of computation and improves IS by 12, at the cost of a FID increase of less than 0.06.
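One way to picture the mechanism is caching a block's output across diffusion steps and correcting it with a cheap calibrated increment; the sketch below is a loose illustration under that assumption, with every name hypothetical and the actual calibration (channel-aware SVD) not shown:

```python
import torch
import torch.nn as nn

class CachedBlock(nn.Module):
    """Illustrative feature caching for iterative (diffusion-style) inference:
    on "reuse" steps the full block is skipped and its cached output is
    corrected with a cheap low-rank increment from the change in input."""

    def __init__(self, d: int, rank: int = 8):
        super().__init__()
        self.block = nn.Sequential(nn.LayerNorm(d), nn.Linear(d, d),
                                   nn.GELU(), nn.Linear(d, d))
        self.calib = nn.Sequential(nn.Linear(d, rank), nn.Linear(rank, d))
        self.cache = None  # (input, output) from the last full step

    def forward(self, x: torch.Tensor, full: bool) -> torch.Tensor:
        if full or self.cache is None:
            y = self.block(x)
            self.cache = (x.detach(), y.detach())
            return y
        x_prev, y_prev = self.cache
        return y_prev + self.calib(x - x_prev)  # cached output + increment

blk = CachedBlock(64)
for step in range(4):
    y = blk(torch.randn(2, 64), full=(step % 2 == 0))  # recompute every other step
print(y.shape)
```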
arXiv Detail & Related papers (2025-05-09T06:56:17Z)
- A Simple and Effective Reinforcement Learning Method for Text-to-Image Diffusion Fine-tuning [61.403275660120606]
Reinforcement learning (RL)-based fine-tuning has emerged as a powerful approach for aligning diffusion models with black-box objectives. We propose leave-one-out PPO (LOOP), a novel RL method for diffusion fine-tuning. Our results demonstrate that LOOP effectively improves diffusion models on various black-box objectives and achieves a better balance between computational efficiency and performance.
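The leave-one-out idea is simple to state in code: each of k samples drawn for the same prompt is baselined against the mean reward of the other k-1, which keeps the gradient estimate unbiased while reducing variance. A minimal sketch of the advantage computation only, not the full PPO loop:

```python
import torch

def leave_one_out_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """Leave-one-out baseline: rewards has shape (prompts, k); each sample's
    baseline is the mean reward of the other k-1 samples for that prompt."""
    k = rewards.size(-1)
    total = rewards.sum(dim=-1, keepdim=True)
    baseline = (total - rewards) / (k - 1)   # mean of the other k-1 rewards
    return rewards - baseline

rewards = torch.tensor([[1.0, 0.0, 0.5, 0.5]])   # k=4 samples, one prompt
print(leave_one_out_advantages(rewards))          # [[ 0.6667, -0.6667, 0., 0.]]
```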
arXiv Detail & Related papers (2025-03-02T13:43:53Z)
- Transformers with Joint Tokens and Local-Global Attention for Efficient Human Pose Estimation [34.99437411281915]
This paper proposes two ViT-based models for accurate, efficient, and robust 2D pose estimation. Experiments on six benchmarks demonstrate that the proposed methods significantly outperform state-of-the-art methods.
arXiv Detail & Related papers (2025-02-28T22:34:22Z)
- Faster WIND: Accelerating Iterative Best-of-$N$ Distillation for LLM Alignment [81.84950252537618]
This paper reveals a unified game-theoretic connection between iterative BOND and self-play alignment. We establish a novel framework, WIN rate Dominance (WIND), with a series of efficient algorithms for regularized win rate dominance optimization.
arXiv Detail & Related papers (2024-10-28T04:47:39Z)
- Accelerating AI Performance using Anderson Extrapolation on GPUs [2.114333871769023]
We present a novel approach for accelerating AI performance by leveraging Anderson extrapolation. By identifying the crossover point where a mixing penalty is incurred, the method focuses on reducing the number of iterations to convergence. We demonstrate significant improvements in both training and inference, motivated by scalability and efficiency extensions to the realm of high-performance computing.
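Anderson extrapolation itself is a classical fixed-point accelerator; a minimal NumPy sketch of the generic method (not the paper's GPU implementation):

```python
import numpy as np

def anderson_fixed_point(g, x0, m=5, iters=50, tol=1e-10):
    """Anderson acceleration for x = g(x): the next iterate mixes the last
    m evaluations of g with least-squares weights that minimize the
    combined residual, typically converging far faster than plain iteration."""
    X, G = [x0], [g(x0)]
    for _ in range(iters):
        k = len(X)
        F = np.stack([G[i] - X[i] for i in range(k)], axis=1)  # residuals
        # Solve min ||F a|| subject to sum(a) = 1 via normal equations.
        M = F.T @ F + 1e-12 * np.eye(k)
        a = np.linalg.solve(M, np.ones(k))
        a /= a.sum()
        x_new = np.stack(G, axis=1) @ a        # mix previous g-evaluations
        if np.linalg.norm(g(x_new) - x_new) < tol:
            return x_new
        X.append(x_new); G.append(g(x_new))
        X, G = X[-m:], G[-m:]                  # keep a sliding history
    return x_new

g = lambda x: np.cos(x)                        # fixed point near 0.739085
print(anderson_fixed_point(g, np.array([1.0])))
```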
arXiv Detail & Related papers (2024-10-25T10:45:17Z)
- LATTE: Low-Precision Approximate Attention with Head-wise Trainable Threshold for Efficient Transformer [0.0]
We propose Low-Precision Approximate Attention with Head-wise Trainable Threshold for Efficient Transformer (LATTE).
LATTE employs a head-wise threshold-based filter with a low-precision dot product to reduce the computation of Multi-Head Attention (MHA).
Experimental results indicate LATTE can smoothly adapt to both NLP and CV tasks, offering significant computation savings.
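A toy version of threshold-filtered attention conveys the idea; here a crude uniform quantizer and a scalar threshold stand in for LATTE's low-precision kernel and head-wise trainable thresholds:

```python
import torch

def quantize(x: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Crude uniform quantizer simulating low-precision arithmetic."""
    scale = x.abs().amax() / (2 ** (bits - 1) - 1)
    return (x / scale).round().clamp(-2 ** (bits - 1), 2 ** (bits - 1) - 1) * scale

def thresholded_attention(q, k, v, tau: float):
    d = q.size(-1)
    approx = quantize(q) @ quantize(k).transpose(-2, -1) / d ** 0.5  # cheap scores
    top1 = torch.nn.functional.one_hot(approx.argmax(dim=-1), approx.size(-1)).bool()
    mask = (approx >= tau) | top1              # keep promising pairs; never empty a row
    scores = (q @ k.transpose(-2, -1)) / d ** 0.5   # full precision where kept
    scores = scores.masked_fill(~mask, -float("inf"))
    return torch.softmax(scores, dim=-1) @ v

q, k, v = (torch.randn(2, 8, 16) for _ in range(3))
print(thresholded_attention(q, k, v, tau=0.5).shape)  # torch.Size([2, 8, 16])
```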
arXiv Detail & Related papers (2024-04-11T07:23:19Z)
- Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation [67.13876021157887]
Dynamic Tuning (DyT) is a novel approach to improve both parameter and inference efficiency for ViT adaptation.
DyT achieves superior performance compared to existing PEFT methods while using only 71% of their FLOPs on the VTAB-1K benchmark.
arXiv Detail & Related papers (2024-03-18T14:05:52Z)
- From PEFT to DEFT: Parameter Efficient Finetuning for Reducing Activation Density in Transformers [52.199303258423306]
We propose a novel density loss that encourages higher activation sparsity in pre-trained models.
Our proposed method, DEFT, can consistently reduce activation density by up to 44.94% on RoBERTa-Large, and by 53.19% (encoder density) and 90.60% (decoder density) on Flan-T5-XXL.
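A density penalty of this flavor can be as simple as an L1 term on hidden activations added to the task loss; the sketch below is illustrative, since the abstract does not give DEFT's exact loss:

```python
import torch

def density_loss(activations: torch.Tensor, alpha: float = 1e-4) -> torch.Tensor:
    """Penalize mean activation magnitude so fine-tuning drives more
    post-ReLU/GELU activations toward zero, lowering activation density."""
    return alpha * activations.abs().mean()

h = torch.relu(torch.randn(8, 128))   # a hidden activation map
task_loss = torch.tensor(0.5)         # placeholder task objective
total = task_loss + density_loss(h)
print(float(total))
```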
arXiv Detail & Related papers (2024-02-02T21:25:46Z)
- HEAT: Hardware-Efficient Automatic Tensor Decomposition for Transformer Compression [69.36555801766762]
We propose a hardware-aware tensor decomposition framework, dubbed HEAT, that enables efficient exploration of the exponential space of possible decompositions.
We experimentally show that our hardware-aware factorized BERT variants reduce the energy-delay product by 5.7x with less than 1.1% accuracy loss.
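The building block HEAT searches over is standard low-rank factorization of weight matrices; a minimal SVD-based sketch (the hardware-aware search itself is the paper's contribution and is not shown):

```python
import torch

def factorize_linear(weight: torch.Tensor, rank: int):
    """Replace W (out x in) with A @ B, cutting parameters and MACs
    from out*in down to rank*(out + in)."""
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    A = U[:, :rank] * S[:rank]    # (out, rank)
    B = Vh[:rank]                 # (rank, in)
    return A, B

W = torch.randn(256, 256)
A, B = factorize_linear(W, rank=32)
print((W - A @ B).norm() / W.norm())  # relative approximation error
```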
arXiv Detail & Related papers (2022-11-30T05:31:45Z)
- An Algorithm-Hardware Co-Optimized Framework for Accelerating N:M Sparse Transformers [11.811907838840712]
We propose an algorithm-hardware co-optimized framework to flexibly and efficiently accelerate Transformers by utilizing general N:M sparsity patterns.
We present a flexible and efficient hardware architecture, namely STA, to achieve significant speedup when deploying N:M sparse Transformers.
Experimental results show that, compared to other methods, N:M sparse Transformers generated using IDP achieve an average accuracy improvement of 6.7% with high training efficiency.
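N:M sparsity itself is easy to illustrate: within each group of M consecutive weights, only the N entries of largest magnitude are kept, the pattern that hardware like STA can exploit. A minimal magnitude-based projection (IDP's training procedure is not shown):

```python
import torch

def nm_prune(weight: torch.Tensor, n: int = 2, m: int = 4) -> torch.Tensor:
    """Keep the n largest-magnitude weights in every group of m
    consecutive weights along the input dimension; zero the rest."""
    out_dim, in_dim = weight.shape
    groups = weight.reshape(out_dim, in_dim // m, m)
    idx = groups.abs().topk(n, dim=-1).indices
    mask = torch.zeros_like(groups).scatter_(-1, idx, 1.0)
    return (groups * mask).reshape(out_dim, in_dim)

W = torch.randn(4, 8)
print(nm_prune(W))   # each group of 4 weights keeps its 2 largest entries
```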
arXiv Detail & Related papers (2022-08-12T04:51:49Z)
- Efficient Few-Shot Object Detection via Knowledge Inheritance [62.36414544915032]
Few-shot object detection (FSOD) aims at learning a generic detector that can adapt to unseen tasks with scarce training samples.
We present an efficient pretrain-transfer framework (PTF) baseline with no additional computational cost.
We also propose an adaptive length re-scaling (ALR) strategy to alleviate the vector length inconsistency between the predicted novel weights and the pretrained base weights.
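ALR's stated goal, aligning the lengths of predicted novel-class weights with the pretrained base weights, suggests a norm-matching rescale; the sketch below assumes that reading and uses hypothetical names:

```python
import torch

def adaptive_length_rescale(novel_w: torch.Tensor, base_w: torch.Tensor) -> torch.Tensor:
    """Rescale each predicted novel-class weight vector so its L2 norm
    matches the average norm of the pretrained base-class weights."""
    target = base_w.norm(dim=-1).mean()
    return novel_w * (target / novel_w.norm(dim=-1, keepdim=True))

base = torch.randn(60, 256)    # pretrained base-class classifier weights
novel = torch.randn(5, 256)    # predicted novel-class weights
print(adaptive_length_rescale(novel, base).norm(dim=-1))  # ~uniform norms
```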
arXiv Detail & Related papers (2022-03-23T06:24:31Z)
- Easy and Efficient Transformer: Scalable Inference Solution For Large NLP Model [14.321889138798072]
This paper introduces a series of optimization methods for ultra-large-scale pre-trained models.
An inference engine, Easy and Efficient Transformer (EET), is proposed.
EET achieves a 1.5-15x state-of-the-art speedup, varying with context length.
arXiv Detail & Related papers (2021-04-26T11:00:56Z)
- ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning [91.13797346047984]
We introduce ADAHESSIAN, a second-order optimization algorithm which dynamically incorporates the curvature of the loss function via ADAptive estimates of the Hessian.
We show that ADAHESSIAN achieves new state-of-the-art results by a large margin as compared to other adaptive optimization methods.
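ADAHESSIAN's curvature signal is a Hutchinson-style estimate of the Hessian diagonal, computed from Hessian-vector products via double backprop; a minimal sketch of that estimator (the full optimizer adds momentum and spatial averaging, not shown):

```python
import torch

def hutchinson_diag_hessian(loss, params):
    """Hutchinson estimator: for Rademacher z, E[z * (H z)] equals the
    Hessian diagonal, and H z is a cheap Hessian-vector product."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    zs = [torch.randint_like(p, 2) * 2.0 - 1.0 for p in params]  # +/-1 entries
    hvps = torch.autograd.grad(grads, params, grad_outputs=zs)
    return [z * hvp for z, hvp in zip(zs, hvps)]

w = torch.randn(3, requires_grad=True)
loss = (w ** 2).sum() + w.prod()
print(hutchinson_diag_hessian(loss, [w]))  # noisy estimate of diag(H) ~ [2, 2, 2]
```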
arXiv Detail & Related papers (2020-06-01T05:00:51Z)