Adaptive Token Merging for Efficient Transformer Semantic Communication at the Edge
- URL: http://arxiv.org/abs/2509.09955v1
- Date: Fri, 12 Sep 2025 04:11:59 GMT
- Title: Adaptive Token Merging for Efficient Transformer Semantic Communication at the Edge
- Authors: Omar Erak, Omar Alhussein, Hatem Abou-Zeid, Mehdi Bennis, Sami Muhaidat
- Abstract summary: Large-scale transformers are central to modern semantic communication, yet their high computational and communication costs hinder deployment on resource-constrained edge devices.
This paper introduces a training-free framework for adaptive token merging, a novel mechanism that compresses transformer representations at runtime.
Our approach couples merging directly to input redundancy, enabling data-dependent adaptation that balances efficiency and task relevance without retraining.
- Score: 28.969380251735924
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large-scale transformers are central to modern semantic communication, yet their high computational and communication costs hinder deployment on resource-constrained edge devices. This paper introduces a training-free framework for adaptive token merging, a novel mechanism that compresses transformer representations at runtime by selectively merging semantically redundant tokens under per-layer similarity thresholds. Unlike prior fixed-ratio reduction, our approach couples merging directly to input redundancy, enabling data-dependent adaptation that balances efficiency and task relevance without retraining. We cast the discovery of merging strategies as a multi-objective optimization problem and leverage Bayesian optimization to obtain Pareto-optimal trade-offs between accuracy, inference cost, and communication cost. On ImageNet classification, we match the accuracy of the unmodified transformer with 30% fewer floating-point operations and under 20% of the original communication cost, while for visual question answering our method achieves performance competitive with the full LLaVA model at less than one-third of the compute and one-tenth of the bandwidth. Finally, we show that our adaptive merging is robust across varying channel conditions and provides inherent privacy benefits, substantially degrading the efficacy of model inversion attacks. Our framework provides a practical and versatile solution for deploying powerful transformer models in resource-limited edge intelligence scenarios.
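To make the mechanism concrete, here is a minimal PyTorch sketch of similarity-threshold token merging. It is an illustrative reading of the abstract, not the authors' implementation: the cosine-similarity criterion, greedy nearest-neighbour pairing, and plain averaging are all assumptions, and `merge_tokens` is a hypothetical helper.

```python
import torch

def merge_tokens(x: torch.Tensor, tau: float) -> torch.Tensor:
    """Greedily merge token pairs whose cosine similarity exceeds tau.

    x: (num_tokens, dim) activations for one input at one layer.
    Hypothetical helper: the paper's exact matching scheme is not
    described in the abstract; this sketch pairs each unused token
    with its nearest neighbour and averages the pair.
    """
    normed = torch.nn.functional.normalize(x, dim=-1)
    sim = normed @ normed.T              # pairwise cosine similarities
    sim.fill_diagonal_(-1.0)             # never match a token with itself
    used = torch.zeros(len(x), dtype=torch.bool)
    out = []
    for i in range(len(x)):
        if used[i]:
            continue
        j = int(sim[i].argmax())
        if not used[j] and sim[i, j] > tau:
            out.append((x[i] + x[j]) / 2)    # merge the redundant pair
            used[j] = True
        else:
            out.append(x[i])                 # token is informative; keep it
        used[i] = True
    return torch.stack(out)

# One threshold per layer: a lower tau merges more aggressively.
# taus = [0.95, 0.9, 0.85, ...]   # per-layer thresholds found by the optimizer
# for block, tau in zip(vit_blocks, taus):
#     x = merge_tokens(block(x), tau)
```

In this reading, the per-layer thresholds are the decision variables the Bayesian optimizer searches over, scoring each candidate vector on accuracy, FLOPs, and the number of tokens left to transmit in order to trace the Pareto front.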
Related papers
- Enhancing Communication Efficiency in FL with Adaptive Gradient Quantization and Communication Frequency Optimization [0.0]
Federated Learning (FL) enables participant devices to collaboratively train deep learning models without sharing their data with the server or other devices.
FL faces a major bottleneck due to high communication overhead from frequent model updates between devices and the server.
We propose a three-fold strategy to drop less important features while retaining high-value ones.
arXiv Detail & Related papers (2025-09-27T17:25:44Z)
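The summary above does not spell out the three-fold strategy, so the following NumPy sketch shows only the generic idea of shipping a sparse, quantized client update instead of a dense gradient; `compress_update`, `keep_ratio`, and `bits` are invented for illustration.

```python
import numpy as np

def compress_update(grad: np.ndarray, keep_ratio: float = 0.1, bits: int = 8):
    """Top-k magnitude sparsification followed by uniform quantization.

    Illustrative only: keep_ratio and bits are invented knobs, not the
    paper's actual three-fold strategy. Returns what a client would
    upload instead of the dense fp32 gradient.
    """
    flat = grad.ravel()
    k = max(1, int(keep_ratio * flat.size))
    idx = np.argpartition(np.abs(flat), -k)[-k:]     # positions of top-k entries
    vals = flat[idx]
    levels = 2 ** (bits - 1) - 1                     # e.g. 127 for 8 bits
    max_val = float(np.abs(vals).max())
    scale = max_val / levels if max_val > 0 else 1.0
    codes = np.clip(np.round(vals / scale), -levels, levels).astype(np.int8)
    return idx, codes, scale

def decompress_update(shape, idx, codes, scale):
    """Server-side reconstruction of the sparse, dequantized update."""
    flat = np.zeros(int(np.prod(shape)), dtype=np.float32)
    flat[idx] = codes.astype(np.float32) * scale
    return flat.reshape(shape)
```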
- CollaPipe: Adaptive Segment-Optimized Pipeline Parallelism for Collaborative LLM Training in Heterogeneous Edge Networks [57.95170323315603]
We introduce CollaPipe, a distributed learning framework that integrates collaborative pipeline parallelism with federated aggregation to support self-evolving networks.
In CollaPipe, the encoder part is adaptively partitioned into variable-sized segments and deployed across mobile devices for pipeline-parallel training, while the decoder is deployed on edge servers to handle generative tasks.
To enhance training efficiency, we formulate a joint optimization problem that adaptively allocates model segments, micro-batches, bandwidth, and transmission power.
arXiv Detail & Related papers (2025-09-24T07:54:01Z)
- Adaptive Pareto-Optimal Token Merging for Edge Transformer Models in Semantic Communication [27.78647101651565]
Large-scale transformer models have emerged as a powerful tool for semantic communication systems.
We present a training-free framework for adaptive token merging in pretrained vision transformers to jointly reduce inference time and transmission resource usage.
arXiv Detail & Related papers (2025-09-11T06:05:35Z)
- CoFormer: Collaborating with Heterogeneous Edge Devices for Scalable Transformer Inference [34.693462786320545]
CoFormer is a collaborative inference system for general transformer models.
CoFormer enables the efficient inference of GPT2-XL with 1.6 billion parameters on edge devices, reducing memory requirements by 76.3%.
arXiv Detail & Related papers (2025-08-28T02:50:12Z)
- Pieceformer: Similarity-Driven Knowledge Transfer via Scalable Graph Transformer in VLSI [10.727382706747592]
Pieceformer is a scalable, self-supervised similarity assessment framework.
It reduces mean absolute error (MAE) by 24.9% over the baseline.
It is the only method to correctly cluster all real-world design groups.
arXiv Detail & Related papers (2025-06-18T22:47:09Z)
- Optimal Transport Adapter Tuning for Bridging Modality Gaps in Few-Shot Remote Sensing Scene Classification [80.83325513157637]
Few-Shot Remote Sensing Scene Classification (FS-RSSC) presents the challenge of classifying remote sensing images with limited labeled samples.
We propose a novel Optimal Transport Adapter Tuning (OTAT) framework aimed at constructing an ideal Platonic representational space.
arXiv Detail & Related papers (2025-03-19T07:04:24Z)
- CARE Transformer: Mobile-Friendly Linear Visual Transformer via Decoupled Dual Interaction [77.8576094863446]
We propose a new deCoupled duAl-interactive lineaR attEntion (CARE) mechanism.
We first propose an asymmetrical feature decoupling strategy that separates the learning process for local inductive bias from that for long-range dependencies.
By adopting a decoupled learning scheme and fully exploiting complementarity across features, our method achieves both high efficiency and accuracy.
arXiv Detail & Related papers (2024-11-25T07:56:13Z)
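For context, CARE builds on linear attention, which replaces the softmax with a kernel feature map so that cost grows linearly in sequence length. The sketch below shows only that generic mechanism (with elu+1 as the feature map, a common choice in linear-attention papers); CARE's decoupled local/global interaction is not reproduced here.

```python
import torch

def linear_attention(q, k, v, eps: float = 1e-6):
    """O(N*d^2) attention via the kernel trick phi(q) (phi(k)^T v).

    q, k, v: (batch, heads, seq, dim) tensors. The elu+1 feature map
    keeps all entries positive so the normalizer is well defined.
    """
    phi_q = torch.nn.functional.elu(q) + 1           # positive feature map
    phi_k = torch.nn.functional.elu(k) + 1
    kv = torch.einsum("bhnd,bhne->bhde", phi_k, v)   # (dim, dim) summary, no NxN matrix
    z = 1.0 / (torch.einsum("bhnd,bhd->bhn", phi_q, phi_k.sum(dim=2)) + eps)
    return torch.einsum("bhnd,bhde,bhn->bhne", phi_q, kv, z)
```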
- HAFLQ: Heterogeneous Adaptive Federated LoRA Fine-tuned LLM with Quantization [55.972018549438964]
Federated fine-tuning of pre-trained Large Language Models (LLMs) enables task-specific adaptation across diverse datasets while preserving privacy.
We propose HAFLQ (Heterogeneous Adaptive Federated Low-Rank Adaptation Fine-tuned LLM with Quantization), a novel framework for efficient and scalable fine-tuning of LLMs in heterogeneous environments.
Experimental results on the text classification task demonstrate that HAFLQ reduces memory usage by 31%, lowers communication cost by 49%, improves accuracy by 50%, and achieves faster convergence compared to the baseline method.
arXiv Detail & Related papers (2024-11-10T19:59:54Z)
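HAFLQ's exact rank-adaptation and quantization policy is not given in this summary; as a rough illustration, a client could quantize its LoRA factors before upload, shipping integer codes plus one scale per matrix instead of fp32 weights. The function name and the 4-bit default are assumptions.

```python
import numpy as np

def quantize_lora_update(A: np.ndarray, B: np.ndarray, bits: int = 4):
    """Uniformly quantize a client's LoRA factors before upload.

    A: (r, d_in) and B: (d_out, r) are the low-rank adapter matrices;
    bits=4 is an assumed default. Each matrix becomes integer codes
    plus a single fp32 scale, shrinking the uplink payload.
    """
    levels = 2 ** (bits - 1) - 1
    def quantize(m: np.ndarray):
        max_val = float(np.abs(m).max())
        scale = max_val / levels if max_val > 0 else 1.0
        codes = np.clip(np.round(m / scale), -levels, levels).astype(np.int8)
        return codes, np.float32(scale)
    return quantize(A), quantize(B)

# Server side: dequantize with codes * scale, then aggregate across clients.
```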
- The Self-Optimal-Transport Feature Transform [2.804721532913997]
We show how to upgrade the set of features of a data instance to facilitate downstream matching or grouping related tasks.
Our transductive transform results from a particular min-cost-max-flow fractional matching problem, whose entropy-regularized version can be approximated by an optimal transport (OT) optimization.
Empirically, the transform is highly effective and flexible in its use, consistently improving networks it is inserted into.
arXiv Detail & Related papers (2022-04-06T20:00:39Z)
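The entropy-regularized OT problem mentioned above is typically solved with Sinkhorn iterations; here is a minimal NumPy sketch assuming uniform marginals (the paper's treatment of self-matches and its choice of marginals may differ).

```python
import numpy as np

def sinkhorn(cost: np.ndarray, reg: float = 0.1, iters: int = 200) -> np.ndarray:
    """Entropy-regularized optimal transport via Sinkhorn iterations.

    cost: (n, m) pairwise cost matrix, e.g. 1 - cosine similarity between
    the instance's features and themselves. Uniform marginals are assumed.
    """
    n, m = cost.shape
    K = np.exp(-cost / reg)          # Gibbs kernel
    a = np.full(n, 1.0 / n)          # uniform source marginal
    b = np.full(m, 1.0 / m)          # uniform target marginal
    v = np.ones(m)
    for _ in range(iters):
        u = a / (K @ v)              # alternate scaling to match marginals
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]   # transport plan P

# Each row of P (suitably renormalized) re-embeds a feature by its
# matching profile against the rest of the set.
```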
- AdaViT: Adaptive Tokens for Efficient Vision Transformer [91.88404546243113]
We introduce AdaViT, a method that adaptively adjusts the inference cost of vision transformer (ViT) for images of different complexity.
AdaViT achieves this by automatically reducing the number of tokens in vision transformers that are processed in the network as inference proceeds.
arXiv Detail & Related papers (2021-12-14T18:56:07Z)
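AdaViT learns its token-halting behaviour end to end; the sketch below shows only the mechanical step of shedding low-scoring tokens between blocks, with the scoring head and a fixed `keep_ratio` left as assumptions.

```python
import torch

def drop_tokens(x: torch.Tensor, scores: torch.Tensor, keep_ratio: float) -> torch.Tensor:
    """Keep only the highest-scoring tokens before the next block.

    x: (batch, n, dim) tokens; scores: (batch, n) importance values from
    a small learned head (assumed here; AdaViT trains halting scores end
    to end rather than using a fixed keep_ratio).
    """
    k = max(1, int(keep_ratio * x.shape[1]))
    idx = scores.topk(k, dim=1).indices                  # per-image top-k tokens
    idx = idx.unsqueeze(-1).expand(-1, -1, x.shape[-1])  # broadcast over channels
    return x.gather(1, idx)
```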
- AdaViT: Adaptive Vision Transformers for Efficient Image Recognition [78.07924262215181]
We introduce AdaViT, an adaptive framework that learns to derive usage policies on which patches, self-attention heads and transformer blocks to use.
Our method obtains more than a 2x improvement in efficiency compared to state-of-the-art vision transformers, with only a 0.8% drop in accuracy.
arXiv Detail & Related papers (2021-11-30T18:57:02Z)
- Self-supervised Augmentation Consistency for Adapting Semantic Segmentation [56.91850268635183]
We propose an approach to domain adaptation for semantic segmentation that is both practical and highly accurate.
We employ standard data augmentation techniques (photometric noise, flipping and scaling) and ensure consistency of the semantic predictions.
We achieve significant improvements over the state-of-the-art segmentation accuracy after adaptation, consistent across both different choices of the backbone architecture and adaptation scenarios.
arXiv Detail & Related papers (2021-04-30T21:32:40Z)
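A minimal sketch of the consistency idea from the paper above, assuming a segmentation model that returns per-pixel logits; the momentum network and confidence thresholding that such methods typically add are omitted.

```python
import torch
import torch.nn.functional as F

def consistency_loss(model: torch.nn.Module, images: torch.Tensor) -> torch.Tensor:
    """Penalize disagreement between clean and augmented predictions.

    Assumes model(images) returns per-pixel logits of shape (B, C, H, W).
    Pseudo-labels from the clean view supervise the augmented view.
    """
    with torch.no_grad():
        pseudo = model(images).argmax(dim=1)     # (B, H, W) pseudo-labels
    aug = torch.flip(images, dims=[-1])          # horizontal flip
    aug = aug + 0.05 * torch.randn_like(aug)     # cheap photometric noise
    logits = torch.flip(model(aug), dims=[-1])   # undo flip to realign pixels
    return F.cross_entropy(logits, pseudo)
```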