Adaptive Pareto-Optimal Token Merging for Edge Transformer Models in Semantic Communication
- URL: http://arxiv.org/abs/2509.09168v1
- Date: Thu, 11 Sep 2025 06:05:35 GMT
- Title: Adaptive Pareto-Optimal Token Merging for Edge Transformer Models in Semantic Communication
- Authors: Omar Erak, Omar Alhussein, Hatem Abou-Zeid, Mehdi Bennis
- Abstract summary: Large-scale transformer models have emerged as a powerful tool for semantic communication systems. We present a training-free framework for adaptive token merging in pretrained vision transformers to jointly reduce inference time and transmission resource usage.
- Score: 27.78647101651565
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large-scale transformer models have emerged as a powerful tool for semantic communication systems, enabling edge devices to extract rich representations for robust inference across noisy wireless channels. However, their substantial computational demands remain a major barrier to practical deployment in resource-constrained 6G networks. In this paper, we present a training-free framework for adaptive token merging in pretrained vision transformers to jointly reduce inference time and transmission resource usage. We formulate the selection of per-layer merging proportions as a multi-objective optimization problem to balance accuracy and computational cost. We employ Gaussian process-based Bayesian optimization to construct a Pareto frontier of optimal configurations, enabling flexible runtime adaptation to dynamic application requirements and channel conditions. Extensive experiments demonstrate that our method consistently outperforms other baselines and achieves significant reductions in floating-point operations while maintaining competitive accuracy across a wide range of signal-to-noise ratio (SNR) conditions. Additional results highlight the effectiveness of adaptive policies that adjust merging aggressiveness in response to channel quality, providing a practical mechanism to trade off latency and semantic fidelity on demand. These findings establish a scalable and efficient approach for deploying transformer-based semantic communication in future edge intelligence systems.
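The accuracy-versus-FLOPs trade-off described in the abstract can be sketched in miniature. The snippet below substitutes plain random sampling for the paper's Gaussian-process Bayesian optimization, and the `evaluate` surrogate is a hypothetical stand-in for validation runs of the merged vision transformer; only the Pareto-dominance filtering is the general technique.

```python
import random

def evaluate(merge_ratios):
    # Toy surrogate (hypothetical, not the paper's model): heavier merging
    # cuts FLOPs but costs accuracy, and uneven per-layer merging costs more.
    n = len(merge_ratios)
    avg = sum(merge_ratios) / n
    var = sum((r - avg) ** 2 for r in merge_ratios) / n
    flops = 1.0 - 0.8 * avg               # relative FLOPs (1.0 = unmerged model)
    accuracy = 0.82 - 0.3 * avg ** 2 - 0.2 * var
    return accuracy, flops

def dominates(p, q):
    # p dominates q: no worse on both objectives, strictly better on one.
    return p[0] >= q[0] and p[1] <= q[1] and (p[0] > q[0] or p[1] < q[1])

def pareto_front(points):
    # Keep only the non-dominated (accuracy, FLOPs) configurations.
    return [p for p in points if not any(dominates(q, p) for q in points)]

def search(num_layers=12, samples=200, seed=0):
    rng = random.Random(seed)
    cands = [evaluate([rng.uniform(0.0, 0.5) for _ in range(num_layers)])
             for _ in range(samples)]
    return pareto_front(cands)
```

At runtime, a controller would pick the frontier point matching the current latency budget or channel quality; the frontier itself simply enumerates the attainable accuracy-versus-FLOPs operating points.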
Related papers
- CollaPipe: Adaptive Segment-Optimized Pipeline Parallelism for Collaborative LLM Training in Heterogeneous Edge Networks [57.95170323315603]
We introduce CollaPipe, a distributed learning framework that integrates collaborative pipeline parallelism with federated aggregation to support self-evolving networks. In CollaPipe, the encoder part is adaptively partitioned into variable-sized segments and deployed across mobile devices for pipeline-parallel training, while the decoder is deployed on edge servers to handle generative tasks. To enhance training efficiency, we formulate a joint optimization problem that adaptively allocates model segments, micro-batches, bandwidth, and transmission power.
arXiv Detail & Related papers (2025-09-24T07:54:01Z) - Adaptive Token Merging for Efficient Transformer Semantic Communication at the Edge [28.969380251735924]
Large-scale transformers are central to modern semantic communication, yet their high computational and communication costs hinder deployment on resource-constrained edge devices. This paper introduces a training-free framework for adaptive token merging, a novel mechanism that compresses transformer representations at runtime. Our approach couples merging directly to input redundancy, enabling data-dependent adaptation that balances efficiency and task relevance without retraining.
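A minimal, dependency-free sketch of the kind of similarity-driven token merging these runtime-compression papers build on (the greedy pairwise scheme and the averaging rule here are illustrative assumptions, not either paper's exact mechanism):

```python
def cosine(u, v):
    # Cosine similarity between two feature vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

def merge_tokens(tokens, r):
    """Greedily average the r most similar token pairs.

    tokens: list of feature vectors (lists of floats).
    Near-duplicate tokens are merged first, which is what couples the
    amount of compression to the redundancy of the input."""
    tokens = [list(t) for t in tokens]
    for _ in range(r):
        if len(tokens) < 2:
            break
        best, (bi, bj) = -2.0, (0, 1)
        for i in range(len(tokens)):
            for j in range(i + 1, len(tokens)):
                s = cosine(tokens[i], tokens[j])
                if s > best:
                    best, (bi, bj) = s, (i, j)
        merged = [(a + b) / 2 for a, b in zip(tokens[bi], tokens[bj])]
        tokens.pop(bj)        # bj > bi, so remove the later index first
        tokens[bi] = merged
    return tokens
```

Each merge removes one token, so `r` merges per layer shrink the sequence, and attention cost, by `r` tokens before the representation is transmitted.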
arXiv Detail & Related papers (2025-09-12T04:11:59Z) - Adaptive Semantic Token Communication for Transformer-based Edge Inference [15.405730528104113]
This paper presents an adaptive framework for edge inference based on a dynamic transformer-powered deep joint source-channel coding architecture. We employ a semantic token selection mechanism that adaptively compresses informative features into a user-specified number of tokens per sample. We incorporate a resource allocation algorithm based on Lyapunov optimization to enhance robustness under dynamic network conditions.
arXiv Detail & Related papers (2025-05-23T08:15:05Z) - Tool-Aided Evolutionary LLM for Generative Policy Toward Efficient Resource Management in Wireless Federated Learning [20.07184763454309]
Federated Learning (FL) enables distributed model training across edge devices in a privacy-friendly manner. This paper proposes a Tool-aided Evolutionary Large Language Model (T-ELLM) framework to generate a qualified policy for device selection in a wireless FL environment.
arXiv Detail & Related papers (2025-05-16T10:07:29Z) - The Larger the Merrier? Efficient Large AI Model Inference in Wireless Edge Networks [56.37880529653111]
The demand for large artificial intelligence model (LAIM) services is driving a paradigm shift from traditional cloud-based inference to edge-based inference for low-latency, privacy-preserving applications. In this paper, we investigate the LAIM-inference scheme, where a pre-trained LAIM is pruned and partitioned into on-device and on-server sub-models for deployment.
arXiv Detail & Related papers (2025-05-14T08:18:55Z) - Efficient Split Federated Learning for Large Language Models over Communication Networks [45.02252893286613]
Fine-tuning pre-trained large language models (LLMs) in a distributed manner poses significant challenges on resource-constrained edge networks. We propose SflLLM, a novel framework that integrates split federated learning with parameter-efficient fine-tuning techniques. By leveraging model splitting and low-rank adaptation (LoRA), SflLLM reduces the computational burden on edge devices.
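For context, the low-rank adaptation that SflLLM leverages can be sketched as a generic LoRA forward pass (this is not SflLLM's code; the matrix shapes and scaling factor are illustrative):

```python
def matvec(M, x):
    # Dense matrix-vector product over plain lists.
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha=1.0):
    """y = W x + alpha * B (A x).

    W (d_out x d_in) is the frozen pretrained weight; only the low-rank
    factors A (r x d_in) and B (d_out x r) are trained, so a device holds
    and exchanges O(r * (d_in + d_out)) adapter parameters instead of
    O(d_out * d_in) full weights."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + alpha * d for b, d in zip(base, delta)]
```

With `B` initialized to zero (the usual LoRA convention), the adapted model starts out exactly equal to the pretrained one, and only the small factors need to cross the network during federated aggregation.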
arXiv Detail & Related papers (2025-04-20T16:16:54Z) - Optimal Transport Adapter Tuning for Bridging Modality Gaps in Few-Shot Remote Sensing Scene Classification [80.83325513157637]
Few-Shot Remote Sensing Scene Classification (FS-RSSC) presents the challenge of classifying remote sensing images with limited labeled samples. We propose a novel Optimal Transport Adapter Tuning (OTAT) framework aimed at constructing an ideal Platonic representational space.
arXiv Detail & Related papers (2025-03-19T07:04:24Z) - Take What You Need: Flexible Multi-Task Semantic Communications with Channel Adaptation [51.53221300103261]
This article introduces a novel channel-adaptive and multi-task-aware semantic communication framework based on a masked auto-encoder architecture. A channel-aware extractor is employed to dynamically select relevant information in response to real-time channel conditions. Experimental results demonstrate the superior performance of our framework compared to conventional methods in tasks such as image reconstruction and object detection.
arXiv Detail & Related papers (2025-02-12T09:01:25Z) - Compressed Regression over Adaptive Networks [58.79251288443156]
We derive the performance achievable by a network of distributed agents that solve, adaptively and in the presence of communication constraints, a regression problem.
We devise an optimized allocation strategy where the parameters necessary for the optimization can be learned online by the agents.
arXiv Detail & Related papers (2023-04-07T13:41:08Z) - Cross-receptive Focused Inference Network for Lightweight Image Super-Resolution [64.25751738088015]
Transformer-based methods have shown impressive performance in single image super-resolution (SISR) tasks.
However, the ability of Transformers to incorporate contextual information and extract features dynamically is often neglected.
We propose a lightweight Cross-receptive Focused Inference Network (CFIN) that consists of a cascade of CT Blocks mixed with CNN and Transformer.
arXiv Detail & Related papers (2022-07-06T16:32:29Z) - Adaptive Subcarrier, Parameter, and Power Allocation for Partitioned Edge Learning Over Broadband Channels [69.18343801164741]
Partitioned edge learning (PARTEL) implements parameter-server training, a well-known distributed learning method, in wireless networks.
We consider the case of deep neural network (DNN) models which can be trained using PARTEL by introducing some auxiliary variables.
arXiv Detail & Related papers (2020-10-08T15:27:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.