Related papers: Task-Oriented Multimodal Token Transmission in Resource-Constrained Multiuser Networks

URL: http://arxiv.org/abs/2505.07841v3
Date: Mon, 03 Nov 2025 13:36:27 GMT
Title: Task-Oriented Multimodal Token Transmission in Resource-Constrained Multiuser Networks
Authors: Junhe Zhang, Wanli Ni, Pengwei Wang, Dongyu Wang
Abstract summary: We propose a task-oriented multimodal token transmission scheme for efficient multimodal information fusion and utilization. To improve the efficiency of token transmission, we design a two-stage training algorithm, including cross-modal alignment and task-oriented fine-tuning. We jointly optimize bandwidth, power allocation, and token length across users by using an alternating optimization method.
Score: 19.42660454288912
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With the emergence of large model-based agents, widely adopted transformer-based architectures inevitably produce excessively long token embeddings for transmission, which may result in high bandwidth overhead, increased power consumption and latency. In this letter, we propose a task-oriented multimodal token transmission scheme for efficient multimodal information fusion and utilization. To improve the efficiency of token transmission, we design a two-stage training algorithm, including cross-modal alignment and task-oriented fine-tuning, for large model-based token communication. Meanwhile, token compression is performed using a sliding window pooling operation to save communication resources. To balance the trade-off between latency and model performance caused by compression, we formulate a weighted-sum optimization problem over latency and validation loss. We jointly optimize bandwidth, power allocation, and token length across users by using an alternating optimization method. Simulation results demonstrate that the proposed algorithm outperforms the baseline under different bandwidth and power budgets. Moreover, the two-stage training algorithm achieves higher accuracy across various signal-to-noise ratios than the method without cross-modal alignment.
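
The sliding window pooling step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the pooling type, window size, or stride, so average pooling, the function name `sliding_window_pool`, and the parameters `window` and `stride` are all assumptions made here for illustration; this is not the authors' implementation.

```python
from typing import List


def sliding_window_pool(
    tokens: List[List[float]], window: int, stride: int
) -> List[List[float]]:
    """Shorten a token sequence by averaging embeddings inside a sliding window.

    Average pooling is an assumption; the abstract only says
    "sliding window pooling" without naming the reduction.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if len(tokens) < window:
        raise ValueError("sequence is shorter than the pooling window")

    dim = len(tokens[0])
    pooled = []
    # Slide along the sequence axis; each window yields one output token.
    for start in range(0, len(tokens) - window + 1, stride):
        chunk = tokens[start:start + window]
        pooled.append([sum(tok[d] for tok in chunk) / window for d in range(dim)])
    return pooled


# Toy input: 8 tokens with 2-dimensional embeddings (hypothetical values).
tokens = [[float(i), float(2 * i)] for i in range(8)]
compressed = sliding_window_pool(tokens, window=4, stride=2)
print(len(tokens), "->", len(compressed))  # 8 -> 3
```

With these illustrative settings, a larger stride yields fewer transmitted tokens and therefore lower latency, at the cost of coarser representations, which is the trade-off the abstract's weighted-sum objective is meant to balance.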

Related papers

Orchestrating Multimodal DNN Workloads in Wireless Neural Processing [57.510786937781866]
In edge inference, wireless resource allocation and accelerator deep neural computation (DNN) scheduling have yet to be co-optimized in an end-to-end manner. This paper investigates a paradigm that integrates wireless transmission and multi-core execution into a unified end-to-end pipeline.
arXiv Detail & Related papers (2026-03-02T17:25:43Z)
Hierarchical Online-Scheduling for Energy-Efficient Split Inference with Progressive Transmission [23.81409473238433]
Device-edge collaborative inference with Deep Neural Networks (DNNs) faces fundamental trade-offs among accuracy, latency and energy consumption. This paper proposes a novel ENergy-ACcuracy Hierarchical optimization framework for split Inference, named ENACHI. Experiments on ImageNet dataset demonstrate that ENACHI outperforms state-of-the-art benchmarks under varying deadlines and bandwidths.
arXiv Detail & Related papers (2026-01-13T01:56:46Z)
AoI-Aware Task Offloading and Transmission Optimization for Industrial IoT Networks: A Branching Deep Reinforcement Learning Approach [43.261887758877386]
In the Industrial Internet of Things (IIoT), the frequent transmission of large amounts of data over wireless networks should meet the stringent timeliness requirements. We propose an age-of-information (AoI)-aware multi-base station (BS) real-time monitoring framework to support extensive IIoT deployments.
arXiv Detail & Related papers (2025-10-18T09:14:39Z)
Joint Channel Estimation and Computation Offloading in Fluid Antenna-assisted MEC Networks [81.36647816787713]
We propose an FA-assisted offloading framework to minimize the delay of channel estimation. We show that the proposed system significantly reduces the accuracy under efficient communication.
arXiv Detail & Related papers (2025-09-16T08:48:44Z)
Adaptive Token Merging for Efficient Transformer Semantic Communication at the Edge [28.969380251735924]
Large-scale transformers are central to modern semantic communication, yet their high computational and communication costs hinder deployment on resource-constrained edge devices. This paper introduces a training-free framework for adaptive token merging, a novel mechanism that compresses transformer representations at runtime. Our approach couples merging directly to input redundancy, enabling data-dependent adaptation that balances efficiency and task relevance without retraining.
arXiv Detail & Related papers (2025-09-12T04:11:59Z)
Adaptive Pareto-Optimal Token Merging for Edge Transformer Models in Semantic Communication [27.78647101651565]
Large-scale transformer models have emerged as a powerful tool for semantic communication systems. We present a training-free framework for adaptive token merging in pretrained vision transformers to jointly reduce inference time and transmission resource usage.
arXiv Detail & Related papers (2025-09-11T06:05:35Z)
FindRec: Stein-Guided Entropic Flow for Multi-Modal Sequential Recommendation [50.438552588818]
We propose FindRec (Flexible unified information disentanglement for multi-modal sequential Recommendation). A Stein kernel-based Integrated Information Coordination Module (IICM) theoretically guarantees distribution consistency between multimodal features and ID streams. A cross-modal expert routing mechanism that adaptively filters and combines multimodal features based on their contextual relevance.
arXiv Detail & Related papers (2025-07-07T04:09:45Z)
Token Communication in the Era of Large Models: An Information Bottleneck-Based Approach [55.861432910722186]
UniToCom is a unified token communication paradigm that treats tokens as the fundamental units for both processing and wireless transmission. We propose a generative information bottleneck (GenIB) principle, which facilitates the learning of tokens that preserve essential information. We employ a causal Transformer-based multimodal large language model (MLLM) at the receiver to unify the processing of both discrete and continuous tokens.
arXiv Detail & Related papers (2025-07-02T14:03:01Z)
A Transfer Learning Framework for Multilayer Networks via Model Averaging [8.27209166988677]
Link prediction in multilayer networks is a key challenge in applications such as recommendation systems and protein-protein interaction prediction. We propose a novel transfer learning framework for multilayer networks using a bi-level model averaging method.
arXiv Detail & Related papers (2025-06-14T11:32:31Z)
Modeling and Performance Analysis for Semantic Communications Based on Empirical Results [53.805458017074294]
We propose an Alpha-Beta-Gamma (ABG) formula to model the relationship between the end-to-end measurement and SNR. For image reconstruction tasks, the proposed ABG formula can well fit the commonly used DL networks, such as SCUNet, and Vision Transformer. To the best of our knowledge, this is the first theoretical expression between end-to-end performance metrics and SNR for semantic communications.
arXiv Detail & Related papers (2025-04-29T06:07:50Z)
InfoMAE: Pair-Efficient Cross-Modal Alignment for Multimodal Time-Series Sensing Signals [9.648001493025204]
InfoMAE is a cross-modal alignment framework that tackles the challenge of multimodal pair efficiency under the SSL setting. It enhances downstream multimodal tasks by over 60% with significantly improved multimodal pairing efficiency. It also improves unimodal task accuracy by an average of 22%.
arXiv Detail & Related papers (2025-04-13T20:03:29Z)
Task-Oriented Feature Compression for Multimodal Understanding via Device-Edge Co-Inference [49.77734021302196]
We propose a task-oriented feature compression (TOFC) method for multimodal understanding in a device-edge co-inference framework. To enhance compression efficiency, multiple entropy models are adaptively selected based on the characteristics of the visual features. Results show that TOFC achieves up to 60% reduction in data transmission overhead and 50% reduction in system latency.
arXiv Detail & Related papers (2025-03-17T08:37:22Z)
SIMAC: A Semantic-Driven Integrated Multimodal Sensing And Communication Framework [22.924064428134507]
Single-modality sensing faces limitations in accuracy and capability, and its decoupled implementation with communication systems increases latency. We propose a semantic-driven integrated multimodal sensing and communication framework to overcome these challenges.
arXiv Detail & Related papers (2025-03-11T01:04:42Z)
Take What You Need: Flexible Multi-Task Semantic Communications with Channel Adaptation [51.53221300103261]
This article introduces a novel channel-adaptive and multi-task-aware semantic communication framework based on a masked auto-encoder architecture. A channel-aware extractor is employed to dynamically select relevant information in response to real-time channel conditions. Experimental results demonstrate the superior performance of our framework compared to conventional methods in tasks such as image reconstruction and object detection.
arXiv Detail & Related papers (2025-02-12T09:01:25Z)
R-MTLLMF: Resilient Multi-Task Large Language Model Fusion at the Wireless Edge [78.26352952957909]
Multi-task large language models (MTLLMs) are important for many applications at the wireless edge, where users demand specialized models to handle multiple tasks efficiently. The concept of model fusion via task vectors has emerged as an efficient approach for combining fine-tuning parameters to produce an MTLLM. In this paper, the problem of enabling edge users to collaboratively craft such MTLLMs via task vectors is studied, under the assumption of worst-case adversarial attacks.
arXiv Detail & Related papers (2024-11-27T10:57:06Z)
FedMFS: Federated Multimodal Fusion Learning with Selective Modality Communication [11.254610576923204]
We propose Federated Multimodal Fusion learning with Selective modality communication (FedMFS). The key idea is the introduction of a modality selection criterion for each device, which weighs (i) the impact of the modality, gauged by Shapley value analysis, against (ii) the modality model size as a gauge for communication overhead. Experiments on the real-world ActionSense dataset demonstrate the ability of FedMFS to achieve comparable accuracy to several baselines while reducing the communication overhead by over 4x.
arXiv Detail & Related papers (2023-10-10T22:23:27Z)
Large AI Model Empowered Multimodal Semantic Communications [48.73159237649128]
We propose a Large AI Model-based Multimodal SC (LAMMSC) framework. We first present the Conditional-based Multimodal Alignment (MMA) that enables the transformation between multimodal and unimodal data. Then, a personalized LLM-based Knowledge Base (LKB) is proposed, which allows users to perform personalized semantic extraction or recovery. Finally, we apply the Generative adversarial network-based channel Estimation (CGE) for estimating the wireless channel state information.
arXiv Detail & Related papers (2023-09-03T19:24:34Z)
Proximal Policy Optimization-based Transmit Beamforming and Phase-shift Design in an IRS-aided ISAC System for the THz Band [90.45915557253385]
IRS-aided integrated sensing and communications (ISAC) system operating in the terahertz (THz) band is proposed to maximize the system capacity. Transmit beamforming and phase-shift design are transformed into a universal optimization problem with ergodic constraints.
arXiv Detail & Related papers (2022-03-21T09:15:18Z)
Low-Latency Federated Learning over Wireless Channels with Differential Privacy [142.5983499872664]
In federated learning (FL), model training is distributed over clients and local models are aggregated by a central server. In this paper, we aim to minimize FL training delay over wireless channels, constrained by overall training performance as well as each client's differential privacy (DP) requirement.
arXiv Detail & Related papers (2021-06-20T13:51:18Z)

This list is automatically generated from the titles and abstracts of the papers on this site.