AI2MMUM: AI-AI Oriented Multi-Modal Universal Model Leveraging Telecom Domain Large Model
- URL: http://arxiv.org/abs/2505.10003v1
- Date: Thu, 15 May 2025 06:32:59 GMT
- Title: AI2MMUM: AI-AI Oriented Multi-Modal Universal Model Leveraging Telecom Domain Large Model
- Authors: Tianyu Jiao, Zhuoran Xiao, Yihang Huang, Chenhui Ye, Yijia Feng, Liyu Cai, Jiang Chang, Fangkun Liu, Yin Xu, Dazhi He, Yunfeng Guan, Wenjun Zhang
- Abstract summary: We propose a scalable, task-aware artificial intelligence-air interface multi-modal universal model (AI2MMUM). To enhance task adaptability, task instructions consist of fixed task keywords and learnable, implicit prefix prompts. Lightweight task-specific heads are designed to directly output task objectives.
- Score: 8.404195378257178
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Designing a 6G-oriented universal model capable of processing multi-modal data and executing diverse air interface tasks has emerged as a common goal in future wireless systems. Building on our prior work in communication multi-modal alignment and telecom large language models (LLMs), we propose a scalable, task-aware artificial intelligence-air interface multi-modal universal model (AI2MMUM), which flexibly and effectively performs various physical layer tasks according to subtle task instructions. The LLM backbone provides robust contextual comprehension and generalization capabilities, while a fine-tuning approach is adopted to incorporate domain-specific knowledge. To enhance task adaptability, task instructions consist of fixed task keywords and learnable, implicit prefix prompts. Frozen radio modality encoders extract universal representations, and adapter layers subsequently bridge the radio and language modalities. Moreover, lightweight task-specific heads are designed to directly output task objectives. Comprehensive evaluations demonstrate that AI2MMUM achieves SOTA performance across five representative physical environment/wireless channel-based downstream tasks using the WAIR-D and DeepMIMO datasets.
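The abstract describes a concrete processing pipeline: frozen radio encoder, adapter into the LLM embedding space, task instructions built from fixed keywords plus learnable prefix prompts, and lightweight task-specific heads. The following is a minimal PyTorch sketch of that pipeline as we read it from the abstract; all module names, dimensions, and the small transformer used as a stand-in for the fine-tuned telecom LLM backbone are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the AI2MMUM pipeline as described in the abstract:
# frozen radio encoder -> adapter -> LLM backbone (learnable prefix prompts
# + fixed task-keyword tokens) -> lightweight task-specific head.
# Dimensions, names, and the tiny transformer standing in for the telecom
# LLM backbone are illustrative assumptions only.
import torch
import torch.nn as nn


class RadioEncoder(nn.Module):
    """Frozen radio-modality encoder: maps channel features to universal embeddings."""

    def __init__(self, in_dim: int, embed_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.GELU(), nn.Linear(256, embed_dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, in_dim)
        return self.net(x)


class AI2MMUMSketch(nn.Module):
    def __init__(self, radio_dim=128, lm_dim=512, prefix_len=8,
                 vocab_size=1000, num_tasks=5, task_out_dim=16):
        super().__init__()
        # Frozen radio encoder extracts universal representations.
        self.radio_encoder = RadioEncoder(radio_dim, lm_dim)
        for p in self.radio_encoder.parameters():
            p.requires_grad = False
        # Adapter bridges the radio modality to the language-model embedding space.
        self.adapter = nn.Linear(lm_dim, lm_dim)
        # Fixed task keywords use ordinary token embeddings; prefix prompts are learnable.
        self.token_embed = nn.Embedding(vocab_size, lm_dim)
        self.prefix_prompts = nn.Parameter(torch.randn(num_tasks, prefix_len, lm_dim) * 0.02)
        # Stand-in for the fine-tuned telecom LLM backbone.
        layer = nn.TransformerEncoderLayer(d_model=lm_dim, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        # Lightweight task-specific heads directly output task objectives.
        self.task_heads = nn.ModuleList(
            [nn.Linear(lm_dim, task_out_dim) for _ in range(num_tasks)]
        )

    def forward(self, radio_input, task_keyword_ids, task_id: int):
        radio_tok = self.adapter(self.radio_encoder(radio_input)).unsqueeze(1)      # (B, 1, D)
        keyword_tok = self.token_embed(task_keyword_ids)                            # (B, K, D)
        prefix = self.prefix_prompts[task_id].expand(radio_input.size(0), -1, -1)   # (B, P, D)
        hidden = self.backbone(torch.cat([prefix, keyword_tok, radio_tok], dim=1))
        return self.task_heads[task_id](hidden[:, -1])  # pool the last position


# Toy usage: a batch of 4 channel-feature vectors with a 3-token task keyword.
model = AI2MMUMSketch()
out = model(torch.randn(4, 128), torch.randint(0, 1000, (4, 3)), task_id=2)
print(out.shape)  # torch.Size([4, 16])
```

In this reading, only the adapter, prefix prompts, backbone fine-tuning, and task heads carry gradients, which matches the abstract's emphasis on frozen radio encoders and lightweight task-specific outputs.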
Related papers
- OWMM-Agent: Open World Mobile Manipulation With Multi-modal Agentic Data Synthesis [70.39500621448383]
The open-world mobile manipulation task remains a challenge due to the need for generalization to open-ended instructions and environments. We propose a novel multi-modal agent architecture that maintains multi-view scene frames and agent states for decision-making and controls the robot via function calling. We highlight our fine-tuned OWMM-VLM as the first dedicated foundation model for mobile manipulators with global scene understanding, robot state tracking, and multi-modal action generation in a unified model.
arXiv Detail & Related papers (2025-06-04T17:57:44Z)
- LLM Agents as 6G Orchestrator: A Paradigm for Task-Oriented Physical-Layer Automation [1.128193862264227]
This paper proposes a novel comprehensive approach for building task-oriented 6G LLM agents.
We first propose a two-stage continual pre-training and fine-tuning scheme to build the field basic model.
Experiment results of exemplary tasks, such as physical-layer task decomposition, show the proposed paradigm's feasibility and effectiveness.
arXiv Detail & Related papers (2024-09-21T05:08:29Z)
- WavLLM: Towards Robust and Adaptive Speech Large Language Model [93.0773293897888]
We introduce WavLLM, a robust and adaptive speech large language model with dual encoders, and a prompt-aware LoRA weight adapter.
We validate the proposed model on universal speech benchmarks including tasks such as ASR, ST, SV, ER, and also apply it to specialized datasets like Gaokao English listening comprehension set for SQA, and speech Chain-of-Thought (CoT) evaluation set.
arXiv Detail & Related papers (2024-03-31T12:01:32Z)
- An Interactive Agent Foundation Model [49.77861810045509]
We propose an Interactive Agent Foundation Model that uses a novel multi-task agent training paradigm for training AI agents.
Our training paradigm unifies diverse pre-training strategies, including visual masked auto-encoders, language modeling, and next-action prediction.
We demonstrate the performance of our framework across three separate domains -- Robotics, Gaming AI, and Healthcare.
arXiv Detail & Related papers (2024-02-08T18:58:02Z)
- Generative Multimodal Models are In-Context Learners [60.50927925426832]
We introduce Emu2, a generative multimodal model with 37 billion parameters, trained on large-scale multimodal sequences.
Emu2 exhibits strong multimodal in-context learning abilities, even showing emergent abilities to solve tasks that require on-the-fly reasoning.
arXiv Detail & Related papers (2023-12-20T18:59:58Z)
- Multitask Multimodal Prompted Training for Interactive Embodied Task Completion [48.69347134411864]
Embodied MultiModal Agent (EMMA) is a unified encoder-decoder model that reasons over images and trajectories.
By unifying all tasks as text generation, EMMA learns a language of actions which facilitates transfer across tasks.
arXiv Detail & Related papers (2023-11-07T15:27:52Z)
- mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration [74.31268379055201]
mPLUG-Owl2 is a versatile multi-modal large language model.
It effectively leverages modality collaboration to improve performance in both text and multi-modal tasks.
arXiv Detail & Related papers (2023-11-07T14:21:29Z)
- Multi-task Hierarchical Adversarial Inverse Reinforcement Learning [40.60364143826424]
Multi-task Imitation Learning (MIL) aims to train a policy capable of performing a distribution of tasks based on multi-task expert demonstrations.
Existing MIL algorithms suffer from low data efficiency and poor performance on complex long-horizon tasks.
We develop Multi-task Hierarchical Adversarial Inverse Reinforcement Learning (MH-AIRL) to learn hierarchically-structured multi-task policies.
arXiv Detail & Related papers (2023-05-22T01:58:40Z)
- OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models [72.8156832931841]
Generalist models are capable of performing diverse multi-modal tasks in a task-agnostic way within a single model.
We release a generalist model learning system, OFASys, built on top of a declarative task interface named multi-modal instruction.
arXiv Detail & Related papers (2022-12-08T17:07:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.