How Small Can 6G Reason? Scaling Tiny Language Models for AI-Native Networks
- URL: http://arxiv.org/abs/2603.02156v1
- Date: Mon, 02 Mar 2026 18:19:49 GMT
- Title: How Small Can 6G Reason? Scaling Tiny Language Models for AI-Native Networks
- Authors: Mohamed Amine Ferrag, Abderrahmane Lakas, Merouane Debbah
- Abstract summary: We study the scaling behavior and deployment efficiency of compact language models for network-level semantic reasoning in AI-native 6G systems. We evaluate models ranging from 135M (SmolLM2-135M) to 7B parameters (Qwen2.5-7B), including mid-scale architectures such as Llama-3.2-1B, Granite-1B, and Qwen2.5-3B.
- Score: 3.099103925863002
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Emerging 6G visions, reflected in ongoing standardization efforts within 3GPP, IETF, ETSI, ITU-T, and the O-RAN Alliance, increasingly characterize networks as AI-native systems in which high-level semantic reasoning layers operate above standardized control and data-plane functions. Although frontier-scale large language models (LLMs) such as Qwen2.5-7B and Olmo-3-7B demonstrate strong reasoning capability, their computational footprint limits deployment in latency-sensitive, edge-native infrastructures. This paper presents a systematic empirical study of the scaling behavior and deployment efficiency of compact language models for network-level semantic reasoning in AI-native 6G systems. Using 6G-Bench, a standardization-aligned benchmark comprising 30 decision-making tasks across five capability domains, we evaluate models ranging from 135M (SmolLM2-135M) to 7B parameters (Qwen2.5-7B), including mid-scale architectures such as Llama-3.2-1B, Granite-1B, and Qwen2.5-3B. Deterministic accuracy (pass@1) increases from 0.224 at 135M to 0.707 at 7B, but scaling gains are highly non-uniform. A pronounced stability transition occurs in the 1 to 1.5B range, where accuracy rises from 0.373 (Llama-3.2-1B) to 0.531 (Qwen2.5-1.5B) and the instability gap Delta_5 contracts from 0.356 to 0.138. Beyond 3B parameters, improvements diminish (+0.064 from 3B to 7B). Through single-query inference profiling and an Edge Score metric that normalizes accuracy by latency and memory footprint, we show that semantic reliability per unit edge resource does not scale monotonically with parameter count. Instead, mid-scale models (approximately 1.5 to 3B) achieve the most favorable balance between deterministic stability and computational efficiency, providing deployment-relevant guidance for AI-native 6G architectures. All scripts and results are publicly available at https://github.com/maferrag/6G-Bench
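The scaling figures quoted in the abstract can be checked with a few lines of arithmetic. The sketch below recomputes the reported gains from the quoted pass@1 and Delta_5 values, and derives the 3B accuracy implied by the "+0.064 from 3B to 7B" statement (a derived value, not one quoted in the abstract). The abstract does not give the Edge Score formula, so `edge_score` is a hypothetical stand-in that simply divides accuracy by the product of latency and memory; its name, form, and parameters are assumptions made for illustration, not the authors' definition.

```python
# pass@1 accuracy by model size, as quoted in the abstract.
ACCURACY = {"135M": 0.224, "1B": 0.373, "1.5B": 0.531, "7B": 0.707}
# Instability gap Delta_5 by model size, as quoted in the abstract.
DELTA_5 = {"1B": 0.356, "1.5B": 0.138}
# Reported accuracy gain when moving from 3B to 7B.
GAIN_3B_TO_7B = 0.064


def gain(small, large, table=ACCURACY):
    """Difference table[large] - table[small], rounded to three decimals."""
    return round(table[large] - table[small], 3)


def implied_3b_accuracy():
    """3B accuracy implied by the 7B accuracy and the reported 3B-to-7B gain."""
    return round(ACCURACY["7B"] - GAIN_3B_TO_7B, 3)


def edge_score(accuracy, latency_s, memory_gb):
    """HYPOTHETICAL normalization (assumption): accuracy / (latency * memory).

    The paper defines its own Edge Score; this form only illustrates the idea
    of accuracy per unit of edge resource.
    """
    if latency_s <= 0 or memory_gb <= 0:
        raise ValueError("latency and memory must be positive")
    return accuracy / (latency_s * memory_gb)


if __name__ == "__main__":
    print("accuracy gain, 1B -> 1.5B:", gain("1B", "1.5B"))
    print("Delta_5 change, 1B -> 1.5B:", gain("1B", "1.5B", DELTA_5))
    print("accuracy gain, 135M -> 7B:", gain("135M", "7B"))
    print("implied 3B accuracy:", implied_3b_accuracy())
```

Under any normalization of this kind, a larger model only scores higher if its accuracy gain outpaces its added latency and memory cost, which is consistent with the abstract's claim that reliability per unit edge resource is not monotonic in parameter count.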
Related papers
- Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters [169.7981969517903]
Step 3.5 Flash bridges frontier-level agentic intelligence and computational efficiency. We focus on what matters most when building agents: sharp reasoning and fast, reliable execution.
arXiv Detail & Related papers (2026-02-11T07:53:51Z)
- 6G-Bench: An Open Benchmark for Semantic Communication and Network-Level Reasoning with Foundation Models in AI-Native 6G Networks [3.099103925863002]
6G-Bench is an open benchmark for evaluating semantic communication and network-level reasoning in AI-native 6G networks. We generate a balanced pool of 10,000 very-hard multiple-choice questions using task-conditioned prompts. We evaluate 22 foundation models spanning dense and mixture-of-experts architectures, short-context and long-context designs.
arXiv Detail & Related papers (2026-02-09T13:57:37Z)
- Qwen3-ASR Technical Report [71.87071808763484]
We introduce the Qwen3-ASR family, which includes two powerful all-in-one speech recognition models and a novel non-autoregressive speech forced alignment model. Qwen3-ASR-1.7B and Qwen3-ASR-0.6B are ASR models that support language identification and ASR for 52 languages and dialects.
arXiv Detail & Related papers (2026-01-29T06:58:13Z)
- Towards a Science of Scaling Agent Systems [79.64446272302287]
We formalize a definition for agent evaluation and characterize scaling laws as the interplay between agent quantity, coordination structure, model capability, and task properties. We derive a predictive model using coordination metrics, that cross-validated R2=0, enabling prediction on unseen task domains. We identify three effects: (1) a tool-coordination trade-off: under fixed computational budgets, tool-heavy tasks suffer disproportionately from multi-agent overhead, and (2) a capability saturation: coordination yields diminishing or negative returns once single-agent baselines exceed 45%.
arXiv Detail & Related papers (2025-12-09T06:52:21Z)
- Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B [12.229008422568192]
This report introduces VibeThinker-1.5B, a 1.5B-parameter dense model developed via our Spectrum-to-Signal Principle (SSP). With a total training cost of only $7,800, VibeThinker-1.5B demonstrates superior reasoning capabilities compared to closed-source models. Remarkably, it surpasses the 400x larger DeepSeek R1 on three math benchmarks.
arXiv Detail & Related papers (2025-11-09T04:37:36Z)
- Adaptive Monitoring and Real-World Evaluation of Agentic AI Systems [3.215065407261898]
Multi-agent systems that combine large language models with external tools are rapidly transitioning from research laboratories into high-stakes domains. This "Advanced" sequel fills that gap by providing an algorithmic instantiation or empirical evidence. AMDM cuts anomaly-detection latency from 12.3 s to 5.6 s on simulated goal drift and reduces false-positive rates from 4.5% to 0.9%.
arXiv Detail & Related papers (2025-08-28T15:52:49Z)
- LLM-Based Emulation of the Radio Resource Control Layer: Towards AI-Native RAN Protocols [28.04609776570199]
Large AI Models (LAMs) are key enablers of the AI-Native Air Interface (AI-AI). This paper presents the first standards-compliant emulation of the Radio Resource Control layer using a decoder-only LAM. Results demonstrate that LAMs, when augmented with protocol-aware reasoning, can directly orchestrate control-plane procedures.
arXiv Detail & Related papers (2025-05-22T15:55:56Z)
- Learning Adaptive Parallel Reasoning with Language Models [70.1745752819628]
We propose Adaptive Parallel Reasoning (APR), a novel reasoning framework that enables language models to orchestrate both serialized and parallel computations end-to-end. APR generalizes existing reasoning methods by enabling adaptive multi-threaded inference using spawn() and join() operations. A key innovation is our end-to-end reinforcement learning strategy, optimizing both parent and child inference threads to enhance task success rate without requiring predefined reasoning structures.
arXiv Detail & Related papers (2025-04-21T22:29:02Z)
- S*: Test Time Scaling for Code Generation [55.11863577956177]
We propose S*, the first hybrid test-time scaling framework for code generation. S* substantially improves the coverage and selection accuracy of generated code.
arXiv Detail & Related papers (2025-02-20T09:18:53Z)
- SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer [49.1761733723771]
This paper presents SANA-1.5, a linear Diffusion Transformer for efficient scaling in text-to-image generation. We introduce three key innovations: Efficient Training Scaling, Model Depth Pruning, and Inference-time Scaling. Through these strategies, SANA-1.5 achieves a text-image alignment score of 0.81 on GenEval, which can be further improved to 0.96 through inference scaling with VILA-Judge.
arXiv Detail & Related papers (2025-01-30T15:31:48Z)