PiERN: Token-Level Routing for Integrating High-Precision Computation and Reasoning
- URL: http://arxiv.org/abs/2509.18169v2
- Date: Sat, 27 Sep 2025 06:44:30 GMT
- Title: PiERN: Token-Level Routing for Integrating High-Precision Computation and Reasoning
- Authors: Hengbo Xiao, Jingyuan Fan, Xin Tong, Jingzhao Zhang, Chao Lu, Guannan He,
- Abstract summary: We propose the Physically-isolated Experts Routing Network (PiERN) for integrating computation and reasoning. PiERN endogenously integrates computational capabilities into neural networks after separately training experts, a text-to-computation module, and a router. Results show that the PiERN architecture achieves higher accuracy than directly finetuning large language models.
- Score: 20.622941954258973
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tasks on complex systems require high-precision numerical computation to support decisions, but current large language models (LLMs) cannot integrate such computations as an intrinsic and interpretable capability within existing architectures. Multi-agent approaches can leverage external experts, but they inevitably introduce communication overhead and suffer from inefficiency caused by limited scalability. To this end, we propose the Physically-isolated Experts Routing Network (PiERN), an architecture for integrating computation and reasoning. Instead of relying on tool-use workflows or function calling, PiERN endogenously integrates computational capabilities into neural networks after separately training experts, a text-to-computation module, and a router. At inference, the router directs computation and reasoning at the token level, enabling iterative alternation within a single chain of thought. We evaluate PiERN on representative linear and nonlinear computation-reasoning tasks against LLM finetuning and multi-agent system approaches. Results show that the PiERN architecture not only achieves higher accuracy than directly finetuning LLMs but also delivers significant improvements in response latency, token usage, and GPU energy consumption compared with mainstream multi-agent approaches. PiERN offers an efficient, interpretable, and scalable paradigm for interfacing language models with scientific systems.
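A minimal sketch of the token-level routing idea the abstract describes: a router scores each token's hidden state and dispatches it to exactly one expert, so reasoning and computation can alternate within a single sequence. All names, shapes, and expert bodies here are illustrative placeholders, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # hidden size (illustrative)

# Router weights: one score column per expert [reasoning, computation].
W_router = rng.normal(size=(D, 2))

def reasoning_expert(h):
    return np.tanh(h)      # stand-in for a language-reasoning block

def computation_expert(h):
    return 2.0 * h         # stand-in for a high-precision numeric block

def route_token(h):
    """Route one token's hidden state to exactly one expert."""
    logits = h @ W_router
    k = int(np.argmax(logits))  # hard, per-token decision
    expert = (reasoning_expert, computation_expert)[k]
    return k, expert(h)

tokens = rng.normal(size=(5, D))
choices = [route_token(h)[0] for h in tokens]  # expert index per token
```

Because the decision is made per token rather than per request, a single chain of thought can interleave both experts without any inter-agent message passing.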
Related papers
- Fourier Neural Operators Explained: A Practical Perspective [75.12291469255794]
The Fourier Neural Operator (FNO) has become the most influential and widely adopted neural operator, owing to its elegant spectral formulation. This guide aims to establish a clear and reliable framework for applying FNOs effectively across diverse scientific and engineering fields.
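The spectral formulation mentioned above can be sketched in one dimension: transform the input to frequency space, scale a truncated set of Fourier modes by learned weights, and transform back. Grid size, mode count, and the random weights below are illustrative assumptions, not the FNO paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 64, 8                                       # grid size, kept modes
W = rng.normal(size=K) + 1j * rng.normal(size=K)   # learned per-mode weights

def spectral_conv(u):
    """Multiply the lowest K Fourier modes of u by learned weights."""
    u_hat = np.fft.rfft(u)           # to frequency space
    out_hat = np.zeros_like(u_hat)
    out_hat[:K] = W * u_hat[:K]      # truncate and scale modes
    return np.fft.irfft(out_hat, n=N)  # back to physical space

x = np.sin(2 * np.pi * np.arange(N) / N)
y = spectral_conv(x)                 # real-valued output on the same grid
```

A full FNO block would add a pointwise linear path and a nonlinearity around this spectral convolution; the truncation to K modes is what keeps the layer resolution-independent and cheap.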
arXiv Detail & Related papers (2025-12-01T08:56:21Z) - GridMind: LLMs-Powered Agents for Power System Analysis and Operations [3.7568206336846663]
This paper presents a multi-agent AI system that integrates Large Language Models (LLMs) with deterministic engineering solvers to enable conversational scientific computing for power system analysis. GridMind addresses workflow integration, knowledge accessibility, context preservation, and expert decision-support augmentation. This work establishes agentic AI as a viable paradigm for scientific computing, demonstrating how conversational interfaces can enhance accessibility while preserving the numerical rigor essential for critical engineering applications.
arXiv Detail & Related papers (2025-09-02T16:42:18Z) - LLMThinkBench: Towards Basic Math Reasoning and Overthinking in Large Language Models [4.370731670976415]
Large Language Models (LLMs) have achieved remarkable performance on complex mathematical benchmarks. LLMs often struggle with simple arithmetic tasks, however, and exhibit a tendency toward over-explaining or "overthinking" answers. The framework provides 14 math tasks with randomized test data generation and robust parsing strategies. Users can extend the tool with custom tasks, reproduce experiments with seeding, and generate detailed efficiency reports.
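Seeded, randomized task generation of the kind the summary describes can be sketched in a few lines; the task name, prompt format, and number ranges below are invented for illustration, not LLMThinkBench's actual API.

```python
import random

def make_addition_task(seed):
    """One seeded arithmetic item: the same seed yields the same task."""
    rng = random.Random(seed)
    a, b = rng.randint(0, 999), rng.randint(0, 999)
    return {"prompt": f"What is {a} + {b}?", "answer": a + b}

t1, t2 = make_addition_task(7), make_addition_task(7)
assert t1 == t2  # seeding makes experiments reproducible
```

Drawing all randomness from a per-task `random.Random(seed)` instance, rather than the global RNG, is what lets separate runs regenerate identical test data.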
arXiv Detail & Related papers (2025-07-05T12:31:17Z) - Neural Interpretable PDEs: Harmonizing Fourier Insights with Attention for Scalable and Interpretable Physics Discovery [15.29112632863168]
We introduce Neural Interpretable PDEs (NIPS), a novel neural operator architecture that builds upon and enhances Nonlocal Attention Operators (NAO). NIPS employs a linear attention mechanism to enable scalable learning and integrates a learnable kernel network that acts as a channel-independent convolution in Fourier space. Empirical evaluations demonstrate that NIPS consistently surpasses NAO and other baselines across diverse benchmarks.
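The scalability of linear attention, as invoked above, comes from reassociating the attention product: aggregating keys and values once gives O(N) cost instead of the O(N^2) of a full attention matrix. The feature map and shapes below are a generic linear-attention sketch, not NIPS's specific mechanism.

```python
import numpy as np

rng = np.random.default_rng(3)
N, D = 16, 4                          # sequence length, width (illustrative)
Q, K, V = (rng.normal(size=(N, D)) for _ in range(3))

def phi(x):
    # ELU(x) + 1: a common positive feature map for linear attention
    return np.where(x > 0, x + 1.0, np.exp(x))

KV = phi(K).T @ V                     # (D, D): aggregate keys/values once
Z = phi(Q) @ phi(K).sum(axis=0)       # (N,): per-query normalizers
out = (phi(Q) @ KV) / Z[:, None]      # (N, D) output in O(N), not O(N^2)
```

The result matches the quadratic form `(phi(Q) @ phi(K).T @ V)` with row normalization; only the evaluation order changes, which is what makes the method scale to large discretizations.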
arXiv Detail & Related papers (2025-05-29T05:18:30Z) - NNTile: a machine learning framework capable of training extremely large GPT language models on a single node [83.9328245724548]
NNTile is based on the StarPU library, which implements task-based parallelism and schedules all provided tasks onto all available processing units. This means that a particular operation, necessary to train a large neural network, can be performed on any of the CPU cores or GPU devices.
arXiv Detail & Related papers (2025-04-17T16:22:32Z) - High-fidelity Multiphysics Modelling for Rapid Predictions Using Physics-informed Parallel Neural Operator [17.85837423448985]
Modelling complex multiphysics systems governed by nonlinear and strongly coupled partial differential equations (PDEs) is a cornerstone in computational science and engineering. We propose a novel paradigm, the physics-informed parallel neural operator (PIPNO), a scalable and unsupervised learning framework. PIPNO efficiently captures nonlinear operator mappings across diverse physics, including geotechnical engineering, material science, electromagnetism, quantum mechanics, and fluid dynamics.
arXiv Detail & Related papers (2025-02-26T20:29:41Z) - DSMoE: Matrix-Partitioned Experts with Dynamic Routing for Computation-Efficient Dense LLMs [86.76714527437383]
This paper proposes DSMoE, a novel approach that achieves sparsification by partitioning pre-trained FFN layers into computational blocks. We implement adaptive expert routing using sigmoid activation and straight-through estimators, enabling tokens to flexibly access different aspects of model knowledge. Experiments on LLaMA models demonstrate that, under equivalent computational constraints, DSMoE achieves superior performance compared to existing pruning and MoE approaches.
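The sigmoid-plus-straight-through-estimator routing mentioned above can be sketched as a hard gate whose forward pass thresholds a sigmoid; this is a generic toy version of the technique, not DSMoE's actual code, and the logits are made up.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ste_gate(logits):
    """Forward pass of a hard sigmoid gate.

    In an autograd framework the straight-through estimator would return
    hard + (soft - soft.detach()) so gradients follow the smooth sigmoid;
    with plain numpy we only show the hard forward decision.
    """
    soft = sigmoid(logits)
    return (soft > 0.5).astype(soft.dtype)

logits = np.array([2.0, -1.0, 0.3, -0.2])
gates = ste_gate(logits)
print(gates)  # → [1. 0. 1. 0.]
```

Each gate decides independently whether a token uses that expert block, so the number of active blocks adapts per token rather than being fixed top-k.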
arXiv Detail & Related papers (2025-02-18T02:37:26Z) - EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference [49.94169109038806]
This paper introduces EPS-MoE, a novel expert pipeline scheduler for MoE that surpasses the existing parallelism schemes.<n>Our results demonstrate at most 52.4% improvement in prefill throughput compared to existing parallel inference methods.
arXiv Detail & Related papers (2024-10-16T05:17:49Z) - Inducing Point Operator Transformer: A Flexible and Scalable Architecture for Solving PDEs [7.152311859951986]
We introduce an attention-based model called the inducing-point operator transformer (IPOT).
IPOT is designed to handle any input function and output query while capturing global interactions in a computationally efficient way.
By detaching the inputs/outputs discretizations from the processor with a smaller latent bottleneck, IPOT offers flexibility in processing arbitrary discretizations.
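The latent-bottleneck idea above can be sketched as cross-attention from a fixed set of inducing points to inputs of arbitrary length: whatever the input discretization, the encoder always produces the same latent shape. Shapes, the softmax form, and all names are illustrative assumptions, not IPOT's implementation.

```python
import numpy as np

rng = np.random.default_rng(5)
M, D = 8, 4                         # inducing points, width (illustrative)
latents = rng.normal(size=(M, D))   # fixed-size latent array

def encode(inputs):
    """Cross-attend the latent array over inputs of any length N."""
    scores = latents @ inputs.T / np.sqrt(D)            # (M, N)
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)             # row-wise softmax
    return attn @ inputs                                # always (M, D)

z_small = encode(rng.normal(size=(10, D)))    # coarse discretization
z_large = encode(rng.normal(size=(1000, D)))  # fine discretization
```

Because the processor only ever sees the `(M, D)` latent array, its cost is independent of how finely the input function is sampled.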
arXiv Detail & Related papers (2023-12-18T06:57:31Z) - A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs).
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
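The shared-backbone/multi-head layout can be sketched as one feature extractor feeding several heads whose outputs are ensembled; weights, sizes, and the averaging rule below are illustrative, not MEMTL's trained model.

```python
import numpy as np

rng = np.random.default_rng(9)
D, H, HEADS = 6, 4, 3                                  # widths, head count
W_backbone = rng.normal(size=(D, H))                   # shared feature map
W_heads = [rng.normal(size=(H, 2)) for _ in range(HEADS)]

def forward(x):
    h = np.maximum(x @ W_backbone, 0.0)  # shared backbone (ReLU features)
    preds = [h @ Wh for Wh in W_heads]   # one prediction per head
    return np.mean(preds, axis=0)        # ensemble by averaging heads

y = forward(rng.normal(size=(5, D)))     # (batch, outputs)
```

Sharing the backbone amortizes feature extraction across tasks, while the cheap extra heads provide the ensemble diversity without extra training data.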
arXiv Detail & Related papers (2023-09-02T11:01:16Z) - An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor-Critic for discrete (SAC-d), which generates the exit point and compressing bits via soft policy iterations.
With its latency- and accuracy-aware reward design, such a method can adapt well to complex environments such as dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URLLC.
arXiv Detail & Related papers (2022-01-09T09:31:50Z) - SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
Existing approaches, however, do not supply the procedures and pipelines needed for the actual deployment of machine learning capabilities in real production-grade systems. In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor frameworks and script language engines.
arXiv Detail & Related papers (2021-12-22T14:45:37Z) - Neuromorphic scaling advantages for energy-efficient random walk computation [0.28144129864580447]
Neuromorphic computing aims to replicate the brain's computational structure and architecture in man-made hardware.
We show that the high-degree parallelism and configurability of spiking neuromorphic architectures make them well-suited to implementing random walks via discrete-time Markov chains.
We find that NMC platforms, at a sufficient scale, can drastically reduce the energy demands of high-performance computing platforms.
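The random-walk primitive discussed above is just repeated sampling from a discrete-time Markov chain; the sketch below simulates it on conventional hardware (the neuromorphic mapping is the paper's contribution and is not shown), with an invented 3-state transition matrix.

```python
import numpy as np

rng = np.random.default_rng(42)

# Row-stochastic transition matrix for a 3-state chain (illustrative).
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])

def walk(start, steps):
    """Run one walker for a fixed number of Markov transitions."""
    state = start
    for _ in range(steps):
        state = rng.choice(3, p=P[state])  # sample next state from row
    return int(state)

# Many independent walkers approximate the chain's long-run distribution;
# this embarrassing parallelism is what neuromorphic hardware exploits.
finals = [walk(0, 50) for _ in range(200)]
```

Each walker is independent and needs only local state, which is why the workload maps naturally onto many simple spiking units.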
arXiv Detail & Related papers (2021-07-27T19:44:33Z) - Deep Learning for Ultra-Reliable and Low-Latency Communications in 6G Networks [84.2155885234293]
We first summarize how to apply data-driven supervised deep learning and deep reinforcement learning in URLLC.
To address these open problems, we develop a multi-level architecture that enables device intelligence, edge intelligence, and cloud intelligence for URLLC.
arXiv Detail & Related papers (2020-02-22T14:38:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.