Rethinking LLM Advancement: Compute-Dependent and Independent Paths to Progress
- URL: http://arxiv.org/abs/2505.04075v2
- Date: Thu, 05 Jun 2025 17:09:08 GMT
- Title: Rethinking LLM Advancement: Compute-Dependent and Independent Paths to Progress
- Authors: Jack Sanderson, Teddy Foley, Spencer Guo, Anqi Qu, Henry Josephson,
- Abstract summary: This study evaluates whether large language models can advance through algorithmic innovation in compute-constrained environments. We propose a novel framework distinguishing compute-dependent innovations--which yield disproportionate benefits at high compute--from compute-independent innovations, which improve efficiency across compute scales.
- Score: 10.461430685627857
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Regulatory efforts to govern large language model (LLM) development have predominantly focused on restricting access to high-performance computational resources. This study evaluates the efficacy of such measures by examining whether LLM capabilities can advance through algorithmic innovation in compute-constrained environments. We propose a novel framework distinguishing compute-dependent innovations--which yield disproportionate benefits at high compute--from compute-independent innovations, which improve efficiency across compute scales. The impact of each innovation is quantified using Compute-Equivalent Gain (CEG). Experimental validation with nanoGPT models confirms that compute-independent advancements yield significant performance gains (e.g., with combined CEG up to $3.5\times$) across the tested scales. In contrast, compute-dependent advancements were detrimental to performance at smaller experimental scales, but showed improved CEG (on par with the baseline) as model size increased, a trend consistent with their defining property of yielding their primary benefits at higher compute. Crucially, these findings indicate that restrictions on computational hardware, while potentially slowing LLM progress, are insufficient to prevent all capability gains driven by algorithmic advancements. We argue that effective AI oversight must therefore incorporate mechanisms for understanding, anticipating, and potentially guiding algorithmic research, moving beyond a singular focus on hardware. The proposed framework also serves as an analytical tool for forecasting AI progress.
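To make the CEG calculation concrete, the following is a minimal Python sketch (hypothetical; the paper's exact estimator and all numbers here are illustrative assumptions). It assumes CEG is the ratio between the baseline compute needed to match an enhanced model's loss, extrapolated from a power-law fit to baseline runs, and the compute the enhanced run actually used.

# Minimal, illustrative CEG sketch (hypothetical; the paper's exact estimator may differ).
# Assumption: baseline validation loss follows a power law in training compute C,
#   loss(C) = a * C**(-b),
# fit from a few baseline (compute, loss) measurements.
import numpy as np

def fit_power_law(compute, loss):
    # Least-squares fit of log(loss) = log(a) - b * log(C); returns (a, b).
    slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)
    return np.exp(intercept), -slope

def compute_equivalent_gain(baseline_compute, baseline_loss, innovation_loss, innovation_compute):
    # CEG = (baseline compute needed to reach the innovation's loss) / (compute the innovation used).
    a, b = fit_power_law(baseline_compute, baseline_loss)
    matched_compute = (a / innovation_loss) ** (1.0 / b)  # invert loss = a * C**(-b)
    return matched_compute / innovation_compute

# Toy usage with made-up numbers: three baseline runs, plus one run with an
# algorithmic innovation that reaches loss 3.1 using 1e17 FLOPs.
ceg = compute_equivalent_gain(
    baseline_compute=np.array([1e16, 1e17, 1e18]),
    baseline_loss=np.array([4.0, 3.3, 2.7]),
    innovation_loss=3.1,
    innovation_compute=1e17,
)
print(f"CEG ~ {ceg:.2f}x")  # > 1 means the innovation saves baseline compute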
Related papers
- Agentic Reinforced Policy Optimization [66.96989268893932]
Large-scale reinforcement learning with verifiable rewards (RLVR) has demonstrated its effectiveness in harnessing the potential of large language models (LLMs) for single-turn reasoning tasks. Current RL algorithms inadequately balance the models' intrinsic long-horizon reasoning capabilities and their proficiency in multi-turn tool interactions. We propose Agentic Reinforced Policy Optimization (ARPO), a novel agentic RL algorithm tailored for training multi-turn LLM-based agents.
arXiv Detail & Related papers (2025-07-26T07:53:11Z)
- LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization [48.91511514636768]
We present Length-Adaptive Policy Optimization (LAPO), a framework that transforms reasoning length control from an external constraint into an intrinsic model capability. LAPO enables models to internalize an understanding of appropriate reasoning depth through a two-stage reinforcement learning process. Experiments on mathematical reasoning benchmarks demonstrate that LAPO reduces token usage by up to 40.9% while improving accuracy by 2.3%.
arXiv Detail & Related papers (2025-07-21T16:14:41Z)
- Reasoning on a Budget: A Survey of Adaptive and Controllable Test-Time Compute in LLMs [45.83245433138508]
Large language models (LLMs) have rapidly progressed into general-purpose agents capable of solving a broad spectrum of tasks. However, they apply fixed inference-time compute regardless of task complexity, often overthinking simple problems while underthinking hard ones. This survey presents a comprehensive review of efficient test-time compute strategies, which aim to improve the computational efficiency of LLM reasoning.
arXiv Detail & Related papers (2025-07-02T18:27:42Z)
- DynScaling: Efficient Verifier-free Inference Scaling via Dynamic and Integrated Sampling [20.605487145370752]
Inference-time scaling has proven effective in boosting large language model (LLM) performance through increased test-time computation. Yet, its practical application is often hindered by reliance on external verifiers or a lack of optimization for realistic computational constraints. We propose DynScaling, which addresses these limitations through two primary innovations: an integrated parallel-sequential sampling strategy and a bandit-based dynamic budget allocation framework.
arXiv Detail & Related papers (2025-06-19T05:40:54Z)
- R-Sparse: Rank-Aware Activation Sparsity for Efficient LLM Inference [77.47238561728459]
R-Sparse is a training-free activation sparsity approach capable of achieving high sparsity levels in advanced LLMs. Experiments on Llama-2/3 and Mistral models across ten diverse tasks demonstrate that R-Sparse achieves comparable performance at 50% model-level sparsity.
arXiv Detail & Related papers (2025-04-28T03:30:32Z)
- DSMoE: Matrix-Partitioned Experts with Dynamic Routing for Computation-Efficient Dense LLMs [70.91804882618243]
This paper proposes DSMoE, a novel approach that achieves sparsification by partitioning pre-trained FFN layers into computational blocks. We implement adaptive expert routing using sigmoid activation and straight-through estimators, enabling tokens to flexibly access different aspects of model knowledge. Experiments on LLaMA models demonstrate that under equivalent computational constraints, DSMoE achieves superior performance compared to existing pruning and MoE approaches.
arXiv Detail & Related papers (2025-02-18T02:37:26Z)
- Explore Activation Sparsity in Recurrent LLMs for Energy-Efficient Neuromorphic Computing [3.379854610429579]
Recurrent Large Language Models (R-LLM) have proven effective in mitigating the complexity of self-attention. We propose a low-cost, training-free algorithm to sparsify R-LLMs' activations to enhance energy efficiency on neuromorphic hardware.
arXiv Detail & Related papers (2025-01-09T19:13:03Z)
- Latenrgy: Model Agnostic Latency and Energy Consumption Prediction for Binary Classifiers [0.0]
Machine learning systems increasingly drive innovation across scientific fields and industry. Yet challenges in compute overhead, specifically during inference, limit their scalability and sustainability. This study addresses critical gaps in the literature, chiefly the lack of generalized predictive techniques for latency and energy consumption.
arXiv Detail & Related papers (2024-12-26T14:51:24Z)
- Adaptive Pruning for Large Language Models with Structural Importance Awareness [66.2690963378878]
Large language models (LLMs) have significantly improved language understanding and generation capabilities. However, LLMs are difficult to deploy on resource-constrained edge devices due to their high computational and storage resource demands. We propose structurally-aware adaptive pruning (SAAP) to significantly reduce the computational and memory costs while maintaining model performance.
arXiv Detail & Related papers (2024-12-19T18:08:04Z)
- eFedLLM: Efficient LLM Inference Based on Federated Learning [1.6179784294541053]
Large Language Models (LLMs) herald a transformative era in artificial intelligence (AI).
This paper introduces an effective approach that enhances the operational efficiency and affordability of LLM inference.
arXiv Detail & Related papers (2024-11-24T22:50:02Z)
- Reinforcement Learning under Latent Dynamics: Toward Statistical and Algorithmic Modularity [51.40558987254471]
Real-world applications of reinforcement learning often involve environments where agents operate on complex, high-dimensional observations.
This paper addresses the question of reinforcement learning under $\textit{general}$ latent dynamics from a statistical and algorithmic perspective.
arXiv Detail & Related papers (2024-10-23T14:22:49Z)
- Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization [0.6445087473595953]
Large language models (LLMs) demonstrate outstanding performance in various machine learning tasks.
However, deploying LLM inference poses challenges due to the high compute and memory requirements.
We present Tender, an algorithm-hardware co-design solution that enables efficient deployment of LLM inference at low precision.
arXiv Detail & Related papers (2024-06-16T09:51:55Z)
- Quantum Data Encoding: A Comparative Analysis of Classical-to-Quantum Mapping Techniques and Their Impact on Machine Learning Accuracy [0.0]
This research explores the integration of quantum data embedding techniques into classical machine learning (ML) algorithms.
Our findings reveal that quantum data embedding contributes to improved classification accuracy and F1 scores.
arXiv Detail & Related papers (2023-11-17T08:00:08Z)
- AxOMaP: Designing FPGA-based Approximate Arithmetic Operators using Mathematical Programming [2.898055875927704]
We propose a data analysis-driven mathematical programming-based approach to synthesizing approximate operators for FPGAs.
Specifically, we formulate mixed integer quadratically constrained programs based on the results of correlation analysis of the characterization data.
Compared to traditional evolutionary algorithms-based optimization, we report up to 21% improvement in the hypervolume, for joint optimization of PPA and BEHAV.
arXiv Detail & Related papers (2023-09-23T18:23:54Z)
- Efficient Model-Free Exploration in Low-Rank MDPs [76.87340323826945]
Low-Rank Markov Decision Processes offer a simple, yet expressive framework for RL with function approximation.
Existing algorithms are either (1) computationally intractable, or (2) reliant upon restrictive statistical assumptions.
We propose the first provably sample-efficient algorithm for exploration in Low-Rank MDPs.
arXiv Detail & Related papers (2023-07-08T15:41:48Z)
- Towards Compute-Optimal Transfer Learning [82.88829463290041]
We argue that zero-shot structured pruning of pretrained models allows them to increase compute efficiency with minimal reduction in performance.
Our results show that pruning convolutional filters of pretrained models can lead to more than 20% performance improvement in low computational regimes.
arXiv Detail & Related papers (2023-04-25T21:49:09Z)
- Representation Learning with Multi-Step Inverse Kinematics: An Efficient and Optimal Approach to Rich-Observation RL [106.82295532402335]
Existing reinforcement learning algorithms suffer from computational intractability, strong statistical assumptions, and suboptimal sample complexity.
We provide the first computationally efficient algorithm that attains rate-optimal sample complexity with respect to the desired accuracy level.
Our algorithm, MusIK, combines systematic exploration with representation learning based on multi-step inverse kinematics.
arXiv Detail & Related papers (2023-04-12T14:51:47Z)
- Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network (NN)-based active learning algorithms for the non-parametric streaming setting.
We introduce two regret metrics, based on minimizing the population loss, that are more suitable for active learning than the one used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.