AI Progress Should Be Measured by Capability-Per-Resource, Not Scale Alone: A Framework for Gradient-Guided Resource Allocation in LLMs
- URL: http://arxiv.org/abs/2511.01077v1
- Date: Sun, 02 Nov 2025 20:59:51 GMT
- Title: AI Progress Should Be Measured by Capability-Per-Resource, Not Scale Alone: A Framework for Gradient-Guided Resource Allocation in LLMs
- Authors: David McCoy, Yulun Wu, Zachary Butzin-Dozier
- Abstract summary: We argue that AI development should be fundamentally reoriented toward capability-per-resource rather than capability alone. We present a theoretical framework demonstrating that resource-allocation decisions guided by gradient influence patterns can dramatically improve efficiency throughout the AI lifecycle.
- Score: 7.850805629833066
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This position paper challenges the "scaling fundamentalism" dominating AI research, where unbounded growth in model size and computation has led to unsustainable environmental impacts and widening resource inequality. We argue that LLM development should be fundamentally reoriented toward capability-per-resource rather than capability alone. We present a theoretical framework demonstrating that resource-allocation decisions guided by gradient influence patterns can dramatically improve efficiency throughout the AI lifecycle. Our analysis shows that in transformer-based models, where a small fraction of parameters exert outsized influence (following heavy-tailed distributions), three critical insights emerge: (1) updating only high-influence parameters strictly outperforms full-parameter tuning on a performance-per-resource basis; (2) simple gradient norms provide computationally efficient proxies for identifying these high-influence components; and (3) coordinated parameter and data selection yields multiplicative efficiency gains, potentially reducing resource requirements by orders of magnitude. Building on these theoretical foundations, we propose a two-stage paradigm: marginal-return pretraining for foundation developers and influence-guided adaptation for downstream users, bridged by gradient blueprints, i.e., metadata describing which parameters matter most for various tasks. This capability-per-resource perspective transforms what were once considered pragmatic hardware workarounds into theoretically optimal strategies, democratizing access to cutting-edge AI capabilities while significantly reducing environmental impact. By embedding resource consciousness into how we develop, adapt, and evaluate models, we can reshape AI progress toward a more sustainable and equitable future.
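The abstract's second insight — gradient norms as a cheap proxy for parameter influence — can be illustrated with a minimal sketch. The function below is a hypothetical toy, not the paper's implementation: it ranks parameters by gradient magnitude, keeps only the top fraction, and applies a masked SGD step, so that under a heavy-tailed gradient distribution most parameters are left untouched.

```python
import numpy as np

def masked_sgd_step(params, grads, lr=0.1, top_frac=0.1):
    """Update only the top `top_frac` of parameters by gradient magnitude.

    A toy illustration of influence-guided adaptation: the gradient norm of
    each parameter serves as a cheap influence proxy, and only the
    high-influence subset receives an update.
    """
    flat = np.abs(grads).ravel()
    k = max(1, int(top_frac * flat.size))
    # Threshold at the k-th largest |gradient|
    thresh = np.partition(flat, -k)[-k]
    mask = np.abs(grads) >= thresh
    return params - lr * grads * mask, mask

rng = np.random.default_rng(0)
params = rng.normal(size=100)
# Heavy-tailed gradients: a few entries dominate, as the paper assumes
grads = rng.standard_t(df=1.5, size=100)
new_params, mask = masked_sgd_step(params, grads, top_frac=0.1)
print(mask.sum())  # number of parameters actually updated
```

With `top_frac=0.1`, only 10 of the 100 parameters change; the point of the paper's argument is that when influence is heavy-tailed, those few updates capture most of the attainable performance gain at a fraction of the compute.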
Related papers
- EmboCoach-Bench: Benchmarking AI Agents on Developing Embodied Robots [68.29056647487519]
Embodied AI is fueled by high-fidelity simulation and large-scale data collection. However, this scaling capability remains bottlenecked by a reliance on labor-intensive manual oversight. We introduce EmboCoach-Bench, a benchmark evaluating the capacity of LLM agents to autonomously engineer embodied policies.
arXiv Detail & Related papers (2026-01-29T11:33:49Z) - ORPR: An OR-Guided Pretrain-then-Reinforce Learning Model for Inventory Management [9.138155308817215]
"Pretrain-then-Reinforce" approach reconciles AI's adaptive perception with Operations Research's structural rigor.<n>We show that a lightweight, domain-informed model can deliver state-of-the-art performance and robust transferability when guided by structured OR logic.
arXiv Detail & Related papers (2025-12-22T03:39:43Z) - HunyuanOCR Technical Report [28.160663178408864]
HunyuanOCR is a commercial-grade, open-source, and lightweight (1B parameters) Vision-Language Model (VLM) dedicated to OCR tasks. It surpasses current public solutions in perception tasks (Text Spotting, Parsing) and excels in semantic tasks (IE, Text Image Translation). It achieves state-of-the-art (SOTA) results on OCRBench among VLMs with fewer than 3B parameters.
arXiv Detail & Related papers (2025-11-24T17:59:59Z) - Bridging VLMs and Embodied Intelligence with Deliberate Practice Policy Optimization [72.20212909644017]
Deliberate Practice Policy Optimization (DPPO) is a metacognitive "Metaloop" training framework. DPPO alternates between supervised fine-tuning (competence expansion) and reinforcement learning (skill refinement). Empirically, training a vision-language embodied model with DPPO, referred to as Pelican-VL 1.0, yields a 20.3% performance improvement over the base model. We are open-sourcing both the models and code, providing the first systematic framework that alleviates the data and resource bottleneck.
arXiv Detail & Related papers (2025-11-20T17:58:04Z) - Shifting AI Efficiency From Model-Centric to Data-Centric Compression [67.45087283924732]
We argue that the focus of research for AI is shifting from model-centric compression to data-centric compression. Data-centric compression improves AI efficiency by directly compressing the volume of data processed during model training or inference. Our work aims to provide a novel perspective on AI efficiency, synthesize existing efforts, and catalyze innovation to address the challenges posed by ever-increasing context lengths.
arXiv Detail & Related papers (2025-05-25T13:51:17Z) - Training Large Language Models to Reason via EM Policy Gradient [0.27195102129094995]
We introduce an off-policy reinforcement learning algorithm, EM Policy Gradient, to enhance LLM reasoning. We evaluate the effectiveness of EM Policy Gradient on the GSM8K and MATH (HARD) datasets. Models fine-tuned with our method exhibit cognitive behaviors, such as sub-problem decomposition, self-verification, and backtracking.
arXiv Detail & Related papers (2025-04-24T01:31:05Z) - RDTF: Resource-efficient Dual-mask Training Framework for Multi-frame Animated Sticker Generation [29.340362062804967]
Under constrained resources, training a smaller video generation model from scratch can outperform parameter-efficient tuning on larger models in downstream applications. We propose a difficulty-adaptive curriculum learning method, which decomposes the sample entropy into static and adaptive components.
arXiv Detail & Related papers (2025-03-22T11:28:25Z) - Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models [79.2162092822111]
We systematically evaluate reinforcement learning (RL) and control-based methods on a suite of navigation tasks. We train a latent dynamics model using the Joint Embedding Predictive Architecture (JEPA) and employ it for planning. Our results show that model-free RL benefits most from large amounts of high-quality data, whereas model-based planning generalizes better to unseen layouts.
arXiv Detail & Related papers (2025-02-20T18:39:41Z) - Signformer is all you need: Towards Edge AI for Sign Language [0.0]
We present an analysis of the nature of sign languages to inform our algorithmic design and deliver a scalable transformer pipeline with novel convolution and attention components.
We achieve a new 2nd place on the leaderboard with a parametric reduction of 467-1807x against the finest models as of 2024 and outcompete almost every other method in a lighter configuration of 0.57 million parameters.
arXiv Detail & Related papers (2024-11-19T22:27:53Z) - Frugal inference for control [2.20480252274709]
A key challenge in advancing artificial intelligence is achieving the right balance between external movement and resource use. We develop a version of the POMDP framework where the information gained through inference is treated as a resource that must be optimized alongside task performance and motion effort. This work provides a foundation for a new type of rational computation that both brains and machines could use for effective but resource-efficient control under uncertainty.
arXiv Detail & Related papers (2024-06-20T15:50:38Z) - Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks.
We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level.
We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks.
arXiv Detail & Related papers (2024-02-09T07:45:26Z) - Boosting Inference Efficiency: Unleashing the Power of Parameter-Shared Pre-trained Language Models [109.06052781040916]
We introduce a technique to enhance the inference efficiency of parameter-shared language models.
We also propose a simple pre-training technique that leads to fully or partially shared models.
Results demonstrate the effectiveness of our methods on both autoregressive and autoencoding PLMs.
arXiv Detail & Related papers (2023-10-19T15:13:58Z) - SALMON: Self-Alignment with Instructable Reward Models [80.83323636730341]
This paper presents a novel approach, namely SALMON, to align base language models with minimal human supervision.
We develop an AI assistant named Dromedary-2 with only 6 exemplars for in-context learning and 31 human-defined principles.
arXiv Detail & Related papers (2023-10-09T17:56:53Z) - Hindsight Learning for MDPs with Exogenous Inputs [20.556789174972334]
We design a class of data-efficient algorithms for resource management problems called Hindsight Learning (HL).
HL algorithms achieve data efficiency by leveraging a key insight: given samples of the exogenous variables, past decisions can be revisited in hindsight to infer counterfactual consequences that can accelerate policy improvements.
We scale our algorithms to a business-critical cloud resource management problem -- allocating Virtual Machines (VMs) to physical machines, and simulate their performance with real datasets from a large public cloud provider.
arXiv Detail & Related papers (2022-07-13T15:18:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.