HEFT: A Coarse-to-Fine Hierarchy for Enhancing the Efficiency and Accuracy of Language Model Reasoning
- URL: http://arxiv.org/abs/2509.09801v1
- Date: Thu, 11 Sep 2025 19:06:46 GMT
- Title: HEFT: A Coarse-to-Fine Hierarchy for Enhancing the Efficiency and Accuracy of Language Model Reasoning
- Authors: Brennen Hill
- Abstract summary: HEFT is a novel hierarchical adaptation strategy that composes two distinct PEFT methods in a coarse-to-fine manner. A model fine-tuned for only three epochs with our HEFT strategy achieves an accuracy of 85.17%, exceeding the performance of models trained for 20 epochs.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The adaptation of large language models (LLMs) to specialized reasoning tasks is fundamentally constrained by computational resources. Parameter-Efficient Fine-Tuning (PEFT) methods have emerged as a powerful solution, yet the landscape of these techniques is diverse, with distinct methods operating in either the model's weight space or its representation space. This paper investigates the hypothesis that a synergistic combination of these paradigms can unlock superior performance and efficiency. We introduce HEFT (Hierarchical Efficient Fine-Tuning), a novel hierarchical adaptation strategy that composes two distinct PEFT methods in a coarse-to-fine manner: first, a broad, foundational adaptation in the weight space using Low-Rank Adaptation (LoRA), followed by a precise, surgical refinement of internal activations using Representation Fine-Tuning (ReFT). We evaluate this approach by fine-tuning a Llama-2-7B model on the BoolQ benchmark, a challenging dataset for inferential reasoning. Our results reveal a profound synergistic effect. A model fine-tuned for only three epochs with our HEFT strategy achieves an accuracy of 85.17%, exceeding the performance of models trained for 20 epochs with either LoRA-only (85.05%) or ReFT-only (83.36%) methodologies. This work demonstrates that the thoughtful composition of PEFT methods is a potent algorithmic innovation, offering a more efficient and effective path toward advancing the reasoning capabilities of language models. By achieving superior results with a fraction of the computational budget, our findings present a principled approach to overcoming the obstacles inherent in adapting large-scale models for complex cognitive tasks.
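The abstract describes the pipeline only at this level of detail, so the following is a minimal sketch of how the coarse-to-fine composition might be wired up, assuming the Hugging Face peft library for the LoRA stage and the pyreft library for the ReFT stage. The rank, target modules, intervened layer, and training loop are illustrative placeholders, not values or code from the paper.

```python
# Hedged sketch of HEFT's two-stage, coarse-to-fine adaptation.
# Assumptions: transformers + peft for the LoRA stage, pyreft for the ReFT
# stage; all hyperparameters (rank, target modules, layer index) are
# illustrative placeholders, not reported by the paper.
import torch
import pyreft
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.bfloat16, device_map="cuda")

# Stage 1 (coarse): broad weight-space adaptation with LoRA.
lora_cfg = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                      target_modules=["q_proj", "v_proj"],
                      task_type="CAUSAL_LM")
lora_model = get_peft_model(base, lora_cfg)
# ... train lora_model on BoolQ for a few epochs (trainer omitted) ...

# Fold the learned low-rank deltas into the base weights so the second
# stage operates on a single adapted model rather than adapter wrappers.
merged = lora_model.merge_and_unload()

# Stage 2 (fine): surgical representation-space refinement with ReFT,
# intervening on the residual-stream output of one mid-depth layer.
reft_cfg = pyreft.ReftConfig(representations={
    "layer": 15, "component": "block_output", "low_rank_dimension": 4,
    "intervention": pyreft.LoreftIntervention(
        embed_dim=merged.config.hidden_size, low_rank_dimension=4)})
reft_model = pyreft.get_reft_model(merged, reft_cfg)
reft_model.print_trainable_parameters()
# ... train only the intervention parameters on BoolQ, then evaluate ...
```

Merging the LoRA weights before the ReFT stage is one plausible reading of "coarse-to-fine": the broad weight-space update is frozen into the backbone first, and the representation-level intervention then refines the activations of the already-adapted model.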
Related papers
- Performance and Complexity Trade-off Optimization of Speech Models During Training [5.335528687192602]
In speech machine learning, neural network models are typically designed by choosing an architecture with fixed layer sizes and structure. While the overall architecture is usually guided by prior knowledge of the task, the sizes of individual layers are often chosen. Unlike pruning methods, our approach allows the model size to be dynamically optimized for a target performance-complexity trade-off.
arXiv Detail & Related papers (2026-01-20T08:00:05Z) - TuckA: Hierarchical Compact Tensor Experts for Efficient Fine-Tuning [83.93651411533533]
We introduce Tucker Adaptation (TuckA), a method with four key properties. We develop an efficient batch-level routing mechanism, which reduces the router's parameter size by a factor of $L$. Experiments on benchmarks in natural language understanding, image classification, and mathematical reasoning speak to the efficacy of TuckA.
arXiv Detail & Related papers (2025-11-10T09:03:16Z) - TCPO: Thought-Centric Preference Optimization for Effective Embodied Decision-making [75.29820290660065]
This paper proposes Thought-Centric Preference Optimization (TCPO) for effective embodied decision-making. It emphasizes the alignment of the model's intermediate reasoning process, mitigating the problem of model degradation. Experiments in the ALFWorld environment demonstrate an average success rate of 26.67%, achieving a 6% improvement over RL4VLM.
arXiv Detail & Related papers (2025-09-10T11:16:21Z) - LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization [48.91511514636768]
Length-Adaptive Policy Optimization transforms reasoning length control from an external constraint into an intrinsic model capability. LAPO enables models to internalize an understanding of appropriate reasoning depth through a two-stage reinforcement learning process. Experiments on mathematical reasoning benchmarks demonstrate that LAPO reduces token usage by up to 40.9% while improving accuracy by 2.3%.
arXiv Detail & Related papers (2025-07-21T16:14:41Z) - Dual Decomposition of Weights and Singular Value Low Rank Adaptation [9.048461365342204]
We propose DuDe, a novel approach that decomposes weight matrices into magnitude and direction components. Our evaluation demonstrates DuDe's superior performance and robustness, achieving up to 48.35% accuracy on MMLU and 62.53% ($\pm$ 1.59) accuracy on GSM8K.
arXiv Detail & Related papers (2025-05-20T13:49:15Z) - Towards Fair Class-wise Robustness: Class Optimal Distribution Adversarial Training [1.5565181723989001]
Adversarial training has proven to be a highly effective method for improving the robustness of deep neural networks against adversarial attacks. It has been observed to exhibit a limitation in terms of robust fairness, characterized by a significant disparity in robustness across different classes. Recent efforts to mitigate this problem have turned to class-wise weighted methods. This paper proposes a novel min-max training framework, Class Optimal Distribution Adversarial Training.
arXiv Detail & Related papers (2025-01-08T14:19:03Z) - Feature Alignment-Based Knowledge Distillation for Efficient Compression of Large Language Models [4.737806982257592]
This study proposes a knowledge distillation algorithm based on large language models and feature alignment. The proposed model performs very close to the state-of-the-art GPT-4 model in terms of evaluation indicators such as perplexity, BLEU, ROUGE, and CER.
arXiv Detail & Related papers (2024-12-27T04:37:06Z) - Enhancing Training Data Attribution for Large Language Models with Fitting Error Consideration [74.09687562334682]
We introduce a novel training data attribution method called Debias and Denoise Attribution (DDA).
Our method significantly outperforms existing approaches, achieving an averaged AUC of 91.64%.
DDA exhibits strong generality and scalability across various sources and different-scale models like LLaMA2, QWEN2, and Mistral.
arXiv Detail & Related papers (2024-10-02T07:14:26Z) - See Further for Parameter Efficient Fine-tuning by Standing on the Shoulders of Decomposition [56.87609859444084]
Parameter-efficient fine-tuning (PEFT) focuses on optimizing a select subset of parameters while keeping the rest fixed, significantly lowering computational and storage overheads. We take the first step to unify all approaches by dissecting them from a decomposition perspective. We introduce two novel PEFT methods alongside a simple yet effective framework designed to enhance the performance of PEFT techniques across various applications.
arXiv Detail & Related papers (2024-07-07T15:44:42Z) - Efficiency optimization of large-scale language models based on deep learning in natural language processing tasks [6.596361762662328]
The internal structure and operating mechanisms of large-scale language models are analyzed theoretically.
We evaluate the contribution of adaptive optimization algorithms (such as AdamW), massively parallel computing techniques, and mixed precision training strategies.
arXiv Detail & Related papers (2024-05-20T00:10:00Z) - Boosting Inference Efficiency: Unleashing the Power of Parameter-Shared Pre-trained Language Models [109.06052781040916]
We introduce a technique to enhance the inference efficiency of parameter-shared language models.
We also propose a simple pre-training technique that leads to fully or partially shared models.
Results demonstrate the effectiveness of our methods on both autoregressive and autoencoding PLMs.
arXiv Detail & Related papers (2023-10-19T15:13:58Z)