Structured Pruning for Diverse Best-of-N Reasoning Optimization
- URL: http://arxiv.org/abs/2506.03978v2
- Date: Mon, 09 Jun 2025 04:15:05 GMT
- Title: Structured Pruning for Diverse Best-of-N Reasoning Optimization
- Authors: Hieu Trung Nguyen, Bao Nguyen, Viet Anh Nguyen
- Abstract summary: We propose SPRINT, a contrastive learning framework that dynamically selects the optimal head and layer to prune during inference. By aligning question embeddings with head embeddings, SPRINT identifies those pruned-head configurations that result in more accurate reasoning. Our method significantly outperforms traditional best-of-$N$ and random head selection strategies on the MATH500 and GSM8K datasets.
- Score: 13.25202837526456
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Model pruning in transformer-based language models, traditionally viewed as a means of achieving computational savings, can enhance the model's reasoning capabilities. In this work, we uncover a surprising phenomenon: the selective pruning of certain attention heads leads to improvements in reasoning performance, particularly on challenging tasks. Motivated by this observation, we propose SPRINT, a novel contrastive learning framework that dynamically selects the optimal head and layer to prune during inference. By aligning question embeddings with head embeddings, SPRINT identifies those pruned-head configurations that result in more accurate reasoning. Extensive experiments demonstrate that our method significantly outperforms traditional best-of-$N$ and random head selection strategies on the MATH500 and GSM8K datasets.
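The selection step described in the abstract (matching a question embedding against head embeddings to choose which heads to prune) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the similarity measure, the embedding models, or how candidates feed into best-of-$N$, so the cosine similarity, the toy two-dimensional embeddings, and the function name `select_heads_to_prune` below are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_heads_to_prune(question_emb, head_embs, n):
    """Return the n (layer, head) keys whose embeddings are most similar to the question.

    Hypothetical helper: the similarity measure and ranking rule are assumptions,
    not details taken from the paper.
    """
    ranked = sorted(
        head_embs,
        key=lambda key: cosine(question_emb, head_embs[key]),
        reverse=True,
    )
    return ranked[:n]


# Toy, made-up embeddings keyed by (layer index, head index).
head_embs = {
    (0, 1): [1.0, 0.0],
    (3, 2): [0.0, 1.0],
    (5, 0): [0.7, 0.7],
}
question = [0.9, 0.1]

# Each selected (layer, head) pair would define one pruned model variant,
# and each variant would produce one of the N candidate answers.
candidates = select_heads_to_prune(question, head_embs, n=2)
print(candidates)
```

Under these assumptions, each returned pair identifies one head to mask before generating a candidate answer, so the $N$ candidates come from $N$ different pruned configurations rather than from repeated sampling of a single unpruned model.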
Related papers
- Detecting and Pruning Prominent but Detrimental Neurons in Large Language Models [68.57424628540907]
Large language models (LLMs) often develop learned mechanisms specialized to specific datasets. We introduce a fine-tuning approach designed to enhance generalization by identifying and pruning neurons associated with dataset-specific mechanisms. Our method employs Integrated Gradients to quantify each neuron's influence on high-confidence predictions, pinpointing those that disproportionately contribute to dataset-specific performance.
arXiv Detail & Related papers (2025-07-12T08:10:10Z) - Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute [57.16286134405821]
We propose Fractional Reasoning, a framework that enables continuous control over reasoning intensity at inference time. Our method operates by extracting the latent steering vector associated with deeper reasoning and reapplying it with a tunable scaling factor. Experiments on GSM8K, MATH500, and GPQA demonstrate that Fractional Reasoning consistently improves performance across diverse reasoning tasks and models.
arXiv Detail & Related papers (2025-06-18T21:15:59Z) - LARES: Latent Reasoning for Sequential Recommendation [96.26996622771593]
We present LARES, a novel and scalable LAtent REasoning framework for Sequential recommendation. Our proposed approach employs a recurrent architecture that allows flexible expansion of reasoning depth without increasing parameter complexity. We show that LARES exhibits seamless compatibility with existing advanced models, further improving their recommendation performance.
arXiv Detail & Related papers (2025-05-22T16:22:54Z) - "FRAME: Forward Recursive Adaptive Model Extraction-A Technique for Advance Feature Selection" [0.0]
This study introduces a novel hybrid approach, the Forward Recursive Adaptive Model Extraction Technique (FRAME). FRAME combines Forward Selection and Recursive Feature Elimination to enhance feature selection across diverse datasets. The results demonstrate that FRAME consistently delivers superior predictive performance based on downstream machine learning evaluation metrics.
arXiv Detail & Related papers (2025-01-21T08:34:10Z) - Optimized Feature Generation for Tabular Data via LLMs with Decision Tree Reasoning [53.241569810013836]
We propose a novel framework that utilizes large language models (LLMs) to identify effective feature generation rules.
We use decision trees to convey this reasoning information, as they can be easily represented in natural language.
OCTree consistently enhances the performance of various prediction models across diverse benchmarks.
arXiv Detail & Related papers (2024-06-12T08:31:34Z) - Rethinking Low-Rank Adaptation in Vision: Exploring Head-Level Responsiveness across Diverse Tasks [6.068296063531189]
Low-rank adaptation (LoRA) has shifted the paradigm of adapting pre-trained Vision Transformers (ViT).
We propose Head-level responsiveness tuning for low-rank adaptation (Heart-LoRA).
arXiv Detail & Related papers (2024-04-13T04:01:35Z) - Picking the Underused Heads: A Network Pruning Perspective of Attention
Head Selection for Fusing Dialogue Coreference Information [50.41829484199252]
Transformer-based models with the multi-head self-attention mechanism are widely used in natural language processing.
We investigate the attention head selection and manipulation strategy for feature injection from a network pruning perspective.
arXiv Detail & Related papers (2023-12-15T05:27:24Z) - Data-driven Preference Learning Methods for Sorting Problems with
Multiple Temporal Criteria [17.673512636899076]
This study presents novel preference learning approaches to multiple criteria sorting problems in the presence of temporal criteria.
To enhance scalability and accommodate learnable time discount factors, we introduce a novel monotonic Recurrent Neural Network (mRNN).
The proposed mRNN can describe the preference dynamics by depicting marginal value functions and personalized time discount factors along with time.
arXiv Detail & Related papers (2023-09-22T05:08:52Z) - A Generic Approach for Enhancing GANs by Regularized Latent Optimization [79.00740660219256]
We introduce a generic framework called generative-model inference that is capable of enhancing pre-trained GANs effectively and seamlessly.
Our basic idea is to efficiently infer the optimal latent distribution for the given requirements using Wasserstein gradient flow techniques.
arXiv Detail & Related papers (2021-12-07T05:22:50Z) - Learning MDPs from Features: Predict-Then-Optimize for Sequential
Decision Problems by Reinforcement Learning [52.74071439183113]
We study the predict-then-optimize framework in the context of sequential decision problems (formulated as MDPs) solved via reinforcement learning.
Two significant computational challenges arise in applying decision-focused learning to MDPs.
arXiv Detail & Related papers (2021-06-06T23:53:31Z) - Trajectory-wise Multiple Choice Learning for Dynamics Generalization in
Reinforcement Learning [137.39196753245105]
We present a new model-based reinforcement learning algorithm that learns a multi-headed dynamics model for dynamics generalization.
We incorporate context learning, which encodes dynamics-specific information from past experiences into the context latent vector.
Our method exhibits superior zero-shot generalization performance across a variety of control tasks, compared to state-of-the-art RL methods.
arXiv Detail & Related papers (2020-10-26T03:20:42Z) - Experimental Design for Overparameterized Learning with Application to
Single Shot Deep Active Learning [5.141687309207561]
Modern machine learning models are trained on large amounts of labeled data.
Access to large volumes of labeled data is often limited or expensive.
We propose a new design strategy for curating the training set.
arXiv Detail & Related papers (2020-09-27T11:27:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.