Systematic Optimization of Open Source Large Language Models for Mathematical Reasoning
- URL: http://arxiv.org/abs/2509.07238v1
- Date: Mon, 08 Sep 2025 21:31:43 GMT
- Title: Systematic Optimization of Open Source Large Language Models for Mathematical Reasoning
- Authors: Pranav Pawar, Dhwaj Jain, Varun Gupta, Kaustav Dedhia, Dashrath Kale, Sudhir Dhekane
- Abstract summary: This paper presents a practical investigation into fine-tuning model parameters for mathematical reasoning tasks. A holistically optimized framework is introduced for five state-of-the-art models on mathematical reasoning tasks.
- Score: 1.8254074486719114
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: This paper presents a practical investigation into fine-tuning model parameters for mathematical reasoning tasks. By experimenting with various configurations, including randomness control, reasoning depth, and sampling strategies, careful tuning demonstrates substantial improvements in both efficiency and performance. A holistically optimized framework is introduced for five state-of-the-art models on mathematical reasoning tasks, exhibiting significant performance boosts while maintaining solution correctness. Through systematic parameter optimization across Qwen2.5-72B, Llama-3.1-70B, DeepSeek-V3, Mixtral-8x22B, and Yi-Lightning, consistent efficiency gains are demonstrated with a 100% optimization success rate. The methodology achieves an average 29.4% reduction in computational cost and a 23.9% improvement in inference speed across all tested models. The framework systematically searches parameter spaces including temperature (0.1-0.5), reasoning steps (4-12), planning periods (1-4), and nucleus sampling (0.85-0.98), determining optimal configurations through testing on mathematical reasoning benchmarks. Critical findings show that lower temperature regimes (0.1-0.4) and reduced reasoning steps (4-6) consistently enhance efficiency without compromising accuracy. DeepSeek-V3 achieves the highest accuracy at 98%, while Mixtral-8x22B delivers the most cost-effective performance at 361.5 tokens per accurate response. Key contributions include: (1) the first comprehensive optimization study for five diverse SOTA models in mathematical reasoning, (2) a standardized production-oriented parameter optimization framework, (3) discovery of universal optimization trends applicable across model architectures, and (4) production-ready configurations with extensive performance characterization.
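The paper does not ship reference code, but the search it describes maps onto a straightforward sweep. The sketch below is a minimal, hypothetical rendering of that loop: `query_model`, the benchmark format, and the selection rule are assumptions, while the parameter ranges and the tokens-per-accurate-response cost metric are taken from the abstract.

```python
# Hypothetical sweep over the parameter spaces quoted in the abstract.
# `query_model` is a stand-in for a real inference client.
import itertools

def query_model(problem, temperature, reasoning_steps, planning_period, top_p):
    """Hypothetical inference call; returns (answer, tokens_used)."""
    raise NotImplementedError("replace with a real model client")

def evaluate_config(benchmark, config):
    """Score one configuration on (problem, gold_answer) pairs."""
    correct, total_tokens = 0, 0
    for problem, gold in benchmark:
        answer, tokens = query_model(problem, **config)
        correct += int(answer == gold)
        total_tokens += tokens
    accuracy = correct / len(benchmark)
    # Cost metric from the abstract: tokens per accurate response.
    tokens_per_accurate = total_tokens / max(correct, 1)
    return accuracy, tokens_per_accurate

def grid_search(benchmark):
    grid = {
        "temperature": [0.1, 0.2, 0.3, 0.4, 0.5],
        "reasoning_steps": [4, 6, 8, 10, 12],
        "planning_period": [1, 2, 3, 4],
        "top_p": [0.85, 0.90, 0.95, 0.98],
    }
    best = None  # (config, accuracy, tokens_per_accurate)
    for values in itertools.product(*grid.values()):
        config = dict(zip(grid, values))
        accuracy, cost = evaluate_config(benchmark, config)
        # Assumed selection rule: cheapest config that keeps accuracy.
        if best is None or (accuracy >= best[1] and cost < best[2]):
            best = (config, accuracy, cost)
    return best
```

Under this reading, "optimization success" simply means the sweep found a configuration cheaper than the default without losing accuracy, which is consistent with the 100% success rate the abstract reports across the five models.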
Related papers
- Distributional Reinforcement Learning with Information Bottleneck for Uncertainty-Aware DRAM Equalization [8.695939803795499]
We propose a distributional risk-sensitive reinforcement learning framework integrating Information Bottleneck latent representations with Conditional Value-at-Risk optimization. We introduce rate-distortion-optimal signal compression, achieving a 51-fold speedup over eye diagrams. We show that the proposed framework provides a practical solution for production-scale equalizer optimization with certified worst-case guarantees.
arXiv Detail & Related papers (2026-03-05T03:34:25Z)
- Improving the efficiency of QAOA using efficient parameter transfer initialization and targeted-single-layer regularized optimization with minimal performance degradation [1.7761223012399538]
We investigate the MaxCut problem in three different families of graphs using the Quantum Approximate Optimization Algorithm (QAOA) ansatz. For 3-regular (3R), Erdős-Rényi (ER), and Barabási-Albert (BA) graphs, the parameter transfer approach achieved mean approximation ratios of 0.9443 for targeted single-layer optimization. This represents 98.88% of optimal performance, with an 8.06-fold computational speedup on unweighted graphs.
arXiv Detail & Related papers (2026-01-22T08:51:03Z)
- Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning [73.10669391954801]
We present the Ring-linear model series, specifically including Ring-mini-linear-2.0 and Ring-flash-linear-2.0. Both models adopt a hybrid architecture that effectively integrates linear attention and softmax attention. Compared to a 32-billion-parameter dense model, this series reduces inference cost to 1/10, and compared to the original Ring series, the cost is also reduced by over 50%.
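As background for the hybrid design this entry describes, the sketch below contrasts the two primitives such a stack interleaves. It is a generic NumPy illustration, not the Ring-linear implementation; the positive feature map is one common choice among several.

```python
# Generic contrast between the two attention primitives a hybrid stack mixes.
import numpy as np

def softmax_attention(Q, K, V):
    # Full (n, n) score matrix: quadratic in sequence length.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V):
    # Positive feature map (one common choice); avoids the (n, n) matrix.
    phi = lambda x: np.maximum(x, 0.0) + 1.0
    Qf, Kf = phi(Q), phi(K)
    KV = Kf.T @ V                          # fixed-size (d, d) summary
    Z = Qf @ Kf.sum(axis=0)[:, None]       # (n, 1) normalizer
    return (Qf @ KV) / (Z + 1e-6)          # linear in sequence length

# Usage: same inputs through both primitives.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 4)) for _ in range(3))
out_soft, out_lin = softmax_attention(Q, K, V), linear_attention(Q, K, V)
```

The point of the hybrid is that linear attention keeps a fixed-size (d, d) summary instead of the (n, n) score matrix, so interleaving the two trades some of softmax attention's precision for large savings on long contexts.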
arXiv Detail & Related papers (2025-10-22T07:59:38Z)
- Divergence Minimization Preference Optimization for Diffusion Model Alignment [58.651951388346525]
Divergence Minimization Preference Optimization (DMPO) is a principled method for aligning diffusion models by minimizing the reverse KL divergence. Our results show that diffusion models fine-tuned with DMPO can consistently outperform or match existing techniques. DMPO unlocks a robust and elegant pathway for preference alignment, bridging principled theory with practical performance in diffusion models.
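For reference, the reverse KL direction takes expectations under the model being trained, which is what makes it mode-seeking; a generic form is below, with the caveat that this summary does not spell out DMPO's exact preference-weighted target p*.

```latex
% Reverse KL from the fine-tuned model p_theta to a target p*;
% the expectation is under p_theta (the mode-seeking direction).
\mathrm{KL}\left(p_\theta \,\middle\|\, p^{*}\right)
  = \mathbb{E}_{x \sim p_\theta}\left[\log \frac{p_\theta(x)}{p^{*}(x)}\right]
```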
arXiv Detail & Related papers (2025-07-10T07:57:30Z)
- Don't Think Longer, Think Wisely: Optimizing Thinking Dynamics for Large Reasoning Models [68.96619605651155]
Large reasoning models (LRMs) may drastically increase output length due to overthinking. We propose a dynamic optimization framework that segments model-generated reasoning paths into distinct thinking patterns. Our method achieves up to a 12% accuracy improvement while reducing token usage from approximately 5,000 to 3,000 tokens.
arXiv Detail & Related papers (2025-05-27T20:59:29Z)
- EfficientLLM: Efficiency in Large Language Models [64.3537131208038]
Large Language Models (LLMs) have driven significant progress, yet their growing parameter counts and context windows incur prohibitive compute, energy, and monetary costs. We introduce EfficientLLM, a novel benchmark and the first comprehensive empirical study evaluating efficiency techniques for LLMs at scale.
arXiv Detail & Related papers (2025-05-20T02:27:08Z)
- Inference Scaling vs Reasoning: An Empirical Analysis of Compute-Optimal LLM Problem-Solving [0.0]
Recent advances in large language models (LLMs) have predominantly focused on maximizing accuracy and reasoning capabilities. This paper investigates the potential synergy between reasoning enhancement and computational efficiency by analyzing the integration of two contrasting approaches.
arXiv Detail & Related papers (2024-12-20T08:42:45Z)
- Evaluating the effectiveness, reliability and efficiency of a multi-objective sequential optimization approach for building performance design [0.8168080812068832]
This paper proposes and evaluates a sequential approach for multi-objective design optimization of building geometry, fabric, HVAC system, and controls for building performance. The performance of the sequential approach is benchmarked against a full factorial search and compared to the NSGA-II algorithm. This research indicates that a sequential optimization approach is a highly efficient and robust alternative to the standard NSGA-II algorithm.
arXiv Detail & Related papers (2024-12-13T08:00:00Z)
- Crafting Efficient Fine-Tuning Strategies for Large Language Models [2.633490094119608]
Fine-tuning large language models (LLMs) with as few as 200 samples can improve model accuracy from 70% to 88% in a product attribute extraction task.
A Bayesian hyperparameter optimization method evaluates models at 20% of total training time; these early evaluations correlate strongly with final model performance.
This approach led to a 2% improvement in accuracy over baseline models when evaluated on an independent test set.
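A minimal sketch of the early-evaluation idea in this entry: probe each candidate at 20% of the training budget, then fully train only the best. `train_and_eval` and the candidate sampler are hypothetical stand-ins, and the paper uses a Bayesian search rather than the random sampling shown here.

```python
# Hypothetical probe-then-commit hyperparameter search.
import random

def train_and_eval(hyperparams, budget_fraction):
    """Hypothetical: train for budget_fraction of total steps, return accuracy."""
    raise NotImplementedError("replace with a real fine-tuning run")

def sample_hyperparams():
    # Illustrative search space; not from the paper.
    return {
        "learning_rate": 10 ** random.uniform(-5, -3),
        "lora_rank": random.choice([8, 16, 32, 64]),
    }

def probe_then_commit(n_trials=20, probe_fraction=0.2):
    trials = []
    for _ in range(n_trials):
        hp = sample_hyperparams()
        score = train_and_eval(hp, budget_fraction=probe_fraction)  # cheap probe
        trials.append((score, hp))
    # Assumption from the entry: probe scores track final performance.
    best_hp = max(trials, key=lambda t: t[0])[1]
    return best_hp, train_and_eval(best_hp, budget_fraction=1.0)
```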
arXiv Detail & Related papers (2024-07-18T21:36:00Z)
- Uncertainty-Aware Testing-Time Optimization for 3D Human Pose Estimation [65.91490997921859]
We propose an Uncertainty-Aware testing-time Optimization (UAO) framework for 3D human pose estimation. The framework preserves the prior information of the pre-trained model and alleviates overfitting by using the uncertainty of the joints. Our approach outperforms the previous best result by a large margin of 5.5% on Human3.6M.
arXiv Detail & Related papers (2024-02-04T04:28:02Z)
- Towards General and Efficient Online Tuning for Spark [55.30868031221838]
We present a general and efficient Spark tuning framework that can deal with the three issues simultaneously.
We have implemented this framework as an independent cloud service and applied it to the data platform at Tencent.
arXiv Detail & Related papers (2023-09-05T02:16:45Z)
- Estimate-Then-Optimize versus Integrated-Estimation-Optimization versus Sample Average Approximation: A Stochastic Dominance Perspective [21.945745750737952]
We show that a reverse behavior appears when the model class is well-specified and there is sufficient data. We also demonstrate how standard sample average approximation (SAA) performs the worst in terms of regret when the model class is well-specified.
arXiv Detail & Related papers (2023-04-13T21:54:53Z)
- Bayesian Optimization for Selecting Efficient Machine Learning Models [53.202224677485525]
We present a unified Bayesian Optimization framework for jointly optimizing models for both prediction effectiveness and training efficiency.
Experiments on model selection for recommendation tasks indicate that models selected this way significantly improve model training efficiency.
arXiv Detail & Related papers (2020-08-02T02:56:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.