Efficient Strategy for Improving Large Language Model (LLM) Capabilities
- URL: http://arxiv.org/abs/2508.04073v1
- Date: Wed, 06 Aug 2025 04:08:26 GMT
- Title: Efficient Strategy for Improving Large Language Model (LLM) Capabilities
- Authors: Julián Camilo Velandia Gutiérrez
- Abstract summary: Large Language Models (LLMs) have become a milestone in the field of artificial intelligence and natural language processing. Their large-scale deployment remains constrained by the need for significant computational resources. This work proposes starting from a base model to explore and combine data processing and careful data selection techniques.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) have become a milestone in the field of artificial intelligence and natural language processing. However, their large-scale deployment remains constrained by the need for significant computational resources. This work proposes starting from a base model to explore and combine data processing and careful data selection techniques, training strategies, and architectural adjustments to improve the efficiency of LLMs in resource-constrained environments and within a delimited knowledge base. The methodological approach included defining criteria for building reliable datasets, conducting controlled experiments with different configurations, and systematically evaluating the resulting variants in terms of capability, versatility, response time, and safety. Finally, comparative tests were conducted to measure the performance of the developed variants and to validate the effectiveness of the proposed strategies. This work is based on the master's thesis in Systems and Computer Engineering titled "Efficient Strategy for Improving the Capabilities of Large Language Models (LLMs)".
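The abstract does not specify which training strategies or architectural adjustments were used. As a hedged illustration of the kind of resource-constrained adaptation it describes, the sketch below fine-tunes a small base model with LoRA adapters via the Hugging Face transformers and peft libraries; the model name and all hyperparameters are assumptions chosen for illustration, not the thesis's actual configuration.

```python
# Minimal sketch, not the paper's actual setup: parameter-efficient fine-tuning
# with LoRA, so only a small fraction of weights is trained on a delimited
# knowledge base while the frozen base model stays within a modest memory budget.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_model_name = "meta-llama/Llama-3.2-1B"  # assumed base model, illustration only

tokenizer = AutoTokenizer.from_pretrained(base_model_name)
model = AutoModelForCausalLM.from_pretrained(base_model_name)

# LoRA freezes the base weights and learns low-rank update matrices instead.
lora_config = LoraConfig(
    r=8,                                   # low-rank dimension (assumed)
    lora_alpha=16,                         # scaling factor (assumed)
    target_modules=["q_proj", "v_proj"],   # attention projections (assumed)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of base parameters
```

The resulting adapter can then be trained on the curated dataset with an ordinary supervised loop and merged back into the base model for deployment.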
Related papers
- Systematic Evaluation of Optimization Techniques for Long-Context Language Models [15.377591633726396]
Large language models (LLMs) excel across diverse natural language processing tasks but face resource demands and limited context windows. This paper systematically benchmarks long-context optimization techniques, characterizing memory usage, latency, and throughput, and studies how these methods affect the quality of text generation.
arXiv Detail & Related papers (2025-08-01T04:17:24Z)
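As a rough illustration of the benchmarking described in the entry above, the sketch below times a single generation call and reports latency, token throughput, and peak GPU memory; the placeholder model and prompt are assumptions, and a real benchmark would sweep context lengths and optimization settings.

```python
# Minimal sketch: measuring latency, throughput, and peak memory for one
# generation call with a causal LM (placeholder model, illustration only).
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; a long-context model would be used in practice
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

prompt = "Summarize the trade-offs of long-context inference:"
inputs = tokenizer(prompt, return_tensors="pt").to(device)

if device == "cuda":
    torch.cuda.reset_peak_memory_stats()

start = time.perf_counter()
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128)
latency = time.perf_counter() - start

new_tokens = output.shape[-1] - inputs["input_ids"].shape[-1]
print(f"latency: {latency:.2f} s, throughput: {new_tokens / latency:.1f} tok/s")
if device == "cuda":
    print(f"peak memory: {torch.cuda.max_memory_allocated() / 1e9:.2f} GB")
```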
- Transferable Modeling Strategies for Low-Resource LLM Tasks: A Prompt and Alignment-Based Approach [1.3286097954612326]
This paper addresses the limited transfer and adaptation capabilities of large language models in low-resource language scenarios. It proposes a unified framework that combines a knowledge transfer module with parameter-efficient fine-tuning strategies. The framework enhances task-specific adaptability while preserving the general capabilities of large language models.
arXiv Detail & Related papers (2025-07-01T09:34:49Z)
- Improving Multilingual Math Reasoning for African Languages [49.27985213689457]
We conduct experiments to evaluate different combinations of data types (translated versus synthetically generated), training stages (pre-training versus post-training), and other model adaptation configurations. Our experiments focus on mathematical reasoning tasks, using the Llama 3.1 model family as our base model.
arXiv Detail & Related papers (2025-05-26T11:35:01Z)
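The entry above evaluates combinations of data types and training stages. A minimal sketch of how such a controlled experiment grid could be enumerated is shown below; the configuration names and the run_experiment hook mentioned in the comment are hypothetical.

```python
# Minimal sketch: enumerating a controlled experiment grid
# (data type x training stage x adaptation method); names are illustrative only.
from itertools import product

data_types = ["translated", "synthetic"]
training_stages = ["pre-training", "post-training"]
adaptations = ["full-finetune", "lora"]   # assumed adaptation configurations

experiments = [
    {"data": d, "stage": s, "adaptation": a}
    for d, s, a in product(data_types, training_stages, adaptations)
]

for cfg in experiments:
    # run_experiment(cfg) would train and evaluate one variant; here we only list them
    print(cfg)
```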
- Generalizing Large Language Model Usability Across Resource-Constrained [0.43512163406552007]
This dissertation presents a systematic study toward generalizing Large Language Models under real-world constraints. First, it introduces a robust text-centric alignment framework that enables LLMs to seamlessly integrate diverse modalities. Beyond the multimodal setting, the dissertation investigates inference-time optimization strategies for LLMs.
arXiv Detail & Related papers (2025-05-13T01:00:12Z)
- Large Language Model as Meta-Surrogate for Data-Driven Many-Task Optimization: A Proof-of-Principle Study [11.452011929848844]
This study proposes a novel meta-surrogate framework to assist many-task optimization. We formulate a unified framework for many-task fitness prediction by defining a universal model with metadata to fit a group of problems. Our framework supports dual-level knowledge transfer, at both the surrogate and individual levels, enhancing optimization efficiency and robustness.
arXiv Detail & Related papers (2025-03-11T11:13:11Z)
- LLM Post-Training: A Deep Dive into Reasoning Large Language Models [131.10969986056]
Large Language Models (LLMs) have transformed the natural language processing landscape and brought to life diverse applications. Post-training methods enable LLMs to refine their knowledge, improve reasoning, enhance factual accuracy, and align more effectively with user intents and ethical considerations.
arXiv Detail & Related papers (2025-02-28T18:59:54Z)
- A Survey on Inference Optimization Techniques for Mixture of Experts Models [50.40325411764262]
Large-scale Mixture of Experts (MoE) models offer enhanced model capacity and computational efficiency through conditional computation. However, deploying and running inference on these models presents significant challenges in computational resources, latency, and energy efficiency. This survey analyzes optimization techniques for MoE models across the entire system stack.
arXiv Detail & Related papers (2024-12-18T14:11:15Z)
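The survey above centers on conditional computation, where a router activates only a few experts per token. The sketch below is a generic top-k MoE layer in PyTorch (not any specific surveyed system): only the selected experts run for each token, which is what makes inference cheaper than a dense model of the same capacity.

```python
# Minimal sketch of top-k Mixture-of-Experts routing: a linear router scores the
# experts, and each token is processed only by its k highest-scoring experts.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:    # x: (tokens, d_model)
        scores = self.router(x)                             # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)          # top-k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e in idx[:, slot].unique().tolist():        # run only selected experts
                mask = idx[:, slot] == e
                w = weights[mask, slot].unsqueeze(-1)
                out[mask] += w * self.experts[e](x[mask])
        return out

moe = TopKMoE(d_model=64)
print(moe(torch.randn(10, 64)).shape)   # torch.Size([10, 64])
```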
- Skill Learning Using Process Mining for Large Language Model Plan Generation [0.0]
Large language models (LLMs) hold promise for generating plans for complex tasks.
Their effectiveness is limited by sequential execution, lack of control flow models, and difficulties in skill retrieval.
We introduce a novel approach to skill learning in LLMs by integrating process mining techniques.
arXiv Detail & Related papers (2024-10-14T12:48:42Z)
- EVOLvE: Evaluating and Optimizing LLMs For In-Context Exploration [76.66831821738927]
Large language models (LLMs) remain under-studied in scenarios requiring optimal decision-making under uncertainty. We measure LLMs' (in)ability to make optimal decisions in bandits, a state-less reinforcement learning setting relevant to many applications. Motivated by the existence of optimal exploration algorithms, we propose efficient ways to integrate this algorithmic knowledge into LLMs.
arXiv Detail & Related papers (2024-10-08T17:54:03Z)
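The bandit setting referenced in the entry above is small enough to sketch directly. Below is a generic UCB1 loop on a simulated three-armed bandit (not the paper's evaluation harness); in the paper's framing, an LLM agent would play the role of the arm-selection policy, while the reward probabilities here are assumed.

```python
# Minimal sketch of a stateless multi-armed bandit with the UCB1 exploration rule.
import math
import random

true_means = [0.2, 0.5, 0.8]            # hidden reward probabilities (assumed)
counts = [0] * len(true_means)          # pulls per arm
values = [0.0] * len(true_means)        # running mean reward per arm

def ucb_pick(t: int) -> int:
    # Pull each arm once, then pick the arm maximizing mean + exploration bonus.
    for arm, c in enumerate(counts):
        if c == 0:
            return arm
    return max(range(len(counts)),
               key=lambda a: values[a] + math.sqrt(2 * math.log(t) / counts[a]))

for t in range(1, 1001):
    arm = ucb_pick(t)
    reward = 1.0 if random.random() < true_means[arm] else 0.0
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]   # incremental mean update

print("pulls per arm:", counts)   # most pulls should concentrate on the best arm
```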
- Improving Meta-learning for Low-resource Text Classification and Generation via Memory Imitation [87.98063273826702]
We propose a memory imitation meta-learning (MemIML) method that enhances the model's reliance on support sets for task adaptation.
A theoretical analysis is provided to prove the effectiveness of our method.
arXiv Detail & Related papers (2022-03-22T12:41:55Z)
- SDA: Improving Text Generation with Self Data Augmentation [88.24594090105899]
We propose to improve the standard maximum likelihood estimation (MLE) paradigm by incorporating a self-imitation-learning phase for automatic data augmentation.
Unlike most existing sentence-level augmentation strategies, our method is more general and could be easily adapted to any MLE-based training procedure.
arXiv Detail & Related papers (2021-01-02T01:15:57Z)
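The SDA entry above describes adding a self-imitation phase to standard MLE training. A heavily simplified sketch of that idea is shown below; model_generate and quality_score are hypothetical stand-ins, and the real method's augmentation criterion differs.

```python
# Minimal sketch (not the SDA algorithm): the model's own high-scoring generations
# are fed back into the MLE training set as extra supervision.
import random
from typing import Callable, List, Tuple

def quality_score(text: str) -> float:
    # Hypothetical scorer; in practice this could be a reference-based metric
    # or a learned critic. Here it is only a random stub.
    return random.random()

def self_augment(model_generate: Callable[[str], str],
                 prompts: List[str],
                 train_data: List[Tuple[str, str]],
                 threshold: float = 0.8) -> List[Tuple[str, str]]:
    """Append high-quality self-generated samples to the MLE training data."""
    for prompt in prompts:
        candidate = model_generate(prompt)           # sample from the current model
        if quality_score(candidate) >= threshold:    # keep only high-scoring samples
            train_data.append((prompt, candidate))   # treat as additional supervision
    return train_data
```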
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.