Related papers: Systematic Evaluation of Optimization Techniques for Long-Context Language Models

Systematic Evaluation of Optimization Techniques for Long-Context Language Models

URL: http://arxiv.org/abs/2508.00305v1
Date: Fri, 01 Aug 2025 04:17:24 GMT
Title: Systematic Evaluation of Optimization Techniques for Long-Context Language Models
Authors: Ammar Ahmed, Sheng Di, Franck Cappello, Zirui Liu, Jingoo Han, Ali Anwar,
Abstract summary: Large language models (LLMs) excel across diverse natural language processing tasks but face resource demands and limited context windows.<n>This paper systematically benchmarks these optimizations, characterizing memory usage, latency, and throughput, and studies how these methods impact the quality of text generation.
Score: 15.377591633726396
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) excel across diverse natural language processing tasks but face resource demands and limited context windows. Although techniques like pruning, quantization, and token dropping can mitigate these issues, their efficacy in long-context scenarios and system evaluation remains underexplored. This paper systematically benchmarks these optimizations, characterizing memory usage, latency, and throughput, and studies how these methods impact the quality of text generation. We first analyze individual optimization methods for two LLM architectures supporting long context and then systematically evaluate combinations of these techniques to assess how this deeper analysis impacts performance metrics. We subsequently study the scalability of individual optimization methods on a larger variant with 70 billion-parameter model. Our novel insights reveal that naive combination inference optimization algorithms can adversely affect larger models due to compounded approximation errors, as compared to their smaller counterparts. Experiments show that relying solely on F1 obscures these effects by hiding precision-recall trade-offs in question answering tasks. By integrating system-level profiling with task-specific insights, this study helps LLM practitioners and researchers explore and balance efficiency, accuracy, and scalability across tasks and hardware configurations.

Related papers

What Works for 'Lost-in-the-Middle' in LLMs? A Study on GM-Extract and Mitigations [1.2879523047871226]
GM-Extract is a novel benchmark dataset meticulously designed to evaluate LLM performance on retrieval of control variables.<n>We conduct a systematic evaluation of 7-8B parameter models on two multi-document tasks (key-value extraction and question-answering)<n>While a distinct U-shaped curve was not consistently observed, our analysis reveals a clear pattern of performance across models.
arXiv Detail & Related papers (2025-11-17T20:50:50Z)
Teaching LLMs to Think Mathematically: A Critical Study of Decision-Making via Optimization [1.246870021158888]
This paper investigates the capabilities of large language models (LLMs) in formulating and solving decision-making problems using mathematical programming.<n>We first conduct a systematic review and meta-analysis of recent literature to assess how well LLMs understand, structure, and solve optimization problems across domains.<n>Our systematic evidence is complemented by targeted experiments designed to evaluate the performance of state-of-the-art LLMs in automatically generating optimization models for problems in computer networks.
arXiv Detail & Related papers (2025-08-25T14:52:56Z)
Autoformulation of Mathematical Optimization Models Using LLMs [50.030647274271516]
This paper approaches the problem of $textitautoformulation$: the automated creation of solver-ready optimization models from natural language problem descriptions.<n>We identify three core challenges of autoformulation: $textit(1)$ the vast, problem-dependent hypothesis space, and $textit(2)$ efficient and diverse exploration of this space under uncertainty.<n>We present a novel method leveraging $textitLarge Language Models$ with $textitMonte-Carlo Tree Search$, exploiting the hierarchical nature of optimization modeling to generate and systematically explore possible formulations
arXiv Detail & Related papers (2024-11-03T20:41:38Z)
Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System [75.25394449773052]
Large Language Model (LLM) based multi-agent systems (MAS) show remarkable potential in collaborative problem-solving.<n>Yet they still face critical challenges: low communication efficiency, poor scalability, and a lack of effective parameter-updating optimization methods.<n>We present Optima, a novel framework that addresses these issues by significantly enhancing both communication efficiency and task effectiveness.
arXiv Detail & Related papers (2024-10-10T17:00:06Z)
EVOLvE: Evaluating and Optimizing LLMs For In-Context Exploration [76.66831821738927]
Large language models (LLMs) remain under-studied in scenarios requiring optimal decision-making under uncertainty.<n>We measure LLMs' (in)ability to make optimal decisions in bandits, a state-less reinforcement learning setting relevant to many applications.<n>Motivated by the existence of optimal exploration algorithms, we propose efficient ways to integrate this algorithmic knowledge into LLMs.
arXiv Detail & Related papers (2024-10-08T17:54:03Z)
Self-supervised Preference Optimization: Enhance Your Language Model with Preference Degree Awareness [27.43137305486112]
We propose a novel Self-supervised Preference Optimization (SPO) framework, which constructs a self-supervised preference degree loss combined with the alignment loss. The results demonstrate that SPO can be seamlessly integrated with existing preference optimization methods to achieve state-of-the-art performance.
arXiv Detail & Related papers (2024-09-26T12:37:26Z)
Achieving Peak Performance for Large Language Models: A Systematic Review [0.0]
Large language models (LLMs) have achieved remarkable success in natural language processing (NLP) As models grow into the trillion- parameter range, computational and memory costs increase significantly. This makes it difficult for many researchers to access the resources needed to train or apply these models.
arXiv Detail & Related papers (2024-09-07T13:57:41Z)
Efficiency optimization of large-scale language models based on deep learning in natural language processing tasks [6.596361762662328]
Internal structure and operation mechanism of large-scale language models are analyzed theoretically. We evaluate the contribution of adaptive optimization algorithms (such as AdamW), massively parallel computing techniques, and mixed precision training strategies.
arXiv Detail & Related papers (2024-05-20T00:10:00Z)
A Thorough Examination of Decoding Methods in the Era of LLMs [72.65956436513241]
Decoding methods play an indispensable role in converting language models from next-token predictors into practical task solvers. This paper provides a comprehensive and multifaceted analysis of various decoding methods within the context of large language models. Our findings reveal that decoding method performance is notably task-dependent and influenced by factors such as alignment, model size, and quantization.
arXiv Detail & Related papers (2024-02-10T11:14:53Z)
Multiobjective Optimization Analysis for Finding Infrastructure-as-Code Deployment Configurations [0.3774866290142281]
This paper is focused on a multiobjective problem related to Infrastructure-as-Code deployment configurations. We resort in this paper to nine different evolutionary-based multiobjective algorithms. Results obtained by each method after 10 independent runs have been compared using Friedman's non-parametric tests.
arXiv Detail & Related papers (2024-01-18T13:55:32Z)
A Survey of Contextual Optimization Methods for Decision Making under Uncertainty [47.73071218563257]
This review article identifies three main frameworks for learning policies from data and discusses their strengths and limitations. We present the existing models and methods under a uniform notation and terminology and classify them according to the three main frameworks.
arXiv Detail & Related papers (2023-06-17T15:21:02Z)
Learning to Optimize: A Primer and A Benchmark [94.29436694770953]
Learning to optimize (L2O) is an emerging approach that leverages machine learning to develop optimization methods. This article is poised to be the first comprehensive survey and benchmark of L2O for continuous optimization.
arXiv Detail & Related papers (2021-03-23T20:46:20Z)
Comparative Code Structure Analysis using Deep Learning for Performance Prediction [18.226950022938954]
This paper aims to assess the feasibility of using purely static information (e.g., abstract syntax tree or AST) of applications to predict performance change based on the change in code structure. Our evaluations of several deep embedding learning methods demonstrate that tree-based Long Short-Term Memory (LSTM) models can leverage the hierarchical structure of source-code to discover latent representations and achieve up to 84% (individual problem) and 73% (combined dataset with multiple of problems) accuracy in predicting the change in performance.
arXiv Detail & Related papers (2021-02-12T16:59:12Z)
A Survey on Large-scale Machine Learning [67.6997613600942]
Machine learning can provide deep insights into data, allowing machines to make high-quality predictions. Most sophisticated machine learning approaches suffer from huge time costs when operating on large-scale data. Large-scale Machine Learning aims to learn patterns from big data with comparable performance efficiently.
arXiv Detail & Related papers (2020-08-10T06:07:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.