LM4OPT: Unveiling the Potential of Large Language Models in Formulating
Mathematical Optimization Problems
- URL: http://arxiv.org/abs/2403.01342v1
- Date: Sat, 2 Mar 2024 23:32:33 GMT
- Title: LM4OPT: Unveiling the Potential of Large Language Models in Formulating
Mathematical Optimization Problems
- Authors: Tasnim Ahmed, Salimur Choudhury
- Abstract summary: This study compares prominent Large Language Models, including GPT-3.5, GPT-4, and Llama-2-7b, in zero-shot and one-shot settings.
Our findings show GPT-4's superior performance, particularly in the one-shot scenario.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the rapidly evolving field of natural language processing, the translation
of linguistic descriptions into mathematical formulation of optimization
problems presents a formidable challenge, demanding intricate understanding and
processing capabilities from Large Language Models (LLMs). This study compares
prominent LLMs, including GPT-3.5, GPT-4, and Llama-2-7b, in zero-shot and
one-shot settings for this task. Our findings show GPT-4's superior
performance, particularly in the one-shot scenario. A central part of this
research is the introduction of `LM4OPT,' a progressive fine-tuning framework
for Llama-2-7b that utilizes noisy embeddings and specialized datasets.
However, this research highlights a notable gap in the contextual understanding
capabilities of smaller models such as Llama-2-7b compared to larger
counterparts, especially in processing lengthy and complex input contexts. Our
empirical investigation, utilizing the NL4Opt dataset, unveils that GPT-4
surpasses the baseline performance established by previous research, achieving
an F1-score of 0.63, solely based on the problem description in natural
language, and without relying on any additional named entity information.
GPT-3.5 follows closely, both outperforming the fine-tuned Llama-2-7b. These
findings not only benchmark the current capabilities of LLMs in a novel
application area but also lay the groundwork for future improvements in
mathematical formulation of optimization problems from natural language input.
Related papers
- An Empirical Study on Information Extraction using Large Language Models [36.090082785047855]
Human-like large language models (LLMs) have proven to be very helpful for many natural language processing (NLP) related tasks.
We propose and analyze the effects of a series of simple prompt-based methods on GPT-4's information extraction ability.
arXiv Detail & Related papers (2024-08-31T07:10:16Z) - MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time [51.5039731721706]
MindStar is a purely inference-based searching method for large language models.
It formulates reasoning tasks as searching problems and proposes two search ideas to identify the optimal reasoning paths.
It significantly enhances the reasoning abilities of open-source models, such as Llama-2-13B and Mistral-7B, and achieves comparable performance to GPT-3.5 and Grok-1.
arXiv Detail & Related papers (2024-05-25T15:07:33Z) - Advancing LLM Reasoning Generalists with Preference Trees [119.57169648859707]
We introduce Eurus, a suite of large language models (LLMs) optimized for reasoning.
Eurus models achieve state-of-the-art results among open-source models on a diverse set of benchmarks.
arXiv Detail & Related papers (2024-04-02T16:25:30Z) - TriSum: Learning Summarization Ability from Large Language Models with Structured Rationale [66.01943465390548]
We introduce TriSum, a framework for distilling large language models' text summarization abilities into a compact, local model.
Our method enhances local model performance on various benchmarks.
It also improves interpretability by providing insights into the summarization rationale.
arXiv Detail & Related papers (2024-03-15T14:36:38Z) - Benchmarking GPT-4 on Algorithmic Problems: A Systematic Evaluation of Prompting Strategies [47.129504708849446]
Large Language Models (LLMs) have revolutionized the field of Natural Language Processing.
LLMs lack systematic generalization, which allows to extrapolate the learned statistical regularities outside the training distribution.
In this work, we offer a systematic benchmarking of GPT-4, one of the most advanced LLMs available.
arXiv Detail & Related papers (2024-02-27T10:44:52Z) - Using Natural Language Explanations to Improve Robustness of In-context Learning [35.18010811754959]
Large language models (LLMs) can excel in many tasks via in-context learning (ICL)
We investigate whether augmenting ICL with natural language explanations (NLEs) improves the robustness of LLMs on adversarial datasets.
arXiv Detail & Related papers (2023-11-13T18:49:13Z) - SCALE: Synergized Collaboration of Asymmetric Language Translation
Engines [105.8983433641208]
We introduce a collaborative framework that connects compact Specialized Translation Models (STMs) and general-purpose Large Language Models (LLMs) as one unified translation engine.
By introducing translation from STM into the triplet in-context demonstrations, SCALE unlocks refinement and pivoting ability of LLM.
Our experiments show that SCALE significantly outperforms both few-shot LLMs (GPT-4) and specialized models (NLLB) in challenging low-resource settings.
arXiv Detail & Related papers (2023-09-29T08:46:38Z) - Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data? [49.688233418425995]
Struc-Bench is a comprehensive benchmark featuring prominent Large Language Models (LLMs)
We propose two innovative metrics, P-Score (Prompting Score) and H-Score (Heuristical Score)
Our experiments show that applying our structure-aware fine-tuning to LLaMA-7B leads to substantial performance gains.
arXiv Detail & Related papers (2023-09-16T11:31:58Z) - Large Language Models as Data Preprocessors [9.99065004972981]
Large Language Models (LLMs) have marked a significant advancement in artificial intelligence.
This study explores their potential in data preprocessing, a critical stage in data mining and analytics applications.
We propose an LLM-based framework for data preprocessing, which integrates cutting-edge prompt engineering techniques.
arXiv Detail & Related papers (2023-08-30T23:28:43Z) - Taqyim: Evaluating Arabic NLP Tasks Using ChatGPT Models [6.145834902689888]
Large language models (LLMs) have demonstrated impressive performance on various downstream tasks without requiring fine-tuning.
Despite having a lower training proportion compared to English, these models also exhibit remarkable capabilities in other languages.
In this study, we assess the performance of GPT-3.5 and GPT-4 models on seven distinct Arabic NLP tasks.
arXiv Detail & Related papers (2023-06-28T15:54:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.