Exploring Combinatorial Problem Solving with Large Language Models: A Case Study on the Travelling Salesman Problem Using GPT-3.5 Turbo
- URL: http://arxiv.org/abs/2405.01997v1
- Date: Fri, 3 May 2024 10:54:14 GMT
- Title: Exploring Combinatorial Problem Solving with Large Language Models: A Case Study on the Travelling Salesman Problem Using GPT-3.5 Turbo
- Authors: Mahmoud Masoud, Ahmed Abdelhay, Mohammed Elhenawy,
- Abstract summary: We investigate the potential of Large Language Models (LLMs) to solve the Travelling Salesman Problem (TSP).
We fine-tuned GPT-3.5 Turbo to solve a specific problem size and tested it on instances of various sizes.
The fine-tuned models demonstrated promising performance on problems identical in size to the training instances and generalized well to larger problems.
- Score: 4.543552585804991
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) are deep learning models designed to generate text based on textual input. Although researchers have been developing these models for more complex tasks such as code generation and general reasoning, few efforts have explored how LLMs can be applied to combinatorial problems. In this research, we investigate the potential of LLMs to solve the Travelling Salesman Problem (TSP). Using GPT-3.5 Turbo, we conducted experiments employing various approaches, including zero-shot in-context learning, few-shot in-context learning, and chain-of-thought (CoT) prompting. We then fine-tuned GPT-3.5 Turbo to solve a specific problem size and tested it on instances of various sizes. The fine-tuned models demonstrated promising performance on problems identical in size to the training instances and generalized well to larger problems. Furthermore, to improve the performance of the fine-tuned model without incurring additional training costs, we adopted a self-ensemble approach to enhance the quality of the solutions.
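The self-ensemble step in the abstract (sample several candidate tours from the model and keep the best one) is easy to make concrete. The sketch below is a minimal illustration, not the authors' code: the LLM call is abstracted behind a `sample_tour` callable, with a random-permutation stand-in here, and all names are ours.

```python
import math
import random
from typing import Callable, List, Optional, Tuple

City = Tuple[float, float]

def tour_length(tour: List[int], cities: List[City]) -> float:
    """Total length of a closed tour that visits every city once."""
    return sum(
        math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )

def is_valid_tour(tour: List[int], n: int) -> bool:
    """LLM completions are not guaranteed to be permutations; check first."""
    return sorted(tour) == list(range(n))

def self_ensemble(
    cities: List[City],
    sample_tour: Callable[[List[City]], List[int]],
    k: int = 8,
) -> Optional[List[int]]:
    """Sample k candidate tours and keep the shortest valid one."""
    best, best_len = None, float("inf")
    for _ in range(k):
        tour = sample_tour(cities)
        if not is_valid_tour(tour, len(cities)):
            continue  # discard malformed completions
        length = tour_length(tour, cities)
        if length < best_len:
            best, best_len = tour, length
    return best  # None if every sample was invalid

# Stand-in sampler: a uniform random permutation. In the paper's setting this
# would instead parse a tour from a (fine-tuned) GPT-3.5 Turbo completion.
def random_sampler(cities: List[City]) -> List[int]:
    tour = list(range(len(cities)))
    random.shuffle(tour)
    return tour

cities = [(random.random(), random.random()) for _ in range(10)]
print(self_ensemble(cities, random_sampler))
```

Because completions are not guaranteed to be valid permutations, the ensemble also acts as a filter: malformed tours are discarded before the length comparison.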
Related papers
- Analyzing Prominent LLMs: An Empirical Study of Performance and Complexity in Solving LeetCode Problems [0.0]
Large Language Models (LLMs) like ChatGPT, Copilot, Gemini, and DeepSeek are transforming software engineering by automating key tasks.
This study benchmarks these four prominent LLMs on 150 LeetCode problems across easy, medium, and hard difficulties.
We evaluate each model on execution time, memory usage, and algorithmic complexity, revealing significant performance differences.
arXiv Detail & Related papers (2025-08-05T21:50:52Z)
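The entry above scores model-generated solutions on execution time and memory usage. A minimal, standard-library-only harness for those two measurements might look like the following; the `two_sum` solution under test is a generic stand-in, not code from the study.

```python
import time
import tracemalloc
from typing import Callable, Tuple

def measure(fn: Callable, *args) -> Tuple[object, float, int]:
    """Run fn(*args) and return (result, elapsed seconds, peak bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak

# Stand-in "LLM-generated" solution to a classic LeetCode problem (Two Sum).
def two_sum(nums, target):
    seen = {}  # value -> index
    for i, x in enumerate(nums):
        if target - x in seen:
            return [seen[target - x], i]
        seen[x] = i
    return []

result, secs, peak = measure(two_sum, list(range(100_000)), 199_997)
print(result, f"{secs:.4f}s", f"{peak} bytes peak")
```

Note that `tracemalloc` tracks Python-level allocations only, which is sufficient for comparing pure-Python solutions against each other.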
- Teaching LLM to Reason: Reinforcement Learning from Algorithmic Problems without Code [76.80306464249217]
We propose TeaR, which aims at teaching LLMs to reason better.
TeaR leverages careful data curation and reinforcement learning to guide models in discovering optimal reasoning paths through code-related tasks.
We conduct extensive experiments using two base models and three long-CoT distillation models, with model sizes ranging from 1.5 billion to 32 billion parameters, across 17 benchmarks spanning Math, Knowledge, Code, and Logical Reasoning.
arXiv Detail & Related papers (2025-07-10T07:34:05Z)
- Narrowing the Gap: Supervised Fine-Tuning of Open-Source LLMs as a Viable Alternative to Proprietary Models for Pedagogical Tools [42.84219003918423]
This work demonstrates that smaller, specialised language models, enhanced via Supervised Fine-Tuning (SFT), present a more viable alternative for educational tools.
We utilise a new dataset of 40,000 C compiler error explanations, derived from real student-generated errors in introductory programming (CS1/2) courses.
Our results show that SFT significantly boosts the pedagogical quality of smaller models, achieving performance comparable to much larger models.
arXiv Detail & Related papers (2025-07-07T08:03:49Z)
- Generative Modeling for Mathematical Discovery [0.19791587637442667]
We present a new implementation of the genetic algorithm funsearch.
Our aim is to generate examples of interest to mathematicians.
The implementation does not require expertise in machine learning or access to high-performance computing resources.
arXiv Detail & Related papers (2025-03-14T03:54:43Z)
- LLM-ProS: Analyzing Large Language Models' Performance in Competitive Problem Solving [1.5106583432923495]
This paper introduces a novel evaluation technique, LLM-ProS, to assess the performance of state-of-the-art LLMs.
Using a curated dataset of 166 World Finals problems from 2011 to 2024, we benchmark the models' reasoning, accuracy, and efficiency.
Our results reveal significant differences in the models' abilities to generalize, adapt, and solve novel problems.
arXiv Detail & Related papers (2025-02-04T18:55:14Z)
- What Makes Large Language Models Reason in (Multi-Turn) Code Generation? [28.614888506962988]
Chain-of-thought has established itself as a popular vehicle for improving the outputs of large language models (LLMs).
We investigate the effects of a wide range of prompting strategies with a focus on automatic re-prompting over multiple turns and computational requirements.
Our study reveals strategies that consistently improve performance across all models with small and large sampling budgets.
arXiv Detail & Related papers (2024-10-10T16:53:10Z)
- Reasoning Paths Optimization: Learning to Reason and Explore From Diverse Paths [69.39559168050923]
We introduce Reasoning Paths Optimization (RPO), which enables learning to reason and explore from diverse paths.
Our approach encourages favorable branches at each reasoning step while penalizing unfavorable ones, enhancing the model's overall problem-solving performance.
We focus on multi-step reasoning tasks, such as math word problems and science-based exam questions.
arXiv Detail & Related papers (2024-10-07T06:37:25Z)
- SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models [54.78329741186446]
We propose a novel paradigm that uses a code-based critic model to guide steps including question-code data construction, quality control, and complementary evaluation.
Experiments across both in-domain and out-of-domain benchmarks in English and Chinese demonstrate the effectiveness of the proposed paradigm.
arXiv Detail & Related papers (2024-08-28T06:33:03Z)
- Benchmarking Large Language Models for Math Reasoning Tasks [12.91916443702145]
We compare seven state-of-the-art in-context learning algorithms for mathematical problem solving across five widely used mathematical datasets on four powerful foundation models.
Our results indicate that larger foundation models like GPT-4o and LLaMA 3-70B can solve mathematical reasoning tasks largely independently of the concrete prompting strategy.
We open-source our benchmark code to support the integration of additional models in future research.
arXiv Detail & Related papers (2024-08-20T13:34:17Z)
- Navigating the Labyrinth: Evaluating and Enhancing LLMs' Ability to Reason About Search Problems [59.72548591120689]
We introduce a new benchmark, SearchBench, containing 11 unique search problem types.
We show that even the most advanced LLMs fail to solve these problems end-to-end in text.
Instructing LLMs to generate code that solves the problem helps, but only slightly; e.g., GPT-4's performance rises to 11.7%.
arXiv Detail & Related papers (2024-06-18T00:44:58Z)
- Improving the Capabilities of Large Language Model Based Marketing Analytics Copilots With Semantic Search And Fine-Tuning [0.9787137564521711]
We show how a combination of semantic search, prompt engineering, and fine-tuning can be applied to dramatically improve the ability of LLMs to execute these tasks accurately.
We compare both proprietary models, like GPT-4, and open-source models, like Llama-2-70b, as well as various embedding methods.
arXiv Detail & Related papers (2024-04-16T03:39:16Z)
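The entry above pairs semantic search with prompt engineering: retrieve the most relevant reference snippets, then place them in the prompt. As a rough illustration only, the sketch below uses a toy bag-of-words vector and cosine similarity in place of a learned embedding model; every name and document here is made up.

```python
import math
from collections import Counter
from typing import Dict, List

def embed(text: str) -> Dict[str, int]:
    """Toy embedding: a bag-of-words count vector (stand-in for a learned model)."""
    return Counter(text.lower().split())

def cosine(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: List[str], k: int = 2) -> List[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

docs = [
    "campaign click-through rates by channel",
    "quarterly revenue attribution model",
    "customer churn cohort analysis",
]
context = retrieve("which channel drives clicks", docs)
# The retrieved snippets would then be prepended to the LLM prompt.
print(context)
```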
- Enhancing the General Agent Capabilities of Low-Parameter LLMs through Tuning and Multi-Branch Reasoning [56.82041895921434]
Open-source pre-trained Large Language Models (LLMs) exhibit strong language understanding and generation capabilities.
When used as agents to tackle complex real-world problems, however, their performance falls far short of large commercial models such as ChatGPT and GPT-4.
arXiv Detail & Related papers (2024-03-29T03:48:12Z)
- Limits of Transformer Language Models on Learning to Compose Algorithms [77.2443883991608]
We evaluate training LLaMA models and prompting GPT-4 and Gemini on four tasks that demand learning a composition of several discrete sub-tasks.
Our results indicate that compositional learning in state-of-the-art Transformer language models is highly sample inefficient.
arXiv Detail & Related papers (2024-02-08T16:23:29Z)
- MT-Eval: A Multi-Turn Capabilities Evaluation Benchmark for Large Language Models [70.92847554971065]
We introduce MT-Eval, a comprehensive benchmark designed to evaluate multi-turn conversational abilities.
By analyzing human-LLM conversations, we categorize interaction patterns into four types: recollection, expansion, refinement, and follow-up.
Our evaluation of 11 well-known LLMs shows that while closed-source models generally surpass open-source ones, certain open-source models exceed GPT-3.5-Turbo in specific tasks.
arXiv Detail & Related papers (2024-01-30T04:50:28Z)
- Self-Labeling the Job Shop Scheduling Problem [15.723699332053558]
We show that generative models can be trained by sampling multiple solutions and using the best one, according to the problem objective, as a pseudo-label.
We demonstrate the robustness of SLIM to various parameters and its generality by applying it to the Traveling Salesman Problem.
arXiv Detail & Related papers (2024-01-22T11:08:36Z)
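The self-labeling recipe in the last entry, which samples multiple solutions, keeps the best under the problem objective, and uses it as a supervision target, can be sketched generically. The sampler and objective below are toy stand-ins (random job orders scored by total completion time), not the SLIM implementation.

```python
import random
from typing import Callable, List, Tuple

def self_label(
    instances: List[object],
    sample: Callable[[object], object],
    cost: Callable[[object, object], float],
    n_samples: int = 16,
) -> List[Tuple[object, object]]:
    """For each instance, sample n_samples candidate solutions and keep the
    cheapest one as a pseudo-label for a later supervised training step."""
    dataset = []
    for inst in instances:
        candidates = [sample(inst) for _ in range(n_samples)]
        best = min(candidates, key=lambda s: cost(inst, s))
        dataset.append((inst, best))
    return dataset

# Toy stand-ins: an "instance" is a list of job durations, a "solution" is a
# processing order, and the objective is total completion time.
def sample(durations):
    order = list(range(len(durations)))
    random.shuffle(order)
    return order

def cost(durations, order):
    t, total = 0, 0
    for j in order:
        t += durations[j]
        total += t  # completion time of job j
    return total

pseudo_labeled = self_label([[3, 1, 2], [5, 4]], sample, cost)
print(pseudo_labeled)  # (instance, best-found order) pairs for supervised training
```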
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.