A Knapsack by Any Other Name: Presentation impacts LLM performance on NP-hard problems
- URL: http://arxiv.org/abs/2502.13776v2
- Date: Sat, 18 Oct 2025 22:20:27 GMT
- Title: A Knapsack by Any Other Name: Presentation impacts LLM performance on NP-hard problems
- Authors: Alex Duchnowski, Ellie Pavlick, Alexander Koller,
- Abstract summary: We introduce the dataset of Everyday Hard Optimization Problems (EHOP), a collection of NP-hard problems expressed in natural language.<n>EHOP includes problem formulations that could be found in computer science textbooks (e.g., graph coloring), versions that are dressed up as problems that could arise in real life.<n>We find that state-of-the-art LLMs, across multiple prompting strategies, solve textbook problems more accurately than their real-life and inverted counterparts.
- Score: 64.05451567422342
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To investigate the effect of problem presentation on LLMs' ability to solve optimization problems, we introduce the dataset of Everyday Hard Optimization Problems (EHOP), a collection of NP-hard problems expressed in natural language. EHOP includes problem formulations that could be found in computer science textbooks (e.g., graph coloring), versions that are dressed up as problems that could arise in real life (e.g., party planning), and variants with inverted rules. We find that state-of-the-art LLMs, across multiple prompting strategies, systematically solve textbook problems more accurately than their real-life and inverted counterparts. While reasoning models are more capable, they nonetheless show high variance across problem presentations, suggesting they lack a truly robust reasoning mechanism. We argue that this constitutes evidence that LLMs are still heavily dependent on what was seen in training and struggle to generalize to novel problems.
Related papers
- Farther the Shift, Sparser the Representation: Analyzing OOD Mechanisms in LLMs [100.02824137397464]
We investigate how Large Language Models adapt their internal representations when encountering inputs of increasing difficulty.<n>We reveal a consistent and quantifiable phenomenon: as task difficulty increases, the last hidden states of LLMs become substantially sparser.<n>This sparsity--difficulty relation is observable across diverse models and domains.
arXiv Detail & Related papers (2026-03-03T18:48:15Z) - Probing the Difficulty Perception Mechanism of Large Language Models [31.945071671041465]
We investigate whether large language models implicitly encode problem difficulty in their internal representations.<n>We locate the specific attention heads of the final Transformer layer.<n>Experiments provide practical support for using LLMs as automatic difficulty annotators.
arXiv Detail & Related papers (2025-10-07T14:24:32Z) - Solving Situation Puzzles with Large Language Model and External Reformulation [6.793639595476304]
We show that large language models (LLMs) cannot perform well on reasoning that requires multiple rounds of dialogue.<n>We propose a novel external reformulation methodology, where the situation puzzle will be reformulated after several rounds of Q&A.<n>Experiments show superior performance (e.g., win rate, number of question/guess attempts) of our method than directly using LLMs for solving situation puzzles.
arXiv Detail & Related papers (2025-03-24T07:05:55Z) - Improving Math Problem Solving in Large Language Models Through Categorization and Strategy Tailoring [0.0]
We show how to leverage large language models (LLMs) to solve mathematical problems efficiently and accurately.<n>We design a simple yet intuitive machine learning model for problem categorization.<n>We find that the performance of this simple model approaches that of state-of-the-art (SOTA) models for categorization.
arXiv Detail & Related papers (2024-10-29T16:06:26Z) - BloomWise: Enhancing Problem-Solving capabilities of Large Language Models using Bloom's-Taxonomy-Inspired Prompts [59.83547898874152]
We introduce BloomWise, a new prompting technique, inspired by Bloom's taxonomy, to improve the performance of Large Language Models (LLMs)
The decision regarding the need to employ more sophisticated cognitive skills is based on self-evaluation performed by the LLM.
In extensive experiments across 4 popular math reasoning datasets, we have demonstrated the effectiveness of our proposed approach.
arXiv Detail & Related papers (2024-10-05T09:27:52Z) - Not All LLM Reasoners Are Created Equal [58.236453890457476]
We study the depth of grade-school math problem-solving capabilities of LLMs.
We evaluate their performance on pairs of existing math word problems together.
arXiv Detail & Related papers (2024-10-02T17:01:10Z) - MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data [20.31528845718877]
Large language models (LLMs) have significantly advanced natural language understanding and demonstrated strong problem-solving abilities.
This paper investigates the mathematical problem-solving capabilities of LLMs using the newly developed "MathOdyssey" dataset.
arXiv Detail & Related papers (2024-06-26T13:02:35Z) - Eliciting Problem Specifications via Large Language Models [4.055489363682198]
Large language models (LLMs) can be utilized to map a problem class into a semi-formal specification.
A cognitive system can then use the problem-space specification to solve multiple instances of problems from the problem class.
arXiv Detail & Related papers (2024-05-20T16:19:02Z) - FCoReBench: Can Large Language Models Solve Challenging First-Order Combinatorial Reasoning Problems? [25.352721856952655]
First-order reasoning problems can be instantiated with infinite number of problem instances of varying sizes.
We present FCoReBench, a dataset of 40 such challenging problems, along with scripts to generate problem instances of varying sizes and automatically verify and generate their solutions.
We propose SymPro-LM, which combines LLMs with both symbolic solvers and program interpreters, along with feedback from a few solved examples, to achieve huge performance gains.
arXiv Detail & Related papers (2024-02-04T20:56:09Z) - Attention-based Reinforcement Learning for Combinatorial Optimization: Application to Job Shop Scheduling Problem [2.024210754085351]
This study proposes an innovative attention-based reinforcement learning method specifically designed for the category of job shop scheduling problems.
A key finding of this research is the ability of our trained learners within the proposed method to be repurposed for larger-scale problems that were not part of the initial training set.
arXiv Detail & Related papers (2024-01-29T21:31:54Z) - Competition-Level Problems are Effective LLM Evaluators [121.15880285283116]
This paper aims to evaluate the reasoning capacities of large language models (LLMs) in solving recent programming problems in Codeforces.
We first provide a comprehensive evaluation of GPT-4's peiceived zero-shot performance on this task, considering various aspects such as problems' release time, difficulties, and types of errors encountered.
Surprisingly, theThoughtived performance of GPT-4 has experienced a cliff like decline in problems after September 2021 consistently across all the difficulties and types of problems.
arXiv Detail & Related papers (2023-12-04T18:58:57Z) - Thought Propagation: An Analogical Approach to Complex Reasoning with Large Language Models [62.96551299003463]
We propose textbftextitThought Propagation (TP) to enhance the complex reasoning ability of Large Language Models.
TP first prompts LLMs to propose and solve a set of analogous problems that are related to the input one.
TP reuses the results of analogous problems to directly yield a new solution or derive a knowledge-intensive plan for execution to amend the initial solution obtained from scratch.
arXiv Detail & Related papers (2023-10-06T01:40:09Z) - Language Models for Business Optimisation with a Real World Case Study in Production Scheduling [3.224702011999591]
Large Language Models (LLMs) have demonstrated outstanding performance across different language-related tasks.
We present an LLM-based framework for automating problem formulation in business optimisation.
arXiv Detail & Related papers (2023-09-22T23:45:21Z) - Large Language Model for Science: A Study on P vs. NP [88.67249044141529]
We use large language models (LLMs) to augment and accelerate research on the P versus NP problem.
Specifically, we propose Socratic reasoning, a general framework that promotes in-depth thinking with LLMs for complex problem-solving.
Our pilot study on the P vs. NP problem shows that GPT-4 successfully produces a proof schema and engages in rigorous reasoning throughout 97 dialogue turns.
arXiv Detail & Related papers (2023-09-11T17:49:27Z) - Automatically Correcting Large Language Models: Surveying the landscape
of diverse self-correction strategies [104.32199881187607]
Large language models (LLMs) have demonstrated remarkable performance across a wide array of NLP tasks.
A promising approach to rectify these flaws is self-correction, where the LLM itself is prompted or guided to fix problems in its own output.
This paper presents a comprehensive review of this emerging class of techniques.
arXiv Detail & Related papers (2023-08-06T18:38:52Z) - Learning Adaptive Evolutionary Computation for Solving Multi-Objective
Optimization Problems [3.3266268089678257]
This paper proposes a framework that integrates MOEAs with adaptive parameter control using Deep Reinforcement Learning (DRL)
The DRL policy is trained to adaptively set the values that dictate the intensity and probability of mutation for solutions during optimization.
We show the learned policy is transferable, i.e., the policy trained on a simple benchmark problem can be directly applied to solve the complex warehouse optimization problem.
arXiv Detail & Related papers (2022-11-01T22:08:34Z) - Learning Proximal Operators to Discover Multiple Optima [66.98045013486794]
We present an end-to-end method to learn the proximal operator across non-family problems.
We show that for weakly-ized objectives and under mild conditions, the method converges globally.
arXiv Detail & Related papers (2022-01-28T05:53:28Z) - Learning Hard Optimization Problems: A Data Generation Perspective [44.4747903763245]
This paper demonstrates the volatility of the training data to the ability to approximate it.
It proposes a method for producing (exact or approximate) solutions to optimization problems that are amenable to supervised tasks.
arXiv Detail & Related papers (2021-06-04T17:03:44Z) - Learning by Fixing: Solving Math Word Problems with Weak Supervision [70.62896781438694]
Previous neural solvers of math word problems (MWPs) are learned with full supervision and fail to generate diverse solutions.
We introduce a textitweakly-supervised paradigm for learning MWPs.
Our method only requires the annotations of the final answers and can generate various solutions for a single problem.
arXiv Detail & Related papers (2020-12-19T03:10:21Z) - Total Deep Variation for Linear Inverse Problems [71.90933869570914]
We propose a novel learnable general-purpose regularizer exploiting recent architectural design patterns from deep learning.
We show state-of-the-art performance for classical image restoration and medical image reconstruction problems.
arXiv Detail & Related papers (2020-01-14T19:01:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.