PEA: Enhancing LLM Performance on Computational-Reasoning Tasks
- URL: http://arxiv.org/abs/2502.10938v1
- Date: Sun, 16 Feb 2025 00:27:05 GMT
- Title: PEA: Enhancing LLM Performance on Computational-Reasoning Tasks
- Authors: Zi Wang, Shiwei Weng, Mohannad Alhanahnah, Somesh Jha, Tom Reps,
- Abstract summary: This study introduces a formal approach to describe and solve a class of important reasoning tasks termed computational reasoning problems.<n>The framework decomposes these problems into predicate and enumeration components, using LLMs to synthesize programs based on specified predicates, enumeration, and aggregation rules.<n> Empirical evaluation reveals that PEA substantially enhances the performance of underlying models on benchmark computational problems, yielding an average accuracy improvement of approximately $50%$, coupled with increased efficiency.
- Score: 21.13926189404758
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) have exhibited remarkable capabilities across diverse domains, prompting investigations into their potential as generic reasoning engines. While recent studies have explored inference-time computation to enhance model performance on complex problems, current research lacks a formal framework to characterize the complexity of reasoning tasks. This study introduces the Predicate-Enumeration-Aggregation (PEA) framework, a formal approach to describe and solve a class of important reasoning tasks termed computational reasoning problems. The PEA framework decomposes these problems into predicate and enumeration components, using LLMs to synthesize programs based on specified predicates, enumeration, and aggregation rules. These synthesized programs are then executed to obtain solutions to the computational tasks. We demonstrate the framework's efficacy on benchmark tasks including Boolean satisfiability problems, game of $24$, and planning problems. Empirical evaluation reveals that PEA substantially enhances the performance of underlying models on benchmark computational problems, yielding an average accuracy improvement of approximately $50\%$, coupled with increased efficiency.
Related papers
- Evaluating Multi-Hop Reasoning in Large Language Models: A Chemistry-Centric Case Study [0.9424565541639368]
We introduce a new benchmark consisting of a curated dataset and a defined evaluation process to assess the compositional reasoning capabilities of large language models within the chemistry domain.
Our approach integrates OpenAI reasoning models with named entity recognition (NER) systems to extract chemical entities from recent literature, which are then augmented with external knowledge bases to form a knowledge graph.
Our experiments reveal that even state-of-the-art models face significant challenges in multi-hop compositional reasoning.
arXiv Detail & Related papers (2025-04-23T04:36:19Z) - Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models [54.04678363287392]
Large Language Models (LLMs) have demonstrated remarkable capabilities in complex tasks.
Recent advancements in OpenAI o1 and DeepSeek-R1 have further improved performance in System-2 reasoning domains.
arXiv Detail & Related papers (2025-03-20T17:59:38Z) - Towards Efficient Educational Chatbots: Benchmarking RAG Frameworks [2.362412515574206]
Large Language Models (LLMs) have proven immensely beneficial in education by capturing vast amounts of literature-based information.
We propose a generative AI-powered GATE question-answering framework that leverages LLMs to explain GATE solutions and support students in their exam preparation.
arXiv Detail & Related papers (2025-03-02T08:11:07Z) - Inference-Time Computations for LLM Reasoning and Planning: A Benchmark and Insights [49.42133807824413]
We examine the reasoning and planning capabilities of large language models (LLMs) in solving complex tasks.
Recent advances in inference-time techniques demonstrate the potential to enhance LLM reasoning without additional training.
OpenAI's o1 model shows promising performance through its novel use of multi-step reasoning and verification.
arXiv Detail & Related papers (2025-02-18T04:11:29Z) - REL: Working out is all you need [20.65423513616306]
We observe that OpenAI's O1 model approaches problem-solving in a distinctly human-like manner.<n>These sophisticated reasoning capabilities remain notably absent in other state-of-the-art language models.
arXiv Detail & Related papers (2024-12-05T22:32:01Z) - Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization [49.362750475706235]
Reinforcement Learning (RL) plays a crucial role in aligning large language models with human preferences and improving their ability to perform complex tasks.<n>We introduce Direct Q-function Optimization (DQO), which formulates the response generation process as a Markov Decision Process (MDP) and utilizes the soft actor-critic (SAC) framework to optimize a Q-function directly parameterized by the language model.<n> Experimental results on two math problem-solving datasets, GSM8K and MATH, demonstrate that DQO outperforms previous methods, establishing it as a promising offline reinforcement learning approach for aligning language models.
arXiv Detail & Related papers (2024-10-11T23:29:20Z) - Think Beyond Size: Adaptive Prompting for More Effective Reasoning [0.0]
We introduce Adaptive Prompting, a dynamic and iterative framework designed to enhance reasoning by incorporating real-time adjustments to prompt structures and validation mechanisms.<n>Results demonstrate that Adaptive Prompting significantly improves performance on diverse reasoning benchmarks, including arithmetic reasoning (GSM8K, MultiArithm), logical reasoning and commonsense tasks.<n>Our approach enables smaller models to achieve competitive performance with larger counterparts, such as GPT-4, while maintaining computational efficiency.
arXiv Detail & Related papers (2024-10-10T17:14:36Z) - Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification [52.095460362197336]
Large language models (LLMs) struggle with consistent and accurate reasoning.
LLMs are trained primarily on correct solutions, reducing their ability to detect and learn from errors.
We propose a novel collaborative method integrating Chain-of-Thought (CoT) and Program-of-Thought (PoT) solutions for verification.
arXiv Detail & Related papers (2024-10-05T05:21:48Z) - Key-Point-Driven Mathematical Reasoning Distillation of Large Language Model [15.542737858152053]
We propose Key-Point-Driven Mathematical Reasoning Distillation (KPDD) to mitigate misunderstanding errors.
KPDD enhances the reasoning performance of SLMs by breaking down the problem-solving process into three stages.
Experiments show KPDD-CoT significantly improves reasoning abilities, while KPDD-PoT achieves state-of-the-art performance in mathematical reasoning tasks.
arXiv Detail & Related papers (2024-07-14T11:41:03Z) - Token-Supervised Value Models for Enhancing Mathematical Problem-Solving Capabilities of Large Language Models [56.32800938317095]
Existing verifiers are sub-optimal for tree search techniques at test time.
We propose token-supervised value models (TVMs)
TVMs assign each token a probability that reflects the likelihood of reaching the correct final answer.
arXiv Detail & Related papers (2024-07-12T13:16:50Z) - MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time [51.5039731721706]
MindStar is a purely inference-based searching method for large language models.
It formulates reasoning tasks as searching problems and proposes two search ideas to identify the optimal reasoning paths.
It significantly enhances the reasoning abilities of open-source models, such as Llama-2-13B and Mistral-7B, and achieves comparable performance to GPT-3.5 and Grok-1.
arXiv Detail & Related papers (2024-05-25T15:07:33Z) - Evaluating LLMs' Mathematical and Coding Competency through Ontology-guided Interventions [47.83142414018448]
We focus on two popular reasoning tasks: arithmetic reasoning and code generation.
We introduce (i) a general ontology of perturbations for math and coding questions, (ii) a semi-automatic method to apply these perturbations, and (iii) two datasets.
We show a significant performance drop across all the models against perturbed questions.
arXiv Detail & Related papers (2024-01-17T18:13:07Z) - Exploring Viable Algorithmic Options for Learning from Demonstration
(LfD): A Parameterized Complexity Approach [0.0]
In this paper, we show how such a systematic exploration of algorithmic options can be done using parameterized complexity analysis.
We show that none of our problems can be solved efficiently either in general or relative to a number of (often simultaneous) restrictions on environments, demonstrations, and policies.
arXiv Detail & Related papers (2022-05-10T15:54:06Z) - An Information-Theoretic Framework for Unifying Active Learning Problems [44.758281991246825]
This paper presents an information-theoretic framework for unifying active learning problems.
We first introduce a novel active learning criterion that subsumes an existing LSE algorithm.
By exploiting the relationship between LSE and BO, we design a competitive information-theoretic acquisition function for BO.
arXiv Detail & Related papers (2020-12-19T14:22:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.