Related papers: HYBRIDMIND: Meta Selection of Natural Language and Symbolic Language for Enhanced LLM Reasoning

HYBRIDMIND: Meta Selection of Natural Language and Symbolic Language for Enhanced LLM Reasoning

URL: http://arxiv.org/abs/2409.19381v5
Date: Wed, 26 Feb 2025 03:24:57 GMT
Title: HYBRIDMIND: Meta Selection of Natural Language and Symbolic Language for Enhanced LLM Reasoning
Authors: Simeng Han, Tianyu Liu, Chuhan Li, Xuyuan Xiong, Arman Cohan,
Abstract summary: We introduce HYBRIDMIND, an adaptive strategy that selects the optimal reasoning approach for each reasoning problem.<n>We find that fine-tuning LLaMA-3.1-8B-Instruct as a meta-selector outperforms GPT-4o's natural language reasoning.
Score: 25.192089674713365
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: LLMs approach logical and mathematical reasoning through natural or symbolic languages. While natural language offers human-accessible flexibility but suffers from ambiguity, symbolic reasoning provides precise, machine-executable inferences at the cost of strict domain constraints. We introduce HYBRIDMIND, an adaptive strategy that selects the optimal reasoning approach for each reasoning problem. Through extensive experiments, we evaluate both prompting-based approaches with state-of-the-art LLMs and fine-tuned open-source models. We find that fine-tuning LLaMA-3.1-8B-Instruct as a meta-selector outperforms GPT-4o's natural language reasoning by 4.4\% on FOLIO and 1.3\% on MATH. More notably, using GPT-3.5-turbo as a prompted meta-selector yields a 10\% improvement on FOLIO's challenging subset compared to GPT-4o. We will release our code and data to support future research.

Related papers

PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts [79.84059473102778]
PolyMath is a multilingual mathematical reasoning benchmark covering 18 languages and 4 easy-to-hard difficulty levels. Our benchmark ensures difficulty comprehensiveness, language diversity, and high-quality translation.
arXiv Detail & Related papers (2025-04-25T15:39:04Z)
OptimAI: Optimization from Natural Language Using LLM-Powered AI Agents [8.441638148384389]
We introduce textbfOptimAI, a framework for solving underlineOptimization problems described in natural language. Our framework is built upon four key roles: (1) a emphformulator; (2) a emphplanner; and (3) a emphcoder and a emphcode critic. Our approach attains 88.1% accuracy on the NLP4LP dataset and 71.2% on the Optibench subset, reducing error rates by 58% and 50% respectively over prior best results.
arXiv Detail & Related papers (2025-04-23T17:45:05Z)
Making LLMs Reason? The Intermediate Language Problem in Neurosymbolic Approaches [49.567092222782435]
We introduce the intermediate language problem, which is the problem of choosing a suitable formal language representation for neurosymbolic approaches. We show a maximum difference in overall-accuracy of 53.20% and 49.26% in execution-accuracy. When using the GPT4o-mini LLM we beat the state-of-the-art in overall-accuracy on the ProntoQA dataset by 21.20% and by 50.50% on the ProofWriter dataset.
arXiv Detail & Related papers (2025-02-24T14:49:52Z)
Towards LLM-based optimization compilers. Can LLMs learn how to apply a single peephole optimization? Reasoning is all LLMs need! [0.9054540533394926]
We show that OpenAI GPT-o1, despite not being fine-tuned, outperforms the fine-tuned Llama2 and GPT-4o. Our findings indicate that this advantage is largely due to the chain-of-thought reasoning implemented in GPT-o1.
arXiv Detail & Related papers (2024-12-11T18:44:31Z)
GIVE: Structured Reasoning of Large Language Models with Knowledge Graph Inspired Veracity Extrapolation [108.2008975785364]
Graph Inspired Veracity Extrapolation (GIVE) is a novel reasoning method that merges parametric and non-parametric memories to improve accurate reasoning with minimal external input. GIVE guides the LLM agent to select the most pertinent expert data (observe), engage in query-specific divergent thinking (reflect), and then synthesize this information to produce the final output (speak)
arXiv Detail & Related papers (2024-10-11T03:05:06Z)
LOGIC-LM++: Multi-Step Refinement for Symbolic Formulations [1.024113475677323]
This paper proposes Logic-LM++, an improvement on Logic-LM. It uses the ability of LLMs to do pairwise comparisons, allowing the evaluation of the refinements suggested by the LLM.
arXiv Detail & Related papers (2024-06-22T12:50:41Z)
LogicBench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models [52.03659714625452]
Recently developed large language models (LLMs) have been shown to perform remarkably well on a wide range of language understanding tasks. But, can they really "reason" over the natural language? This question has been receiving significant research attention and many reasoning skills such as commonsense, numerical, and qualitative have been studied.
arXiv Detail & Related papers (2024-04-23T21:08:49Z)
GSM-Plus: A Comprehensive Benchmark for Evaluating the Robustness of LLMs as Mathematical Problem Solvers [68.77382332826167]
Large language models (LLMs) have achieved impressive performance across various mathematical reasoning benchmarks. One essential and frequently occurring evidence is that when the math questions are slightly changed, LLMs can behave incorrectly. This motivates us to evaluate the robustness of LLMs' math reasoning capability by testing a wide range of question variations.
arXiv Detail & Related papers (2024-02-29T15:26:14Z)
MAPO: Advancing Multilingual Reasoning through Multilingual Alignment-as-Preference Optimization [65.31411639849516]
We propose a Multilingual-Alignment-as-Preference Optimization framework (MAPO) to align the reasoning processes in other languages with the dominant language. Specifically, we harness an off-the-shelf translation model for the consistency between answers in non-dominant and dominant languages. Experiments show that MAPO stably achieves significant improvements in the multilingual reasoning of various models.
arXiv Detail & Related papers (2024-01-12T18:03:54Z)
How Proficient Are Large Language Models in Formal Languages? An In-Depth Insight for Knowledge Base Question Answering [52.86931192259096]
Knowledge Base Question Answering (KBQA) aims to answer natural language questions based on facts in knowledge bases. Recent works leverage the capabilities of large language models (LLMs) for logical form generation to improve performance.
arXiv Detail & Related papers (2024-01-11T09:27:50Z)
LogicAsker: Evaluating and Improving the Logical Reasoning Ability of Large Language Models [63.14196038655506]
We introduce LogicAsker, a novel approach for evaluating and enhancing the logical reasoning capabilities of large language models (LLMs) Our methodology reveals significant gaps in LLMs' learning of logical rules, with identified reasoning failures ranging from 29% to 90% across different models. We leverage these findings to construct targeted demonstration examples and fine-tune data, notably enhancing logical reasoning in models like GPT-4o by up to 5%.
arXiv Detail & Related papers (2024-01-01T13:53:53Z)
Language Models can be Logical Solvers [99.40649402395725]
We introduce LoGiPT, a novel language model that directly emulates the reasoning processes of logical solvers. LoGiPT is fine-tuned on a newly constructed instruction-tuning dataset derived from revealing and refining the invisible reasoning process of deductive solvers.
arXiv Detail & Related papers (2023-11-10T16:23:50Z)
LINC: A Neurosymbolic Approach for Logical Reasoning by Combining Language Models with First-Order Logic Provers [60.009969929857704]
Logical reasoning is an important task for artificial intelligence with potential impacts on science, mathematics, and society. In this work, we reformulating such tasks as modular neurosymbolic programming, which we call LINC. We observe significant performance gains on FOLIO and a balanced subset of ProofWriter for three different models in nearly all experimental conditions we evaluate.
arXiv Detail & Related papers (2023-10-23T17:58:40Z)
Leveraging Large Language Models to Generate Answer Set Programs [5.532477732693001]
Large language models (LLMs) have demonstrated exceptional performance in various natural language processing tasks. This paper proposes a neuro-symbolic method that combines the strengths of large language models and answer set programming.
arXiv Detail & Related papers (2023-07-15T03:40:55Z)
ChatABL: Abductive Learning via Natural Language Interaction with ChatGPT [72.83383437501577]
Large language models (LLMs) have recently demonstrated significant potential in mathematical abilities. LLMs currently have difficulty in bridging perception, language understanding and reasoning capabilities. This paper presents a novel method for integrating LLMs into the abductive learning framework.
arXiv Detail & Related papers (2023-04-21T16:23:47Z)
PAL: Program-aided Language Models [112.94785609781503]
We present Program-Aided Language models (PaL) to understand natural language problems. PaL offloads the solution step to a programmatic runtime such as a Python interpreter. We set new state-of-the-art results in all 12 benchmarks.
arXiv Detail & Related papers (2022-11-18T18:56:13Z)

This list is automatically generated from the titles and abstracts of the papers in this site.