LPML: LLM-Prompting Markup Language for Mathematical Reasoning
- URL: http://arxiv.org/abs/2309.13078v2
- Date: Wed, 11 Oct 2023 12:21:30 GMT
- Title: LPML: LLM-Prompting Markup Language for Mathematical Reasoning
- Authors: Ryutaro Yamauchi, Sho Sonoda, Akiyoshi Sannai, Wataru Kumagai
- Abstract summary: We propose a novel framework that integrates the Chain-of-Thought (CoT) method with an external tool (Python REPL).
Our approach enables LLMs to write the markup language and perform advanced mathematical reasoning using only zero-shot prompting.
- Score: 8.995617701116142
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In utilizing large language models (LLMs) for mathematical reasoning,
addressing the errors in the reasoning and calculation present in the generated
text by LLMs is a crucial challenge. In this paper, we propose a novel
framework that integrates the Chain-of-Thought (CoT) method with an external
tool (Python REPL). We discovered that by prompting LLMs to generate structured
text in XML-like markup language, we could seamlessly integrate CoT and the
external tool and control the undesired behaviors of LLMs. With our approach,
LLMs can utilize Python computation to rectify errors within CoT. We applied
our method to ChatGPT (GPT-3.5) to solve challenging mathematical problems and
demonstrated that combining CoT and Python REPL through the markup language
enhances the reasoning capability of LLMs. Our approach enables LLMs to write
the markup language and perform advanced mathematical reasoning using only
zero-shot prompting.
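The abstract describes the control flow but not the markup itself; the following is a minimal sketch of how such an XML-tag-driven loop could be wired up, assuming illustrative tag names (`<THINK>`, `<PYTHON>`, `<OUTPUT>`) rather than the paper's exact LPML specification. The LLM is prompted to wrap code in `<PYTHON>` tags; a controller extracts each block, executes it in a persistent namespace emulating a REPL, and returns the captured output (or the error) so it can be appended to the conversation for the model's next CoT step.

```python
import re
from io import StringIO
from contextlib import redirect_stdout

# Illustrative tag name; the paper's exact LPML tag set is not reproduced here.
PYTHON_TAG = re.compile(r"<PYTHON>(.*?)</PYTHON>", re.DOTALL)

def run_tagged_code(llm_output: str, namespace: dict) -> str:
    """Extract code from <PYTHON>...</PYTHON> blocks, execute it in a shared
    namespace (emulating a REPL), and return the captured stdout wrapped in
    <OUTPUT> tags so it can be fed back to the model."""
    outputs = []
    for code in PYTHON_TAG.findall(llm_output):
        buf = StringIO()
        try:
            with redirect_stdout(buf):
                exec(code, namespace)      # namespace persists across blocks
            outputs.append(buf.getvalue().rstrip())
        except Exception as exc:           # errors are returned, not hidden
            outputs.append(f"{type(exc).__name__}: {exc}")
    return "\n".join(f"<OUTPUT>\n{o}\n</OUTPUT>" for o in outputs)

# Example: a CoT step that delegates arithmetic to the REPL.
reply = "<THINK>Compute 17 * 23 exactly.</THINK>\n<PYTHON>print(17 * 23)</PYTHON>"
print(run_tagged_code(reply, {}))   # -> <OUTPUT>\n391\n</OUTPUT>
```

Feeding execution results and errors back into the dialogue, rather than discarding them, is what lets the model rectify faulty calculations within its chain of thought.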
Related papers
- zsLLMCode: An Effective Approach for Functional Code Embedding via LLM with Zero-Shot Learning [6.976968804436321]
Large language models (LLMs) have the capability of zero-shot learning, which does not require training or fine-tuning.
We propose zsLLMCode, a novel approach that generates functional code embeddings using LLMs.
arXiv Detail & Related papers (2024-09-23T01:03:15Z) - Arithmetic Reasoning with LLM: Prolog Generation & Permutation [2.1867261071129125]
We show that Prolog-based arithmetic problem-solving outperforms CoT generation in the GSM8K benchmark.
We propose to permute the ground truth predicates for more robust LLM training via data augmentation.
arXiv Detail & Related papers (2024-05-28T07:13:25Z) - Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing [56.75702900542643]
We introduce AlphaLLM for the self-improvement of Large Language Models.
It integrates Monte Carlo Tree Search (MCTS) with LLMs to establish a self-improving loop.
Our experimental results show that AlphaLLM significantly enhances the performance of LLMs without additional annotations.
arXiv Detail & Related papers (2024-04-18T15:21:34Z) - ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline [42.61538071832468]
Large language models (LLMs) have shown an excellent command of human language, but they still struggle in real-world applications that require mathematical problem-solving.
We tailor the Self-Critique pipeline, which addresses the challenge in the feedback learning stage of LLM alignment.
arXiv Detail & Related papers (2024-04-03T17:51:18Z) - Which Syntactic Capabilities Are Statistically Learned by Masked
Language Models for Code? [51.29970742152668]
We highlight that relying on accuracy-based measurements may lead to an overestimation of models' capabilities.
To address these issues, we introduce SyntaxEval, a technique that uses syntactic capabilities in the evaluation of masked language models.
arXiv Detail & Related papers (2024-01-03T02:44:02Z) - Turbulence: Systematically and Automatically Testing Instruction-Tuned
Large Language Models for Code [12.58098809948832]
We present a method for evaluating the correctness and robustness of instruction-tuned large language models (LLMs) for code generation via a new benchmark, Turbulence.
Turbulence consists of a large set of natural-language question templates, each of which is a programming problem parameterised so that it can be asked in many different forms.
From a single question template, it is possible to ask an LLM a neighbourhood of very similar programming questions and assess the correctness of the result returned for each question (a minimal sketch of this template-and-neighbourhood idea appears after the related-papers list below).
arXiv Detail & Related papers (2023-12-22T17:29:08Z) - LM-Polygraph: Uncertainty Estimation for Language Models [71.21409522341482]
Uncertainty estimation (UE) methods are one path to safer, more responsible, and more effective use of large language models (LLMs).
We introduce LM-Polygraph, a framework with implementations of a battery of state-of-the-art UE methods for LLMs in text generation tasks, with unified program interfaces in Python.
It introduces an extendable benchmark for consistent evaluation of UE techniques by researchers, and a demo web application that enriches the standard chat dialog with confidence scores.
arXiv Detail & Related papers (2023-11-13T15:08:59Z) - Logic-LM: Empowering Large Language Models with Symbolic Solvers for
Faithful Logical Reasoning [101.26814728062065]
Large Language Models (LLMs) have shown human-like reasoning abilities but still struggle with complex logical problems.
This paper introduces a novel framework, Logic-LM, which integrates LLMs with symbolic solvers to improve logical problem-solving.
arXiv Detail & Related papers (2023-05-20T22:25:38Z) - Check Your Facts and Try Again: Improving Large Language Models with
External Knowledge and Automated Feedback [127.75419038610455]
Large language models (LLMs) are able to generate human-like, fluent responses for many downstream tasks.
This paper proposes LLM-Augmenter, a system that augments a black-box LLM with a set of plug-and-play modules.
arXiv Detail & Related papers (2023-02-24T18:48:43Z) - PAL: Program-aided Language Models [112.94785609781503]
We present Program-Aided Language models (PAL), which use an LLM to read natural language problems and generate programs as intermediate reasoning steps.
PAL offloads the solution step to a programmatic runtime such as a Python interpreter.
We set new state-of-the-art results in all 12 benchmarks.
arXiv Detail & Related papers (2022-11-18T18:56:13Z)
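As a side illustration of the Turbulence entry above, the question-template/neighbourhood idea can be sketched as follows. The template wording, parameters, and oracle are invented for this example and are not taken from the benchmark itself.

```python
from string import Template

# Hypothetical template; Turbulence's actual question templates are not reproduced here.
QUESTION = Template(
    "Write a Python function that returns the sum of the first $n positive multiples of $k."
)

def neighbourhood(param_settings):
    """Instantiate one parameterised template many times, producing a
    'neighbourhood' of closely related programming questions."""
    return [QUESTION.substitute(n=n, k=k) for n, k in param_settings]

# Each instantiated question would be sent to the LLM, and the returned code
# checked against a reference oracle such as: sum(k * i for i in range(1, n + 1)).
for question in neighbourhood([(5, 3), (10, 7), (100, 2)]):
    print(question)
```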
This list is automatically generated from the titles and abstracts of the papers in this site.