Enhancing Numerical Reasoning with the Guidance of Reliable Reasoning
Processes
- URL: http://arxiv.org/abs/2402.10654v1
- Date: Fri, 16 Feb 2024 13:02:11 GMT
- Title: Enhancing Numerical Reasoning with the Guidance of Reliable Reasoning
Processes
- Authors: Dingzirui Wang, Longxu Dou, Xuanliang Zhang, Qingfu Zhu, Wanxiang Che
- Abstract summary: We introduce Enhancing NumeriCal reasOning with Reliable procEsses (Encore), which derives the reliable reasoning process by decomposing the answer formula.
We present a series of pre-training tasks to help models learn the reasoning process generation with synthesized data.
Experiments show that Encore yields improvement on all five experimental datasets with an average of 1.8%.
- Score: 55.2326738851157
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Numerical reasoning is an essential ability for NLP systems to handle numeric
information. Recent research indicates that fine-tuning a small-scale model to
generate reasoning processes alongside answers can significantly
enhance performance. However, most current methods generate reasoning
processes with large language models (LLMs), and these processes
are "unreliable" since they could contain information unrelated to
the answer. To address this limitation, we introduce Enhancing NumeriCal
reasOning with Reliable procEsses (Encore), which derives the reliable
reasoning process by decomposing the answer formula, ensuring that the process
fully supports the answer. Nevertheless, models could lack enough data to learn
reasoning process generation adequately, since our method generates only a
single reasoning process for each formula. To overcome this difficulty, we
present a series of pre-training tasks to help models learn reasoning
process generation with synthesized data. The experiments show that Encore
yields improvement on all five experimental datasets with an average of 1.8%,
proving the effectiveness of our method.
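As a rough illustration of what "decomposing the answer formula" could look like, the sketch below turns an arithmetic formula into one intermediate step per operation, so that every step contributes to the final answer. This is not the authors' implementation: the function name `decompose`, the `x1, x2, ...` step variables, and the example formula are all assumptions made for illustration, and only the four basic arithmetic operators are handled.

```python
import ast
import operator

# Hypothetical operator table: AST node type -> (symbol, function).
_OPS = {
    ast.Add: ("+", operator.add),
    ast.Sub: ("-", operator.sub),
    ast.Mult: ("*", operator.mul),
    ast.Div: ("/", operator.truediv),
}


def decompose(formula: str):
    """Return (steps, value): one textual step per binary operation,
    listed in the order the operations are evaluated."""
    steps = []

    def visit(node):
        # Returns (display_name, numeric_value) for a sub-expression.
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return str(node.value), node.value
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            name, value = visit(node.operand)
            return f"-{name}", -value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            left_name, left_val = visit(node.left)
            right_name, right_val = visit(node.right)
            symbol, fn = _OPS[type(node.op)]
            result = fn(left_val, right_val)
            var = f"x{len(steps) + 1}"  # name of this intermediate result
            steps.append(f"{var} = {left_name} {symbol} {right_name} = {result}")
            return var, result
        raise ValueError(f"unsupported expression: {ast.dump(node)}")

    _, value = visit(ast.parse(formula, mode="eval").body)
    return steps, value


if __name__ == "__main__":
    # Illustrative formula (a percentage change), not taken from the paper.
    for step in decompose("(120 - 100) / 100")[0]:
        print(step)
```

Because every step is read off the formula's own expression tree, the generated process cannot contain operations unrelated to the answer, which is the property the abstract calls "reliable". It also yields exactly one process per formula, the data limitation the abstract mentions.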
Related papers
- A Comparative Study on Reasoning Patterns of OpenAI's o1 Model [69.08287909042421]
We show that OpenAI's o1 model has achieved the best performance on most datasets.
We also provide a detailed analysis on several reasoning benchmarks.
arXiv Detail & Related papers (2024-10-17T15:09:03Z)
- General Purpose Verification for Chain of Thought Prompting [16.381123651223763]
We explore ways to improve the reasoning capabilities of Large Language Models (LLMs).
We propose three general principles that a model should adhere to while reasoning.
We apply these constraints to the reasoning steps generated by the LLM to improve the accuracy of the final generation.
arXiv Detail & Related papers (2024-04-30T21:15:17Z)
- R-Tuning: Instructing Large Language Models to Say `I Don't Know' [66.11375475253007]
Large language models (LLMs) have revolutionized numerous domains with their impressive performance but still face challenges.
Previous instruction tuning methods force the model to complete a sentence no matter whether the model knows the knowledge or not.
We present a new approach called Refusal-Aware Instruction Tuning (R-Tuning).
Experimental results demonstrate R-Tuning effectively improves a model's ability to answer known questions and refrain from answering unknown questions.
arXiv Detail & Related papers (2023-11-16T08:45:44Z)
- Sci-CoT: Leveraging Large Language Models for Enhanced Knowledge
Distillation in Small Models for Scientific QA [5.117094291273979]
Large Language Models (LLMs) have shown outstanding performance across a wide range of downstream tasks.
We propose Sci-CoT, a two-stage framework that separates the processes of generating rationales and inferring answers.
Our 80-million-parameter model is able to exceed the performance of BLOOM-176B on the ARC-Easy dataset under the few-shot setting.
arXiv Detail & Related papers (2023-08-09T03:18:07Z)
- BOOT: Data-free Distillation of Denoising Diffusion Models with
Bootstrapping [64.54271680071373]
Diffusion models have demonstrated excellent potential for generating diverse images.
Knowledge distillation has been recently proposed as a remedy that can reduce the number of inference steps to one or a few.
We present a novel technique called BOOT that overcomes these limitations with an efficient data-free distillation algorithm.
arXiv Detail & Related papers (2023-06-08T20:30:55Z)
- Self-Polish: Enhance Reasoning in Large Language Models via Problem Refinement [50.62461749446111]
Self-Polish (SP) is a novel method that facilitates the model's reasoning by guiding it to progressively refine the given problems to be more comprehensible and solvable.
SP is orthogonal to all other prompting methods on the answer/reasoning side, such as CoT, allowing for seamless integration with state-of-the-art techniques for further improvement.
arXiv Detail & Related papers (2023-05-23T19:58:30Z)
- Logic-Guided Data Augmentation and Regularization for Consistent
Question Answering [55.05667583529711]
This paper addresses the problem of improving the accuracy and consistency of responses to comparison questions.
Our method leverages logical and linguistic knowledge to augment labeled training data and then uses a consistency-based regularizer to train the model.
arXiv Detail & Related papers (2020-04-21T17:03:08Z)
- An Hybrid Method for the Estimation of the Breast Mechanical Parameters [0.9176056742068814]
An accurate numerical breast model can provide assistance to surgeons with visual information of the breast as a result of a surgery simulation.
The process of finding the model parameters requires numeric inputs, based either on medical imaging techniques or on other measures.
Inverse elasticity solvers are highly robust and provide solutions within the required degree of accuracy.
Deep-learning methods, such as neural networks, can provide accurate results in the majority of cases.
arXiv Detail & Related papers (2020-03-09T11:21:37Z)