Solving Quantitative Reasoning Problems with Language Models
- URL: http://arxiv.org/abs/2206.14858v2
- Date: Fri, 1 Jul 2022 02:15:12 GMT
- Title: Solving Quantitative Reasoning Problems with Language Models
- Authors: Aitor Lewkowycz, Anders Andreassen, David Dohan, Ethan Dyer, Henryk
Michalewski, Vinay Ramasesh, Ambrose Slone, Cem Anil, Imanol Schlag, Theo
Gutman-Solo, Yuhuai Wu, Behnam Neyshabur, Guy Gur-Ari, Vedant Misra
- Abstract summary: We introduce Minerva, a large language model pretrained on general natural language data and further trained on technical content.
The model achieves state-of-the-art performance on technical benchmarks without the use of external tools.
We also evaluate our model on over two hundred undergraduate-level problems in physics, biology, chemistry, economics, and other sciences.
- Score: 53.53969870599973
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Language models have achieved remarkable performance on a wide range of tasks
that require natural language understanding. Nevertheless, state-of-the-art
models have generally struggled with tasks that require quantitative reasoning,
such as solving mathematics, science, and engineering problems at the college
level. To help close this gap, we introduce Minerva, a large language model
pretrained on general natural language data and further trained on technical
content. The model achieves state-of-the-art performance on technical
benchmarks without the use of external tools. We also evaluate our model on
over two hundred undergraduate-level problems in physics, biology, chemistry,
economics, and other sciences that require quantitative reasoning, and find
that the model can correctly answer nearly a third of them.
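The abstract does not detail the evaluation harness. As a minimal sketch of how such quantitative benchmarks are typically scored (the full paper reports sampling multiple chain-of-thought solutions and majority-voting over final answers), the snippet below extracts a final answer from each sampled solution and checks symbolic equivalence with SymPy. `sample_solutions` is a hypothetical stand-in for a real model API, and the "final answer is ..." format is an assumption, not Minerva's actual output convention.

```python
# Hedged sketch of a quantitative-reasoning eval loop: sample several
# chain-of-thought solutions, extract the final answers, majority-vote,
# and score the winner by symbolic equivalence with SymPy.
import re
from collections import Counter
from sympy import simplify
from sympy.parsing.sympy_parser import parse_expr

def sample_solutions(prompt: str, k: int) -> list[str]:
    """Hypothetical model call; replace with a real sampling endpoint."""
    return ["... so the final answer is 3/2."] * k

def extract_answer(solution: str) -> str | None:
    # Assumes solutions end with "the final answer is <expr>."
    m = re.search(r"final answer is (.+?)\.?$", solution.strip())
    return m.group(1) if m else None

def answers_match(predicted: str, reference: str) -> bool:
    try:
        # Equivalent expressions differ by zero after simplification.
        return simplify(parse_expr(predicted) - parse_expr(reference)) == 0
    except Exception:  # parsing can fail on free-form text
        return predicted.strip() == reference.strip()

def majority_vote_correct(problem: str, reference: str, k: int = 8) -> bool:
    answers = [a for s in sample_solutions(problem, k)
               if (a := extract_answer(s))]
    if not answers:
        return False
    top, _ = Counter(answers).most_common(1)[0]
    return answers_match(top, reference)

print(majority_vote_correct("What is 6/4 in lowest terms?", "3/2"))
```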
Related papers
- Evaluation of OpenAI o1: Opportunities and Challenges of AGI [112.0812059747033]
o1-preview demonstrated remarkable capabilities, often achieving human-level or superior performance.
The model excelled in tasks requiring intricate reasoning and knowledge integration across various fields.
Overall results indicate significant progress towards artificial general intelligence.
arXiv Detail & Related papers (2024-09-27T06:57:00Z)
- Neuro-symbolic Training for Reasoning over Spatial Language [17.901249830817882]
We propose training language models with neuro-symbolic techniques that can exploit the logical rules of reasoning as constraints.
We focus on the challenging problem of spatial reasoning over text; a hedged sketch of one such constraint appears below.
arXiv Detail & Related papers (2024-06-19T20:47:36Z)
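The summary names the technique but not the constraint machinery. Below is a hedged sketch, not the paper's actual method, of how one spatial rule (transitivity of "left-of") could be relaxed into a differentiable training penalty that is added to the usual task loss; the product t-norm relaxation and all names are illustrative assumptions.

```python
# Hedged sketch: turn the logic rule (A left of B) & (B left of C) -> (A left of C)
# into a differentiable penalty on the model's relation probabilities.
import torch

def transitivity_penalty(p_ab: torch.Tensor,
                         p_bc: torch.Tensor,
                         p_ac: torch.Tensor) -> torch.Tensor:
    """Product t-norm relaxation: the rule is violated to the degree that
    p_ab * p_bc exceeds p_ac, so only that excess is penalized."""
    return torch.relu(p_ab * p_bc - p_ac).mean()

# Toy usage: probabilities a model assigns to three "left-of" facts.
p_ab = torch.tensor([0.9], requires_grad=True)
p_bc = torch.tensor([0.8], requires_grad=True)
p_ac = torch.tensor([0.2], requires_grad=True)  # inconsistent with the rule

loss = transitivity_penalty(p_ab, p_bc, p_ac)  # added to the usual task loss
loss.backward()  # gradients push p_ac up (and p_ab, p_bc down)
print(float(loss))  # 0.9 * 0.8 - 0.2 = 0.52
```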
- INSTRUCTEVAL: Towards Holistic Evaluation of Instruction-Tuned Large Language Models [39.46610170563634]
INSTRUCTEVAL is a more comprehensive evaluation suite designed specifically for instruction-tuned large language models.
We take a holistic approach to analyze various factors affecting model performance, including the pretraining foundation, instruction-tuning data, and training methods.
Our findings reveal that the quality of instruction data is the most crucial factor in scaling model performance.
arXiv Detail & Related papers (2023-06-07T20:12:29Z)
- Language Model Behavior: A Comprehensive Survey [5.663056267168211]
We discuss over 250 recent studies of English language model behavior before task-specific fine-tuning.
Despite dramatic increases in generated-text quality as models scale to hundreds of billions of parameters, the models remain prone to non-factual responses, commonsense errors, memorized text, and social biases.
arXiv Detail & Related papers (2023-03-20T23:54:26Z)
- Overcoming Barriers to Skill Injection in Language Modeling: Case Study in Arithmetic [14.618731441943847]
We develop a novel framework that enables language models to become mathematically proficient while retaining their linguistic prowess.
Specifically, we offer information-theoretic interventions to overcome the catastrophic forgetting of linguistic skills that occurs when non-linguistic skills are injected into language models; an illustrative sketch follows.
arXiv Detail & Related papers (2022-11-03T18:53:30Z)
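The summary leaves the interventions unspecified. As one standard information-theoretic example, shown purely as an illustration and not necessarily the paper's method, the sketch below applies elastic weight consolidation: parameters that carried high Fisher information for the original linguistic task are anchored while the new (arithmetic) skill is trained in. The tiny model and uniform Fisher values are placeholders.

```python
# Illustrative only: elastic weight consolidation (EWC) as a guard against
# catastrophic forgetting. Fisher information estimates how important each
# parameter was to the original linguistic task; drift in important
# parameters during arithmetic training is penalized.
import torch

def ewc_penalty(model: torch.nn.Module,
                fisher: dict[str, torch.Tensor],
                old_params: dict[str, torch.Tensor],
                lam: float = 1.0) -> torch.Tensor:
    """Sum of Fisher-weighted squared drifts from the pre-injection weights."""
    loss = torch.zeros(())
    for name, p in model.named_parameters():
        loss = loss + (fisher[name] * (p - old_params[name]) ** 2).sum()
    return lam * loss

# Toy usage with a tiny module standing in for a language model.
model = torch.nn.Linear(4, 2)
old_params = {n: p.detach().clone() for n, p in model.named_parameters()}
fisher = {n: torch.ones_like(p) for n, p in model.named_parameters()}  # placeholder

task_loss = model(torch.randn(3, 4)).pow(2).mean()  # stand-in arithmetic loss
total = task_loss + ewc_penalty(model, fisher, old_params, lam=0.1)
total.backward()
```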
- Analyzing the Limits of Self-Supervision in Handling Bias in Language [52.26068057260399]
We evaluate how well language models capture the semantics of four bias-related tasks: diagnosis, identification, extraction, and rephrasing.
Our analyses indicate that language models are capable of performing these tasks to widely varying degrees across different bias dimensions, such as gender and political affiliation.
arXiv Detail & Related papers (2021-12-16T05:36:08Z)
- Scaling Language Models: Methods, Analysis & Insights from Training Gopher [83.98181046650664]
We present an analysis of Transformer-based language model performance across a wide range of model scales.
Gains from scale are largest in areas such as reading comprehension, fact-checking, and the identification of toxic language.
We discuss the application of language models to AI safety and the mitigation of downstream harms.
arXiv Detail & Related papers (2021-12-08T19:41:47Z)
- Scientia Potentia Est -- On the Role of Knowledge in Computational Argumentation [52.903665881174845]
We propose a pyramid of types of knowledge required in computational argumentation.
We briefly discuss the state of the art on the role and integration of these types in the field.
arXiv Detail & Related papers (2021-07-01T08:12:41Z)
- SMART: A Situation Model for Algebra Story Problems via Attributed Grammar [74.1315776256292]
We introduce the concept of a situation model, which originates in psychology studies as a representation of the mental states of humans during problem-solving.
We show that the proposed model outperforms all previous neural solvers by a large margin while preserving much better interpretability.
arXiv Detail & Related papers (2020-12-27T21:03:40Z)
- Learning to Generalize for Sequential Decision Making [19.075378799280728]
We introduce a teacher-student imitation learning methodology and a means of converting a reinforcement learning model into a natural language understanding model.
We show that models learn faster and generalize better when they leverage both the imitation learning and the reformulation.
arXiv Detail & Related papers (2020-10-05T18:00:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.