Using Large Language Model to Solve and Explain Physics Word Problems Approaching Human Level
- URL: http://arxiv.org/abs/2309.08182v2
- Date: Wed, 20 Sep 2023 07:08:53 GMT
- Title: Using Large Language Model to Solve and Explain Physics Word Problems Approaching Human Level
- Authors: Jingzhe Ding, Yan Cen, Xinyuan Wei
- Abstract summary: Large language models (LLMs) pre-trained on text can solve not only pure math word problems but also physics word problems.
Our work is the first research to focus on the automatic solving, explanation, and generation of physics word problems.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Our work demonstrates that a large language model (LLM) pre-trained on text
can solve not only pure math word problems but also physics word problems,
whose solutions require calculation and inference based on prior physical
knowledge. We collect and annotate the first physics word problem
dataset, PhysQA, which contains over 1,000 junior high school physics word
problems (covering Kinematics, Mass & Density, Mechanics, Heat, and Electricity).
Then we use OpenAI's GPT-3.5 to generate answers to these problems and find
that GPT-3.5 can automatically solve 49.3% of the problems through zero-shot
learning and 73.2% through few-shot learning. This result demonstrates that, by
using similar problems and their answers as the prompt, an LLM can solve elementary
physics word problems with performance approaching human level. In addition to
solving problems, GPT-3.5 can also summarize the knowledge or topics covered by
the problems, provide relevant explanations, and generate new physics word
problems based on the input. Our work is the first research to focus on the
automatic solving, explanation, and generation of physics word problems across
various types and scenarios, and we achieve an acceptable and state-of-the-art
accuracy. This underscores the potential of LLMs for further applications in
secondary education.
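The few-shot setup described in the abstract, prepending similar solved problems to the target problem so the model can imitate their solution format, can be sketched as follows. This is a minimal illustration: the helper name, the example kinematics problems, and the prompt layout are assumptions, not taken from the PhysQA paper.

```python
# Minimal sketch of few-shot prompting for physics word problems.
# The exemplar problems and the helper name are hypothetical.

def build_few_shot_prompt(examples, question):
    """Concatenate (problem, solution) exemplars before the new problem."""
    parts = []
    for problem, solution in examples:
        parts.append(f"Problem: {problem}\nSolution: {solution}\n")
    parts.append(f"Problem: {question}\nSolution:")
    return "\n".join(parts)

examples = [
    ("A car travels 120 m in 6 s at constant speed. What is its speed?",
     "speed = distance / time = 120 m / 6 s = 20 m/s"),
]
prompt = build_few_shot_prompt(
    examples,
    "A cyclist covers 45 m in 9 s at constant speed. What is her speed?",
)
# The resulting prompt string would then be sent to a chat-completion
# endpoint (e.g. OpenAI's API with a GPT-3.5 model) as the user message.
print(prompt)
```

In the zero-shot condition the `examples` list would simply be empty, leaving only the target problem in the prompt.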
Related papers
- PhysReason: A Comprehensive Benchmark towards Physics-Based Reasoning [36.193595420239845]
We present PhysReason, a 1,200-problem benchmark for evaluating large language models.
Problems require an average of 8.1 solution steps, with hard problems requiring 15.6.
Top-performing models like Deepseek-R1, Gemini-2.0-Flash-Thinking, and o3-mini-high achieve less than 60% on answer-level evaluation.
arXiv Detail & Related papers (2025-02-17T17:24:14Z)
- MATH-Perturb: Benchmarking LLMs' Math Reasoning Abilities against Hard Perturbations [90.07275414500154]
We observe significant performance drops on MATH-P-Hard across various models.
We also raise concerns about a novel form of memorization where models blindly apply learned problem-solving skills.
arXiv Detail & Related papers (2025-02-10T13:31:46Z)
- Physics Reasoner: Knowledge-Augmented Reasoning for Solving Physics Problems with Large Language Models [41.88825441287559]
Existing large language models (LLMs) frequently fail due to a lack of knowledge or incorrect knowledge application.
We propose Physics Reasoner, a knowledge-augmented framework to solve physics problems with LLMs.
Given a physics problem, Physics Reasoner solves it through three stages: problem analysis, formula retrieval, and guided reasoning.
Empirically, Physics Reasoner mitigates the issues of insufficient knowledge and incorrect application, achieving state-of-the-art performance on SciBench with an average accuracy improvement of 5.8%.
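The three-stage pipeline named above (problem analysis, formula retrieval, guided reasoning) can be sketched roughly as follows. All function names, the toy formula table, and the keyword-based topic tagging are hypothetical placeholders, not the paper's actual implementation.

```python
# Rough sketch of a knowledge-augmented three-stage pipeline:
# problem analysis -> formula retrieval -> guided reasoning.
# The formula table and all helper names are hypothetical.

FORMULAS = {
    "kinematics": "v = d / t",
    "density": "rho = m / V",
}

def analyze(problem: str) -> str:
    """Stage 1: tag the problem with a topic (toy keyword match)."""
    return "density" if "density" in problem.lower() else "kinematics"

def retrieve(topic: str) -> str:
    """Stage 2: look up a relevant formula for the detected topic."""
    return FORMULAS[topic]

def reason(problem: str, formula: str) -> str:
    """Stage 3: build a guided-reasoning prompt for an LLM."""
    return f"{problem}\nUse the formula: {formula}\nSolve step by step."

question = "A 2 kg block occupies 0.001 m^3; find its density."
prompt = reason(question, retrieve(analyze(question)))
```

The point of the retrieval stage is that the formula is supplied to the model explicitly rather than relied upon from its parametric knowledge, which is the failure mode the paper targets.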
arXiv Detail & Related papers (2024-12-18T12:33:50Z)
- Fuse, Reason and Verify: Geometry Problem Solving with Parsed Clauses from Diagram [78.79651421493058]
We propose a neural-symbolic model for plane geometry problem solving (PGPS) with three key steps: modal fusion, reasoning process and knowledge verification.
For reasoning, we design an explicable solution program to describe the geometric reasoning process, and employ a self-limited decoder to generate solution program autoregressively.
We also construct a large-scale geometry problem dataset called PGPS9K, containing fine-grained annotations of textual clauses, solution program and involved knowledge solvers.
arXiv Detail & Related papers (2024-07-10T02:45:22Z)
- Physics simulation capabilities of LLMs [0.0]
Large Language Models (LLMs) can solve some undergraduate-level to graduate-level physics textbook problems and are proficient at coding.
We present an evaluation of state-of-the-art (SOTA) LLMs on PhD-level to research-level computational physics problems.
arXiv Detail & Related papers (2023-12-04T18:06:41Z)
- Examining the Potential and Pitfalls of ChatGPT in Science and Engineering Problem-Solving [1.3628066756509705]
The study explores the capabilities of OpenAI's ChatGPT in solving different types of physics problems.
ChatGPT could successfully solve 62.5% of the well-specified problems, but its accuracy drops to 8.3% for under-specified problems.
arXiv Detail & Related papers (2023-10-12T23:39:28Z)
- Automatic Generation of Socratic Subquestions for Teaching Math Word Problems [16.97827669744673]
We explore the ability of large language models (LMs) in generating sequential questions for guiding math word problem-solving.
On both automatic and human quality evaluations, we find that LMs constrained with desirable question properties generate superior questions.
Results suggest that the difficulty level of problems plays an important role in determining whether questioning improves or hinders human performance.
arXiv Detail & Related papers (2022-11-23T10:40:22Z)
- Solving Quantitative Reasoning Problems with Language Models [53.53969870599973]
We introduce Minerva, a large language model pretrained on general natural language data and further trained on technical content.
The model achieves state-of-the-art performance on technical benchmarks without the use of external tools.
We also evaluate our model on over two hundred undergraduate-level problems in physics, biology, chemistry, economics, and other sciences.
arXiv Detail & Related papers (2022-06-29T18:54:49Z)
- JiuZhang: A Chinese Pre-trained Language Model for Mathematical Problem Understanding [74.12405417718054]
This paper aims to advance the mathematical intelligence of machines by presenting the first Chinese mathematical pre-trained language model (PLM).
Unlike other standard NLP tasks, mathematical texts are difficult to understand, since they involve mathematical terminology, symbols and formulas in the problem statement.
We design a novel curriculum pre-training approach for improving the learning of mathematical PLMs, consisting of both basic and advanced courses.
arXiv Detail & Related papers (2022-06-13T17:03:52Z)
- SMART: A Situation Model for Algebra Story Problems via Attributed Grammar [74.1315776256292]
We introduce the concept of a situation model, which originates from psychology studies, to represent the mental states of humans in problem-solving.
We show that the proposed model outperforms all previous neural solvers by a large margin while preserving much better interpretability.
arXiv Detail & Related papers (2020-12-27T21:03:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.