Using Large Language Model to Solve and Explain Physics Word Problems Approaching Human Level
- URL: http://arxiv.org/abs/2309.08182v2
- Date: Wed, 20 Sep 2023 07:08:53 GMT
- Title: Using Large Language Model to Solve and Explain Physics Word Problems Approaching Human Level
- Authors: Jingzhe Ding, Yan Cen, Xinyuan Wei
- Abstract summary: Large language models (LLMs) pre-trained on text can solve not only pure math word problems but also physics word problems.
Our work is the first research to focus on the automatic solving, explanation, and generation of physics word problems.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Our work demonstrates that a large language model (LLM) pre-trained on text
can solve not only pure math word problems but also physics word problems,
whose solutions require calculation and inference based on prior physical
knowledge. We collect and annotate the first physics word problem
dataset, PhysQA, which contains over 1,000 junior high school physics word
problems (covering Kinematics, Mass & Density, Mechanics, Heat, and Electricity).
Then we use OpenAI's GPT-3.5 to generate answers to these problems and find
that GPT-3.5 can automatically solve 49.3% of the problems through zero-shot
learning and 73.2% through few-shot learning. This result demonstrates that, by
using similar problems and their answers as the prompt, an LLM can solve elementary
physics word problems with performance approaching human level. In addition to
solving problems, GPT-3.5 can also summarize the knowledge or topics covered by
the problems, provide relevant explanations, and generate new physics word
problems based on the input. Our work is the first research to focus on the
automatic solving, explanation, and generation of physics word problems across
various types and scenarios, and we achieve an acceptable and state-of-the-art
accuracy. This underscores the potential of LLMs for further applications in
secondary education.
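The few-shot setup described in the abstract, prepending similar solved problems to the target problem so the model can imitate their solution format, can be sketched as follows. This is a minimal illustration: the helper name, the example kinematics problems, and the prompt layout are assumptions, not taken from the PhysQA paper.

```python
# Minimal sketch of few-shot prompting for physics word problems.
# The exemplar problems and the helper name are hypothetical.

def build_few_shot_prompt(examples, question):
    """Concatenate (problem, solution) exemplars before the new problem."""
    parts = []
    for problem, solution in examples:
        parts.append(f"Problem: {problem}\nSolution: {solution}\n")
    parts.append(f"Problem: {question}\nSolution:")
    return "\n".join(parts)

examples = [
    ("A car travels 120 m in 6 s at constant speed. What is its speed?",
     "speed = distance / time = 120 m / 6 s = 20 m/s"),
]
prompt = build_few_shot_prompt(
    examples,
    "A cyclist covers 45 m in 9 s at constant speed. What is her speed?",
)
# The resulting prompt string would then be sent to a chat-completion
# endpoint (e.g. OpenAI's API with a GPT-3.5 model) as the user message.
print(prompt)
```

In the zero-shot condition the `examples` list would simply be empty, leaving only the target problem in the prompt.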
Related papers
- PhysReason: A Comprehensive Benchmark towards Physics-Based Reasoning [36.193595420239845]
We present PhysReason, a 1,200-problem benchmark for evaluating large language models.
Problems require an average of 8.1 solution steps, with hard problems requiring 15.6.
Top-performing models like Deepseek-R1, Gemini-2.0-Flash-Thinking, and o3-mini-high achieve less than 60% on answer-level evaluation.
arXiv Detail & Related papers (2025-02-17T17:24:14Z)
- MATH-Perturb: Benchmarking LLMs' Math Reasoning Abilities against Hard Perturbations [90.07275414500154]
We observe significant performance drops on MATH-P-Hard across various models.
We also raise concerns about a novel form of memorization where models blindly apply learned problem-solving skills.
arXiv Detail & Related papers (2025-02-10T13:31:46Z)
- Physics Reasoner: Knowledge-Augmented Reasoning for Solving Physics Problems with Large Language Models [41.88825441287559]
Existing large language models (LLMs) frequently fail due to a lack of knowledge or incorrect knowledge application.
We propose Physics Reasoner, a knowledge-augmented framework to solve physics problems with LLMs.
Given a physics problem, Physics Reasoner solves it through three stages: problem analysis, formula retrieval, and guided reasoning.
Empirically, Physics Reasoner mitigates the issues of insufficient knowledge and incorrect application, achieving state-of-the-art performance on SciBench with an average accuracy improvement of 5.8%.
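The three-stage pipeline named above (problem analysis, formula retrieval, guided reasoning) can be sketched roughly as follows. All function names, the toy formula table, and the keyword-based topic tagging are hypothetical placeholders, not the paper's actual implementation.

```python
# Rough sketch of a knowledge-augmented three-stage pipeline:
# problem analysis -> formula retrieval -> guided reasoning.
# The formula table and all helper names are hypothetical.

FORMULAS = {
    "kinematics": "v = d / t",
    "density": "rho = m / V",
}

def analyze(problem: str) -> str:
    """Stage 1: tag the problem with a topic (toy keyword match)."""
    return "density" if "density" in problem.lower() else "kinematics"

def retrieve(topic: str) -> str:
    """Stage 2: look up a relevant formula for the detected topic."""
    return FORMULAS[topic]

def reason(problem: str, formula: str) -> str:
    """Stage 3: build a guided-reasoning prompt for an LLM."""
    return f"{problem}\nUse the formula: {formula}\nSolve step by step."

question = "A 2 kg block occupies 0.001 m^3; find its density."
prompt = reason(question, retrieve(analyze(question)))
```

The point of the retrieval stage is that the formula is supplied to the model explicitly rather than relied upon from its parametric knowledge, which is the failure mode the paper targets.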
arXiv Detail & Related papers (2024-12-18T12:33:50Z)
- Fuse, Reason and Verify: Geometry Problem Solving with Parsed Clauses from Diagram [78.79651421493058]
We propose a neural-symbolic model for plane geometry problem solving (PGPS) with three key steps: modal fusion, reasoning process and knowledge verification.
For reasoning, we design an explicable solution program to describe the geometric reasoning process, and employ a self-limited decoder to generate solution program autoregressively.
We also construct a large-scale geometry problem dataset called PGPS9K, containing fine-grained annotations of textual clauses, solution program and involved knowledge solvers.
arXiv Detail & Related papers (2024-07-10T02:45:22Z)
- Physics simulation capabilities of LLMs [0.0]
Large Language Models (LLMs) can solve some undergraduate-level to graduate-level physics textbook problems and are proficient at coding.
We present an evaluation of state-of-the-art (SOTA) LLMs on PhD-level to research-level computational physics problems.
arXiv Detail & Related papers (2023-12-04T18:06:41Z)
- Examining the Potential and Pitfalls of ChatGPT in Science and Engineering Problem-Solving [1.3628066756509705]
The study explores the capabilities of OpenAI's ChatGPT in solving different types of physics problems.
ChatGPT could successfully solve 62.5% of the well-specified problems, but its accuracy drops to 8.3% for under-specified problems.
arXiv Detail & Related papers (2023-10-12T23:39:28Z)
- Automatic Generation of Socratic Subquestions for Teaching Math Word Problems [16.97827669744673]
We explore the ability of large language models (LMs) in generating sequential questions for guiding math word problem-solving.
On both automatic and human quality evaluations, we find that LMs constrained with desirable question properties generate superior questions.
Results suggest that the difficulty level of problems plays an important role in determining whether questioning improves or hinders human performance.
arXiv Detail & Related papers (2022-11-23T10:40:22Z)
- Solving Quantitative Reasoning Problems with Language Models [53.53969870599973]
We introduce Minerva, a large language model pretrained on general natural language data and further trained on technical content.
The model achieves state-of-the-art performance on technical benchmarks without the use of external tools.
We also evaluate our model on over two hundred undergraduate-level problems in physics, biology, chemistry, economics, and other sciences.
arXiv Detail & Related papers (2022-06-29T18:54:49Z)
- JiuZhang: A Chinese Pre-trained Language Model for Mathematical Problem Understanding [74.12405417718054]
This paper aims to advance the mathematical intelligence of machines by presenting the first Chinese mathematical pre-trained language model (PLM).
Unlike other standard NLP tasks, mathematical texts are difficult to understand, since they involve mathematical terminology, symbols and formulas in the problem statement.
We design a novel curriculum pre-training approach for improving the learning of mathematical PLMs, consisting of both basic and advanced courses.
arXiv Detail & Related papers (2022-06-13T17:03:52Z)
- SMART: A Situation Model for Algebra Story Problems via Attributed Grammar [74.1315776256292]
We introduce the concept of a situation model, which originates from psychology studies, to represent the mental states of humans in problem-solving.
We show that the proposed model outperforms all previous neural solvers by a large margin while preserving much better interpretability.
arXiv Detail & Related papers (2020-12-27T21:03:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.