Learning Mathematical Rules with Large Language Models
- URL: http://arxiv.org/abs/2410.16973v3
- Date: Fri, 25 Oct 2024 13:28:52 GMT
- Title: Learning Mathematical Rules with Large Language Models
- Authors: Antoine Gorceix, Bastien Le Chenadec, Ahmad Rammal, Nelson Vadori, Manuela Veloso
- Abstract summary: We study the ability of large language models to learn specific mathematical rules such as distributivity or simplifying equations.
We present an empirical analysis of their ability to generalize these rules, as well as to reuse them in the context of word problems.
- Score: 10.285317818397298
- Abstract: In this paper, we study the ability of large language models to learn specific mathematical rules such as distributivity or simplifying equations. We present an empirical analysis of their ability to generalize these rules, as well as to reuse them in the context of word problems. For this purpose, we provide a rigorous methodology to build synthetic data incorporating such rules, and perform fine-tuning of large language models on such data. Our experiments show that our model can learn and generalize these rules to some extent, as well as suitably reuse them in the context of word problems.
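To make the setup concrete, here is a minimal sketch of what one synthetic training example for the distributivity rule could look like, assuming sympy for expression handling; the prompt wording and coefficient ranges are illustrative choices, not the authors' actual pipeline.

```python
# A minimal sketch of synthetic training data for the distributivity rule.
# Assumes sympy; the prompt wording and coefficient ranges are hypothetical.
import random
import sympy as sp

def distributivity_example(seed: int) -> dict:
    """Build one (prompt, target) pair exercising k*(a*x + b) -> k*a*x + k*b."""
    rng = random.Random(seed)
    x = sp.Symbol("x")
    k = rng.randint(2, 9)
    inner = rng.randint(1, 9) * x + rng.randint(1, 9)
    expr = sp.Mul(k, inner, evaluate=False)   # keep k*(a*x + b) unexpanded
    return {
        "prompt": f"Expand the expression: {sp.sstr(expr)}",
        "target": sp.sstr(sp.expand(expr)),
    }

for i in range(3):
    print(distributivity_example(i))
```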
Related papers
- Compositional Generalization with Grounded Language Models [9.96679221246835]
Grounded language models use external sources of information, such as knowledge graphs, to meet some of the general challenges associated with pre-training.
We develop a procedure for generating natural language questions paired with knowledge graphs that targets different aspects of compositionality.
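As a toy illustration of pairing questions with knowledge-graph facts; the relations and templates below are made up and do not reproduce the paper's generation procedure.

```python
# Toy question generation from knowledge-graph triples; the relations and
# templates are hypothetical, not the paper's dataset-construction procedure.
def question_from_triple(head: str, relation: str, tail: str) -> dict:
    """Turn one (head, relation, tail) edge into a question/answer pair."""
    if relation == "capital_of":
        # ("Paris", "capital_of", "France"): the head entity is the answer.
        return {"question": f"What is the capital of {tail}?", "answer": head}
    if relation == "born_in":
        return {"question": f"Where was {head} born?", "answer": tail}
    raise ValueError(f"no template for relation {relation!r}")

print(question_from_triple("Paris", "capital_of", "France"))
# {'question': 'What is the capital of France?', 'answer': 'Paris'}
```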
arXiv Detail & Related papers (2024-06-07T14:56:51Z)
- Discovering Interpretable Physical Models using Symbolic Regression and Discrete Exterior Calculus [55.2480439325792]
We propose a framework that combines Symbolic Regression (SR) and Discrete Exterior Calculus (DEC) for the automated discovery of physical models.
DEC provides building blocks for the discrete analogue of field theories, which are beyond the state-of-the-art applications of SR to physical problems.
We prove the effectiveness of our methodology by re-discovering three models of Continuum Physics from synthetic experimental data.
arXiv Detail & Related papers (2023-10-10T13:23:05Z)
- Large Language Models as Analogical Reasoners [155.9617224350088]
Chain-of-thought (CoT) prompting for language models demonstrates impressive performance across reasoning tasks.
We introduce a new prompting approach, analogical prompting, designed to automatically guide the reasoning process of large language models.
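A rough sketch of what such a prompt could look like; the wording is a guess at the spirit of the approach, not the paper's exact prompt.

```python
# Illustrative analogical-style prompt: the model is asked to recall related
# problems before solving. Wording is illustrative, not the paper's.
def analogical_prompt(problem: str, n_exemplars: int = 3) -> str:
    return (
        f"Problem: {problem}\n\n"
        f"First, recall {n_exemplars} relevant and distinct example problems "
        "and write down each one with its solution.\n"
        "Then, using those examples as a guide, solve the initial problem "
        "step by step."
    )

print(analogical_prompt("A train travels 120 km in 1.5 hours. "
                        "What is its average speed?"))
```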
arXiv Detail & Related papers (2023-10-03T00:57:26Z)
- Learning Symbolic Rules over Abstract Meaning Representations for Textual Reinforcement Learning [63.148199057487226]
We propose a modular, NEuroSymbolic Textual Agent (NESTA) that combines general semantic generalization with a rule induction system to learn interpretable rules as policies.
Our experiments show that the proposed NESTA method outperforms deep reinforcement learning-based techniques, achieving better generalization to unseen test games and learning from fewer training interactions.
arXiv Detail & Related papers (2023-07-05T23:21:05Z)
- Physics of Language Models: Part 1, Learning Hierarchical Language Structures [51.68385617116854]
Transformer-based language models are effective but complex, and understanding their inner workings is a significant challenge.
We introduce a family of synthetic context-free grammars (CFGs) whose hierarchical rules can generate lengthy sentences.
We demonstrate that generative models like GPT can accurately learn this CFG language and generate sentences based on it.
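A toy stand-in for sampling sentences from such a synthetic CFG; the grammar below is hypothetical and far smaller than those studied in the paper.

```python
# Recursive sampling from a toy CFG; the grammar is a hypothetical stand-in
# for the synthetic hierarchical grammars described in the summary.
import random

GRAMMAR = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"], ["Det", "Adj", "N"]],
    "VP":  [["V", "NP"], ["V"]],
    "Det": [["the"], ["a"]],
    "Adj": [["small"], ["red"]],
    "N":   [["cat"], ["ball"]],
    "V":   [["sees"], ["rolls"]],
}

def generate(symbol: str = "S", rng: random.Random | None = None) -> list[str]:
    """Expand a nonterminal until only terminal words remain."""
    rng = rng or random.Random(0)
    if symbol not in GRAMMAR:            # terminal word
        return [symbol]
    words: list[str] = []
    for sym in rng.choice(GRAMMAR[symbol]):
        words.extend(generate(sym, rng))
    return words

print(" ".join(generate()))              # e.g. "the cat sees a ball"
```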
arXiv Detail & Related papers (2023-05-23T04:28:16Z)
- Sample-efficient Linguistic Generalizations through Program Synthesis: Experiments with Phonology Problems [12.661592819420727]
We develop a synthesis model to learn phonology rules as programs in a domain-specific language.
We test the ability of our models to generalize from few training examples using our new dataset of problems from the Linguistics Olympiad.
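For flavor, a tiny hand-written example of a phonological rule expressed as a program; the paper's DSL and synthesis procedure are not shown, and final obstruent devoicing is used as a standard Olympiad-style phenomenon.

```python
# A phonological rule written as a small program: word-final obstruent
# devoicing. The DSL from the paper is not reproduced; this is illustrative.
DEVOICE = {"b": "p", "d": "t", "g": "k", "z": "s", "v": "f"}

def final_devoicing(word: str) -> str:
    """Devoice a voiced obstruent in word-final position."""
    if word and word[-1] in DEVOICE:
        return word[:-1] + DEVOICE[word[-1]]
    return word

for w in ["hund", "tag", "rad"]:
    print(w, "->", final_devoicing(w))   # hund -> hunt, tag -> tak, rad -> rat
```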
arXiv Detail & Related papers (2021-06-11T18:36:07Z)
- Explainable Matrix -- Visualization for Global and Local Interpretability of Random Forest Classification Ensembles [78.6363825307044]
We propose Explainable Matrix (ExMatrix), a novel visualization method for Random Forest (RF) interpretability.
It employs a simple yet powerful matrix-like visual metaphor, where rows are rules, columns are features, and cells are rule predicates.
The applicability of ExMatrix is confirmed via different examples, showing how it can be used in practice to promote the interpretability of RF models.
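A rough text-only sketch of that rows-are-rules, columns-are-features layout; ExMatrix itself is a graphical tool, and the rules and feature intervals below are made-up examples.

```python
# Text-only sketch of the rows-are-rules / columns-are-features layout;
# the rules and feature intervals below are made-up examples.
RULES = [
    {"petal_len": "<= 2.5", "petal_wid": None,     "label": "setosa"},
    {"petal_len": "> 2.5",  "petal_wid": "<= 1.7", "label": "versicolor"},
    {"petal_len": "> 2.5",  "petal_wid": "> 1.7",  "label": "virginica"},
]
FEATURES = ["petal_len", "petal_wid"]

print(f"{'rule':>4} | " + " | ".join(f"{f:>9}" for f in FEATURES) + " | label")
for i, rule in enumerate(RULES):
    cells = " | ".join(f"{rule[f] or '-':>9}" for f in FEATURES)
    print(f"{'R' + str(i):>4} | {cells} | {rule['label']}")
```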
arXiv Detail & Related papers (2020-05-08T21:03:48Z)
- An Analysis on the Learning Rules of the Skip-Gram Model [4.211128681972148]
We derive the learning rules for the skip-gram model and establish their close relationship to competitive learning.
We provide the global optimal solution constraints for the skip-gram model and validate them by experimental results.
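For reference, below is the standard negative-sampling update that such learning rules are derived around; this is textbook skip-gram with negative sampling, not the paper's own derivation.

```python
# Standard skip-gram negative-sampling update for one (center, context) pair,
# written as explicit gradient steps. Textbook SGNS, not the paper's analysis.
import numpy as np

def sgns_update(W_in, W_out, center, context, negatives, lr=0.025):
    """One SGD step on -log s(u_c.v) - sum_k log s(-u_k.v)."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    v = W_in[center]                          # center-word (input) vector
    grad_v = np.zeros_like(v)
    for idx, label in [(context, 1.0)] + [(k, 0.0) for k in negatives]:
        u = W_out[idx]
        g = sigmoid(u @ v) - label            # dLoss/d(u.v)
        grad_v += g * u
        W_out[idx] = u - lr * g * v           # output-vector update
    W_in[center] = v - lr * grad_v            # input-vector update

rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(1000, 32))
W_out = rng.normal(scale=0.1, size=(1000, 32))
sgns_update(W_in, W_out, center=3, context=7, negatives=[12, 54, 99])
```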
arXiv Detail & Related papers (2020-03-18T22:17:48Z)
- Learning Compositional Rules via Neural Program Synthesis [67.62112086708859]
We present a neuro-symbolic model which learns entire rule systems from a small set of examples.
Instead of directly predicting outputs from inputs, we train our model to induce the explicit system of rules governing a set of previously seen examples.
arXiv Detail & Related papers (2020-03-12T01:06:48Z)
- Keeping it simple: Implementation and performance of the proto-principle of adaptation and learning in the language sciences [0.9845144212844665]
We present the Widrow-Hoff rule and its applications to language data.
After contextualizing the rule historically and placing it in the chain of neurally inspired artificial learning models, we explain its rationale and implementational considerations.
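The rule itself is compact enough to state in a few lines; the sketch below is its standard delta-rule form, and the cue/outcome data are made up.

```python
# The Widrow-Hoff (delta / LMS) rule in its standard form: weights move in
# proportion to the prediction error on each trial. Data below are made up.
import numpy as np

def widrow_hoff(cues: np.ndarray, outcomes: np.ndarray, lr: float = 0.1) -> np.ndarray:
    """Iterate w <- w + lr * (t - w.x) * x over (cue, outcome) trials."""
    w = np.zeros(cues.shape[1])
    for x, t in zip(cues, outcomes):
        error = t - w @ x                 # prediction error on this trial
        w += lr * error * x               # delta-rule update
    return w

# Two binary cues, one of which predicts the outcome; the weights converge
# toward the least-squares solution over repeated trials.
X = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]] * 50)
y = np.array([1.0, 1.0, 0.0] * 50)
print(widrow_hoff(X, y))                  # roughly [1.0, 0.0]
```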
arXiv Detail & Related papers (2020-03-08T17:07:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.