Related papers: A Symbolic Framework for Evaluating Mathematical Reasoning and Generalisation with Transformers

URL: http://arxiv.org/abs/2305.12563v2
Date: Mon, 8 Apr 2024 14:29:06 GMT
Title: A Symbolic Framework for Evaluating Mathematical Reasoning and Generalisation with Transformers
Authors: Jordan Meadows, Marco Valentino, Damien Teney, Andre Freitas
Abstract summary: We evaluate the generalisability of Transformers to out-of-distribution mathematical reasoning problems. We compare the capabilities of GPT-4, GPT-3.5, and a canon of fine-tuned BERT models. Surprisingly, our evaluation reveals that the average in-distribution performance of fine-tuned models surpasses GPT-3.5, and rivals GPT-4.
Score: 17.075558137261986
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper proposes a methodology for generating and perturbing detailed derivations of equations at scale, aided by a symbolic engine, to evaluate the generalisability of Transformers to out-of-distribution mathematical reasoning problems. Instantiating the framework in the context of sequence classification tasks, we compare the capabilities of GPT-4, GPT-3.5, and a canon of fine-tuned BERT models, exploring the relationship between specific operators and generalisation failure via the perturbation of reasoning aspects such as symmetry and variable surface forms. Surprisingly, our empirical evaluation reveals that the average in-distribution performance of fine-tuned models surpasses GPT-3.5, and rivals GPT-4. However, perturbations to input reasoning can reduce their performance by up to 80 F1 points. Overall, the results suggest that the in-distribution performance of smaller open-source models may potentially rival GPT by incorporating appropriately structured derivation dependencies during training, and highlight a shared weakness between BERT and GPT involving a relative inability to decode indirect references to mathematical entities. We release the full codebase, constructed datasets, and fine-tuned models to encourage future progress in the field.
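To make the perturbation idea in the abstract concrete, here is a minimal sketch of two of the named perturbations, variable surface forms and symmetry, applied to a toy two-step derivation. It is not the authors' released codebase: the paper uses a symbolic engine, whereas this sketch uses plain string operations, and the function names (`rename_variables`, `swap_sides`) and the example derivation are hypothetical.

```python
import re


def rename_variables(step, mapping):
    """Rename whole-token variables in one pass (hypothetical helper).

    A single regex pass keeps the renaming simultaneous, so a mapping
    such as x -> y, y -> x would not collapse both names into one.
    """
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], step)


def swap_sides(step):
    """Swap the two sides of an equation, relying on symmetry of equality."""
    lhs, rhs = step.split("=", 1)
    return f"{rhs.strip()} = {lhs.strip()}"


# Toy derivation (assumed for illustration): subtract a from both sides.
derivation = ["y = x + a", "y - a = x"]

# Perturbation 1: change variable surface forms, keeping the structure.
renamed = [rename_variables(s, {"x": "u", "y": "v", "a": "b"}) for s in derivation]

# Perturbation 2: swap the sides of every equation.
swapped = [swap_sides(s) for s in derivation]

print(renamed)
print(swapped)
```

Both perturbed derivations remain mathematically valid, so a classifier that truly tracks the reasoning should label them the same way as the original; a drop in performance points to reliance on surface form.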

Related papers

Partial Transportability for Domain Generalization [56.37032680901525]
Building on the theory of partial identification and transportability, this paper introduces new results for bounding the value of a functional of the target distribution. Our contribution is to provide the first general estimation technique for transportability problems. We propose a gradient-based optimization scheme for making scalable inferences in practice.
arXiv Detail & Related papers (2025-03-30T22:06:37Z)
Harmonic Loss Trains Interpretable AI Models [13.745919535064429]
We introduce harmonic loss as an alternative to the standard cross-entropy loss for training neural networks and large language models. We first validate the performance of harmonic models across algorithmic, vision, and language datasets. We demonstrate that models trained with harmonic loss outperform standard models by: (a) enhancing interpretability, (b) requiring less data for generalization, and (c) reducing grokking.
arXiv Detail & Related papers (2025-02-03T18:57:17Z)
Rethinking Relation Extraction: Beyond Shortcuts to Generalization with a Debiased Benchmark [53.876493664396506]
Benchmarks are crucial for evaluating machine learning algorithm performance, facilitating comparison and identifying superior solutions. This paper addresses the issue of entity bias in relation extraction tasks, where models tend to rely on entity mentions rather than context. We propose a debiased relation extraction benchmark DREB that breaks the pseudo-correlation between entity mentions and relation types through entity replacement. To establish a new baseline on DREB, we introduce MixDebias, a debiasing method combining data-level and model training-level techniques.
arXiv Detail & Related papers (2025-01-02T17:01:06Z)
HG-Adapter: Improving Pre-Trained Heterogeneous Graph Neural Networks with Dual Adapters [53.97380482341493]
The "pre-train, prompt-tuning" paradigm has demonstrated impressive performance for tuning pre-trained heterogeneous graph neural networks (HGNNs). We propose a unified framework that combines two new adapters with potential labeled data extension to improve the generalization of pre-trained HGNN models.
arXiv Detail & Related papers (2024-11-02T06:43:54Z)
Fairness-Aware Estimation of Graphical Models [13.39268712338485]
This paper examines the issue of fairness in the estimation of graphical models (GMs). Standard GMs can result in biased outcomes, especially when the underlying data involves sensitive characteristics or protected groups. We introduce a comprehensive framework designed to reduce bias in the estimation of GMs related to protected attributes.
arXiv Detail & Related papers (2024-08-30T16:30:00Z)
Learn to Preserve and Diversify: Parameter-Efficient Group with Orthogonal Regularization for Domain Generalization [28.977757627384165]
Domain generalization (DG) aims to avoid degradation of model performance when a distribution shift occurs between the limited training data and unseen test data. Recently, foundation models with enormous numbers of parameters have been pre-trained on huge datasets, demonstrating strong generalization ability. Our framework achieves SOTA performance on five DG benchmarks, while only requiring training a small number of parameters without adding testing cost.
arXiv Detail & Related papers (2024-07-21T07:50:49Z)
GenBench: A Benchmarking Suite for Systematic Evaluation of Genomic Foundation Models [56.63218531256961]
We introduce GenBench, a benchmarking suite specifically tailored for evaluating the efficacy of Genomic Foundation Models. GenBench offers a modular and expandable framework that encapsulates a variety of state-of-the-art methodologies. We provide a nuanced analysis of the interplay between model architecture and dataset characteristics on task-specific performance.
arXiv Detail & Related papers (2024-06-01T08:01:05Z)
A PAC-Bayesian Perspective on the Interpolating Information Criterion [54.548058449535155]
We show how a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence performance in the interpolating regime. We quantify how the test error for overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by, e.g., the combination of model and parameter-initialization scheme.
arXiv Detail & Related papers (2023-11-13T01:48:08Z)
Aggregation Weighting of Federated Learning via Generalization Bound Estimation [65.8630966842025]
Federated Learning (FL) typically aggregates client model parameters using a weighting approach determined by sample proportions. We replace the aforementioned weighting method with a new strategy that considers the generalization bounds of each local model.
arXiv Detail & Related papers (2023-11-10T08:50:28Z)
Boosted Control Functions: Distribution generalization and invariance in confounded models [10.503777692702952]
We introduce a strong notion of invariance that allows for distribution generalization even in the presence of nonlinear, non-identifiable structural functions. We propose the ControlTwicing algorithm to estimate the Boosted Control Function (BCF) using flexible machine-learning techniques.
arXiv Detail & Related papers (2023-10-09T15:43:46Z)
Generating Mathematical Derivations with Large Language Models [2.363388546004777]
We leverage a symbolic engine to generate derivations of equations at scale. We investigate the capabilities of Large Language Models when deriving goal equations from premises.
arXiv Detail & Related papers (2023-07-19T14:13:02Z)
Towards Principled Disentanglement for Domain Generalization [90.9891372499545]
A fundamental challenge for machine learning models is generalizing to out-of-distribution (OOD) data. We first formalize the OOD generalization problem as constrained optimization, called Disentanglement-constrained Domain Generalization (DDG). Based on the transformation, we propose a primal-dual algorithm for joint representation disentanglement and domain generalization.
arXiv Detail & Related papers (2021-11-27T07:36:32Z)
Theory-guided Auto-Encoder for Surrogate Construction and Inverse Modeling [0.0]
The framework is built on the auto-encoder architecture of a convolutional neural network (CNN). The governing equations of the studied problems can be discretized, and the finite difference scheme of the equations can be embedded into the training of the CNN. The trained TgAE can be used to construct a surrogate that approximates the relationship between the model parameters and responses with limited labeled data.
arXiv Detail & Related papers (2020-11-17T13:23:03Z)
Generalization Properties of Optimal Transport GANs with Latent Distribution Learning [52.25145141639159]
We study how the interplay between the latent distribution and the complexity of the pushforward map affects performance. Motivated by our analysis, we advocate learning the latent distribution as well as the pushforward map within the GAN paradigm.
arXiv Detail & Related papers (2020-07-29T07:31:33Z)

This list is automatically generated from the titles and abstracts of the papers on this site.