Data Augmentation for Mathematical Objects
- URL: http://arxiv.org/abs/2307.06984v1
- Date: Thu, 13 Jul 2023 16:02:45 GMT
- Title: Data Augmentation for Mathematical Objects
- Authors: Tereso del Rio and Matthew England
- Abstract summary: We consider a dataset of non-linear polynomial problems and the problem of selecting a variable ordering for cylindrical algebraic decomposition.
By swapping the variable names in already labelled problems, we generate new problem instances that do not require any further labelling.
We find this augmentation increases the accuracy of ML models by 63% on average.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper discusses and evaluates ideas of data balancing and data
augmentation in the context of mathematical objects: an important topic for
both the symbolic computation and satisfiability checking communities, when
they are making use of machine learning techniques to optimise their tools. We
consider a dataset of non-linear polynomial problems and the problem of
selecting a variable ordering for cylindrical algebraic decomposition to tackle
these with. By swapping the variable names in already labelled problems, we
generate new problem instances that do not require any further labelling when
viewing the selection as a classification problem. We find this augmentation
increases the accuracy of ML models by 63% on average. We study what part of
this improvement is due to the balancing of the dataset and what is achieved
thanks to further increasing the size of the dataset, concluding that both have
a very significant effect. We finish the paper by reflecting on how this idea
could be applied in other uses of machine learning in mathematics.
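The augmentation the abstract describes can be sketched in a few lines. Below is a minimal illustration, not the authors' implementation: it assumes a problem is encoded as a list of monomials, each a tuple of exponents over n variables, labelled with its best variable ordering (a tuple of variable indices). Renaming the variables by a permutation yields a new, already-labelled instance, since the optimal ordering is renamed consistently. The function name and encoding are hypothetical.

```python
from itertools import permutations

def augment_by_variable_swap(monomials, best_ordering):
    """Generate the n! - 1 extra labelled instances obtained by
    renaming the n variables of an already-labelled problem.

    monomials: list of exponent tuples, position i = exponent of variable i
    best_ordering: tuple of variable indices (the classification label)
    """
    n = len(best_ordering)
    augmented = []
    for perm in permutations(range(n)):
        if perm == tuple(range(n)):
            continue  # skip the identity permutation: that is the original instance
        # renaming variable i -> perm[i] moves the exponent at position i
        # to position perm[i], i.e. new_m[j] = old_m[perm^{-1}(j)]
        new_monomials = [tuple(m[perm.index(j)] for j in range(n))
                         for m in monomials]
        # the label is renamed with the same permutation,
        # so no further labelling (no extra CAD computation) is needed
        new_ordering = tuple(perm[v] for v in best_ordering)
        augmented.append((new_monomials, new_ordering))
    return augmented
```

For example, the single monomial x0^2 * x1 in three variables, labelled with ordering (0, 1, 2), yields five new labelled instances; under the renaming 0→1, 1→2, 2→0 it becomes x1^2 * x2 with label (1, 2, 0). This also illustrates the balancing effect discussed in the abstract: a class (ordering) that was rare in the original dataset gains instances from the permutations of every labelled problem.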
Related papers
- ControlMath: Controllable Data Generation Promotes Math Generalist Models [38.0858432336873]
We propose ControlMath, an iterative method involving an equation-generator module and two LLM-based agents.
The module creates diverse equations, which the Problem-Crafter agent then transforms into math word problems.
We collect ControlMathQA, comprising 190k math word problems.
arXiv Detail & Related papers (2024-09-20T03:58:26Z)
- Boarding for ISS: Imbalanced Self-Supervised: Discovery of a Scaled Autoencoder for Mixed Tabular Datasets [1.2289361708127877]
The field of imbalanced self-supervised learning has not been extensively studied.
Existing research has predominantly focused on image datasets.
We propose a novel metric to balance learning: a Multi-Supervised Balanced MSE.
arXiv Detail & Related papers (2024-03-23T10:37:22Z)
- Lessons on Datasets and Paradigms in Machine Learning for Symbolic Computation: A Case Study on CAD [0.0]
This study reports lessons on the importance of analysing datasets prior to machine learning.
We present results for a particular case study, the selection of variable ordering for cylindrical algebraic decomposition.
We introduce an augmentation technique for polynomial systems that allows us to balance and further augment the dataset.
arXiv Detail & Related papers (2024-01-24T10:12:43Z)
- MuggleMath: Assessing the Impact of Query and Response Augmentation on Math Reasoning [54.2093509928664]
In math reasoning with large language models, fine-tuning data augmentation by query evolution and diverse reasoning paths is empirically verified effective.
We investigate such data augmentation for math reasoning, aiming to answer open questions about when and why it helps.
We release our code and augmented data at https://github.com/OFA-Sys/8k-Scel.
arXiv Detail & Related papers (2023-10-09T08:18:58Z)
- Automatic Data Augmentation via Invariance-Constrained Learning [94.27081585149836]
Underlying data structures are often exploited to improve the solution of learning tasks.
Data augmentation induces these symmetries during training by applying multiple transformations to the input data.
This work tackles these issues by automatically adapting the data augmentation while solving the learning task.
arXiv Detail & Related papers (2022-09-29T18:11:01Z)
- Improving Classifier Training Efficiency for Automatic Cyberbullying Detection with Feature Density [58.64907136562178]
We study the effectiveness of Feature Density (FD) using different linguistically-backed feature preprocessing methods.
We hypothesise that estimating dataset complexity allows for the reduction of the number of required experiments.
The difference in linguistic complexity of datasets allows us to additionally discuss the efficacy of linguistically-backed word preprocessing.
arXiv Detail & Related papers (2021-11-02T15:48:28Z)
- CADDA: Class-wise Automatic Differentiable Data Augmentation for EEG Signals [92.60744099084157]
We propose differentiable data augmentation amenable to gradient-based learning.
We demonstrate the relevance of our approach on the clinically relevant sleep staging classification task.
arXiv Detail & Related papers (2021-06-25T15:28:48Z)
- Measuring Mathematical Problem Solving With the MATH Dataset [55.4376028963537]
We introduce MATH, a dataset of 12,500 challenging competition mathematics problems.
Each problem has a full step-by-step solution which can be used to teach models to generate answer derivations and explanations.
We also contribute a large auxiliary pretraining dataset which helps teach models the fundamentals of mathematics.
arXiv Detail & Related papers (2021-03-05T18:59:39Z)
- Data augmentation and feature selection for automatic model recommendation in computational physics [0.0]
This article introduces two algorithms to address the lack of training data, their high dimensionality, and the non-applicability of common data augmentation techniques to physics data.
When combined with a stacking ensemble made of six multilayer perceptrons and a ridge logistic regression, they enable reaching an accuracy of 90% on our classification problem for nonlinear structural mechanics.
arXiv Detail & Related papers (2021-01-12T15:09:11Z) - Category-Learning with Context-Augmented Autoencoder [63.05016513788047]
Finding an interpretable non-redundant representation of real-world data is one of the key problems in Machine Learning.
We propose a novel method of using data augmentations when training autoencoders.
We train a Variational Autoencoder in such a way, that it makes transformation outcome predictable by auxiliary network.
arXiv Detail & Related papers (2020-10-10T14:04:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.