Lessons on Datasets and Paradigms in Machine Learning for Symbolic Computation: A Case Study on CAD
- URL: http://arxiv.org/abs/2401.13343v2
- Date: Thu, 20 Jun 2024 10:32:20 GMT
- Title: Lessons on Datasets and Paradigms in Machine Learning for Symbolic Computation: A Case Study on CAD
- Authors: Tereso del Río, Matthew England
- Abstract summary: This study reports lessons on the importance of analysing datasets prior to machine learning.
We present results for a particular case study, the selection of variable ordering for cylindrical algebraic decomposition.
We introduce an augmentation technique for systems that allows us to balance and further augment the dataset.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Symbolic Computation algorithms and their implementation in computer algebra systems often contain choices which do not affect the correctness of the output but can significantly impact the resources required: such choices can benefit from being made separately for each problem via a machine learning model. This study reports lessons on such use of machine learning in symbolic computation, in particular on the importance of analysing datasets prior to machine learning and on the different machine learning paradigms that may be utilised. We present results for a particular case study, the selection of variable ordering for cylindrical algebraic decomposition, but expect that the lessons learned are applicable to other decisions in symbolic computation. We utilise an existing dataset of examples derived from applications which was found to be imbalanced with respect to the variable ordering decision. We introduce an augmentation technique for polynomial systems that allows us to balance and further augment the dataset, improving the machine learning results by 28% and 38% on average, respectively. We then demonstrate how the existing machine learning methodology used for the problem (classification) might be recast into the regression paradigm. While this does not radically change the performance, it does widen the scope in which the methodology can be applied to make choices.
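As an illustration of the augmentation technique described in the abstract, the sketch below shows one way such a variable-renaming augmentation could look. It is not the authors' code: the `augment` helper, the use of SymPy, and the three-variable example are assumptions made purely for illustration. The idea is that permuting the variable names of an already labelled polynomial system, and permuting its best-known variable ordering in the same way, yields new labelled instances at no extra labelling cost.

```python
# Minimal sketch (not from the paper) of augmenting a labelled polynomial
# system by renaming variables: each permutation of the variable names gives
# a new instance whose best ordering is the correspondingly permuted one.
from itertools import permutations

import sympy as sp

x1, x2, x3 = sp.symbols("x1 x2 x3")
VARS = (x1, x2, x3)

def augment(system, best_ordering):
    """Yield (renamed system, renamed best ordering) for every permutation
    of the variable names; the label carries over without recomputation."""
    for perm in permutations(VARS):
        rename = dict(zip(VARS, perm))
        new_system = [p.subs(rename, simultaneous=True) for p in system]
        new_ordering = tuple(rename[v] for v in best_ordering)
        yield new_system, new_ordering

# One labelled instance becomes 3! = 6 instances.
system = [x1**2 + x2 * x3 - 1, x2 - x3**2]
for polys, order in augment(system, (x3, x1, x2)):
    print(polys, "->", order)
```

The recasting from classification to regression mentioned at the end of the abstract could, under similar assumptions, look like the following: rather than training a classifier to name the best ordering directly, a regressor predicts a cost (for example, log runtime or CAD cell count) for each candidate ordering, and the ordering with the smallest predicted cost is chosen. The feature shapes, the placeholder data and the choice of a random forest are hypothetical.

```python
# Hypothetical sketch of the regression paradigm: predict a per-ordering cost
# and select the ordering with the smallest prediction.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
features = rng.normal(size=(200, 6, 10))  # 200 problems, 6 orderings, 10 features each
costs = rng.lognormal(size=(200, 6))      # measured cost per ordering (placeholder data)

reg = RandomForestRegressor(n_estimators=100, random_state=0)
reg.fit(features.reshape(-1, 10), np.log(costs).reshape(-1))

def choose_ordering(problem_features):
    """Return the index of the ordering with the smallest predicted cost."""
    return int(np.argmin(reg.predict(problem_features)))

print(choose_ordering(features[0]))
```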
Related papers
- Learning Linear Attention in Polynomial Time [115.68795790532289]
We provide the first results on learnability of single-layer Transformers with linear attention.
We show that linear attention may be viewed as a linear predictor in a suitably defined RKHS.
We show how to efficiently identify training datasets for which every empirical risk minimizer is equivalent to the linear Transformer.
arXiv Detail & Related papers (2024-10-14T02:41:01Z) - Controlling dynamical systems to complex target states using machine learning: next-generation vs. classical reservoir computing [68.8204255655161]
Controlling nonlinear dynamical systems using machine learning makes it possible to drive systems not only into simple behavior such as periodicity but also into more complex, arbitrary dynamics.
We show first that classical reservoir computing excels at this task.
In a next step, we compare those results based on different amounts of training data to an alternative setup, where next-generation reservoir computing is used instead.
It turns out that while next-generation RC delivers comparable performance for typical amounts of training data, it significantly outperforms classical reservoir computing when only very limited data is available.
arXiv Detail & Related papers (2023-07-14T07:05:17Z) - Data Augmentation for Mathematical Objects [0.0]
We consider a dataset of non-linear problems and the problem of selecting a variable ordering for cylindrical algebraic decomposition.
By swapping the variable names in already labelled problems, we generate new problem instances that do not require any further labelling.
We find this augmentation increases the accuracy of ML by 63% on average.
arXiv Detail & Related papers (2023-07-13T16:02:45Z) - Explainable AI Insights for Symbolic Computation: A case study on selecting the variable ordering for cylindrical algebraic decomposition [0.0]
This paper explores whether using explainable AI (XAI) techniques on such machine learning models can offer new insight for symbolic computation.
We present a case study on the use of ML to select the variable ordering for cylindrical algebraic decomposition.
arXiv Detail & Related papers (2023-04-24T15:05:04Z) - Revisiting Variable Ordering for Real Quantifier Elimination using Machine Learning [0.7388859384645262]
We apply symmetries to create a new dataset containing more than 41K MetiTarski challenges designed to remove bias.
We evaluate issues of information leakage, and test the generalizability of our models on the new dataset.
arXiv Detail & Related papers (2023-02-27T18:48:33Z) - Automatic Data Augmentation via Invariance-Constrained Learning [94.27081585149836]
Underlying data structures, such as symmetries or invariance to transformations, are often exploited to improve the solution of learning tasks.
Data augmentation induces these symmetries during training by applying multiple transformations to the input data.
This work tackles these issues by automatically adapting the data augmentation while solving the learning task.
arXiv Detail & Related papers (2022-09-29T18:11:01Z) - Advancing Reacting Flow Simulations with Data-Driven Models [50.9598607067535]
Key to effective use of machine learning tools in multi-physics problems is to couple them to physical and computer models.
The present chapter reviews some of the open opportunities for the application of data-driven reduced-order modeling of combustion systems.
arXiv Detail & Related papers (2022-09-05T16:48:34Z) - GENEOnet: A new machine learning paradigm based on Group Equivariant Non-Expansive Operators. An application to protein pocket detection [97.5153823429076]
We introduce a new computational paradigm based on Group Equivariant Non-Expansive Operators.
We test our method, called GENEOnet, on a key problem in drug design: detecting pockets on the surface of proteins that can host ligands.
arXiv Detail & Related papers (2022-01-31T11:14:51Z) - How to effectively use machine learning models to predict the solutions for optimization problems: lessons from loss function [0.0]
This paper aims to predict a good solution for constraint optimization problems using advanced machine learning techniques.
It extends the work of Abbasi et al. (2020) to use machine learning models for predicting the solution of large-scale optimization models.
arXiv Detail & Related papers (2021-05-14T02:14:00Z) - A Survey on Large-scale Machine Learning [67.6997613600942]
Machine learning can provide deep insights into data, allowing machines to make high-quality predictions.
Most sophisticated machine learning approaches suffer from huge time costs when operating on large-scale data.
Large-scale Machine Learning aims to efficiently learn patterns from big data while maintaining comparable performance.
arXiv Detail & Related papers (2020-08-10T06:07:52Z) - An analysis on the use of autoencoders for representation learning: fundamentals, learning task case studies, explainability and challenges [11.329636084818778]
In many machine learning tasks, learning a good representation of the data can be the key to building a well-performing solution.
We present a series of learning tasks: data embedding for visualization, image denoising, semantic hashing, detection of abnormal behaviors and instance generation.
A solution is proposed for each task employing autoencoders as the only learning method.
arXiv Detail & Related papers (2020-05-21T08:41:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.