Breaking the Data Barrier in Learning Symbolic Computation: A Case Study on Variable Ordering Suggestion for Cylindrical Algebraic Decomposition
- URL: http://arxiv.org/abs/2601.13731v1
- Date: Tue, 20 Jan 2026 08:40:35 GMT
- Title: Breaking the Data Barrier in Learning Symbolic Computation: A Case Study on Variable Ordering Suggestion for Cylindrical Algebraic Decomposition
- Authors: Rui-Juan Jing, Yuegang Zhao, Changbo Chen,
- Abstract summary: Symbolic computation has important applications in mathematical reasoning through exact deep computations.<n>Existing learning-based approaches are only competitive with the best expert-based methods.<n>We design a series of intimately connected tasks for which a large amount of annotated data can be easily obtained.<n>Experiments on publicly available CAD ordering datasets show that on average the orderings predicted by the new model are significantly better than those suggested.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Symbolic computation, powered by modern computer algebra systems, has important applications in mathematical reasoning through exact deep computations. The efficiency of symbolic computation is largely constrained by such deep computations in high dimension. This creates a fundamental barrier on labelled data acquisition if leveraging supervised deep learning to accelerate symbolic computation. Cylindrical algebraic decomposition (CAD) is a pillar symbolic computation method for reasoning with first-order logic formulas over reals with many applications in formal verification and automatic theorem proving. Variable orderings have a huge impact on its efficiency. Impeded by the difficulty to acquire abundant labelled data, existing learning-based approaches are only competitive with the best expert-based heuristics. In this work, we address this problem by designing a series of intimately connected tasks for which a large amount of annotated data can be easily obtained. We pre-train a Transformer model with these data and then fine-tune it on the datasets for CAD ordering. Experiments on publicly available CAD ordering datasets show that on average the orderings predicted by the new model are significantly better than those suggested by the best heuristic methods.
Related papers
- The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner [71.41162392872393]
This paper proposes Turing MAchine Imitation Learning (TAIL) to improve the length generalization ability of large language models.<n>TAIL synthesizes chain-of-thoughts (CoT) data that imitates the execution process of a Turing Machine by computer programs.<n>Without bells and whistles, TAIL significantly improves the length generalization ability as well as the performance of Qwen2.5-7B on various tasks.
arXiv Detail & Related papers (2025-07-17T17:50:07Z) - RV-Syn: Rational and Verifiable Mathematical Reasoning Data Synthesis based on Structured Function Library [58.404895570822184]
RV-Syn is a novel mathematical Synthesis approach.<n>It generates graphs as solutions by combining Python-formatted functions from this library.<n>Based on the constructed graph, we achieve solution-guided logic-aware problem generation.
arXiv Detail & Related papers (2025-04-29T04:42:02Z) - LOCAL: Learning with Orientation Matrix to Infer Causal Structure from Time Series Data [51.47827479376251]
LOCAL is a highly efficient, easy-to-implement, and constraint-free method for recovering dynamic causal structures.<n>Asymptotic Causal Learning Mask (ACML) and Dynamic Graph Learning (DGPL)<n>Experiments on synthetic and real-world datasets demonstrate that LOCAL significantly outperforms existing methods.
arXiv Detail & Related papers (2024-10-25T10:48:41Z) - Discovering physical laws with parallel symbolic enumeration [67.36739393470869]
We introduce parallel symbolic enumeration (PSE) to efficiently distill generic mathematical expressions from limited data.<n>Experiments show that PSE achieves higher accuracy and faster computation compared to the state-of-the-art baseline algorithms.<n> PSE represents an advance in accurate and efficient data-driven discovery of symbolic, interpretable models.
arXiv Detail & Related papers (2024-07-05T10:41:15Z) - In-Database Data Imputation [0.6157028677798809]
Missing data is a widespread problem in many domains, creating challenges in data analysis and decision making.
Traditional techniques for dealing with missing data, such as excluding incomplete records or imputing simple estimates, are computationally efficient but may introduce bias and disrupt variable relationships.
Model-based imputation techniques offer a more robust solution that preserves the variability and relationships in the data, but they demand significantly more computation time.
This work enables efficient, high-quality, and scalable data imputation within a database system using the widely used MICE method.
arXiv Detail & Related papers (2024-01-07T01:57:41Z) - Deep Generative Symbolic Regression [83.04219479605801]
Symbolic regression aims to discover concise closed-form mathematical equations from data.
Existing methods, ranging from search to reinforcement learning, fail to scale with the number of input variables.
We propose an instantiation of our framework, Deep Generative Symbolic Regression.
arXiv Detail & Related papers (2023-12-30T17:05:31Z) - Revisiting Variable Ordering for Real Quantifier Elimination using
Machine Learning [0.7388859384645262]
We apply symmetries to create a new dataset containing more than 41K MetiTarski challenges designed to remove bias.
We evaluate issues of information leakage, and test the generalizability of our models on the new dataset.
arXiv Detail & Related papers (2023-02-27T18:48:33Z) - Automatic Data Augmentation via Invariance-Constrained Learning [94.27081585149836]
Underlying data structures are often exploited to improve the solution of learning tasks.
Data augmentation induces these symmetries during training by applying multiple transformations to the input data.
This work tackles these issues by automatically adapting the data augmentation while solving the learning task.
arXiv Detail & Related papers (2022-09-29T18:11:01Z) - Berrut Approximated Coded Computing: Straggler Resistance Beyond
Polynomial Computing [34.69732430310801]
We propose Berrut Approximated Coded Computing (BACC) as an alternative approach to deal with stragglers effect.
BACC is proven to be numerically stable with low computational complexity.
In particular, BACC is used to train a deep neural network on a cluster of servers.
arXiv Detail & Related papers (2020-09-17T14:23:38Z) - Learning Generalized Relational Heuristic Networks for Model-Agnostic
Planning [29.714818991696088]
This paper develops a new approach for learning generalizeds in the absence of symbolic action models.
It uses an abstract state representation to facilitate data efficient, generalizable learning.
arXiv Detail & Related papers (2020-07-10T06:08:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.