Learning Tree Pattern Transformations
- URL: http://arxiv.org/abs/2410.07708v1
- Date: Thu, 10 Oct 2024 08:20:57 GMT
- Title: Learning Tree Pattern Transformations
- Authors: Daniel Neider, Leif Sabellek, Johannes Schmidt, Fabian Vehlken, Thomas Zeume
- Abstract summary: Explaining why and how a tree $t$ structurally differs from another tree $t^*$ is a question that is encountered throughout computer science.
In this article, we explore how to learn explanations for structural differences between pairs of trees from sample data.
- Score: 5.767156832161818
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Explaining why and how a tree $t$ structurally differs from another tree $t^*$ is a question that is encountered throughout computer science, including in understanding tree-structured data such as XML or JSON data. In this article, we explore how to learn explanations for structural differences between pairs of trees from sample data: suppose we are given a set $\{(t_1, t_1^*),\dots, (t_n, t_n^*)\}$ of pairs of labelled, ordered trees; is there a small set of rules that explains the structural differences between all pairs $(t_i, t_i^*)$? This raises two research questions: (i) what is a good notion of "rule" in this context?; and (ii) how can sets of rules explaining a data set be learnt algorithmically? We explore these questions from the perspective of database theory by (1) introducing a pattern-based specification language for tree transformations; (2) exploring the computational complexity of variants of the above algorithmic problem, e.g. showing NP-hardness for very restricted variants; and (3) discussing how to solve the problem for data from CS education research using SAT solvers.
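The learning problem in the abstract can be illustrated with a toy sketch. The rule format below (a single relabelling rule) is purely illustrative and much simpler than the paper's pattern-based specification language; the function names are invented for this example.

```python
# Toy sketch: labelled, ordered trees as (label, children) tuples, and a
# single relabelling rule (src_label, dst_label). The paper's rules are
# pattern-based and far more expressive; this only shows the problem shape.

def apply_rule(tree, rule):
    """Relabel every node whose label matches the rule's source label."""
    src, dst = rule
    label, children = tree
    new_label = dst if label == src else label
    return (new_label, [apply_rule(c, rule) for c in children])

def rule_explains(pairs, rule):
    """Does the single rule transform t_i into t_i* for every pair?"""
    return all(apply_rule(t, rule) == t_star for t, t_star in pairs)

pairs = [
    (("a", [("b", []), ("c", [])]), ("x", [("b", []), ("c", [])])),
    (("b", [("a", [])]),            ("b", [("x", [])])),
]
assert rule_explains(pairs, ("a", "x"))      # one rule explains both pairs
```

The paper's algorithmic question is then whether a *small* set of such rules consistent with all sample pairs can be found efficiently; the abstract notes NP-hardness already for very restricted variants, which motivates the SAT-solver approach.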
Related papers
- Optimal Decision Tree Pruning Revisited: Algorithms and Complexity [22.57063332430197]
We focus on fundamental pruning operations of subtree replacement and raising.
While optimal pruning can be performed in polynomial time for subtree replacement, the problem is NP-complete for subtree raising.
For example, while subtree raising is hard for small domain size, it can be solved in $D^{2d} \cdot |I|^{O(1)}$ time, where $|I|$ is the input size.
arXiv Detail & Related papers (2025-03-05T15:02:46Z)
- Don't Get Lost in the Trees: Streamlining LLM Reasoning by Overcoming Tree Search Exploration Pitfalls [83.89771461061903]
Recent advancements in tree search algorithms guided by verifiers have significantly enhanced the reasoning capabilities of large language models (LLMs).
We identify two key challenges contributing to this inefficiency: $\textit{over-exploration}$ due to redundant states with semantically equivalent content, and $\textit{under-exploration}$ caused by high variance in verifier scoring.
We propose FETCH, a flexible, plug-and-play system compatible with various tree search algorithms.
arXiv Detail & Related papers (2025-02-16T16:12:01Z)
- Probabilistic Tree-of-thought Reasoning for Answering Knowledge-intensive Complex Questions [93.40614719648386]
Large language models (LLMs) are capable of answering knowledge-intensive complex questions with chain-of-thought (CoT) reasoning.
Recent works turn to retrieving external knowledge to augment CoT reasoning.
We propose a novel approach: Probabilistic Tree-of-thought Reasoning (ProbTree)
arXiv Detail & Related papers (2023-11-23T12:52:37Z)
- On Computing Optimal Tree Ensembles [7.424944196676223]
Random forests and, more generally, (decision-)tree ensembles are widely used methods for classification and regression.
Recent algorithmic advances allow computing decision trees that are optimal for various measures such as their size or depth.
We provide two novel algorithms and corresponding lower bounds.
arXiv Detail & Related papers (2023-06-07T13:30:43Z)
- How to enumerate trees from a context-free grammar [0.0]
I present a simple algorithm for enumerating the trees generated by a Context Free Grammar (CFG)
I also show how this algorithm may be generalized to more general forms of derivation, including analogs of Lempel-Ziv coding on trees.
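The basic idea of enumerating CFG derivation trees can be sketched with a depth-bounded generator. The grammar encoding and function names below are illustrative assumptions, not the paper's actual algorithm (which handles enumeration without a depth bound and extends to richer derivation forms).

```python
# Minimal sketch: enumerate derivation trees of a CFG up to a height bound.
# Grammar: dict from nonterminal to list of right-hand sides (symbol lists);
# any symbol not in the dict is treated as a terminal.

def trees(grammar, symbol, depth):
    """Yield derivation trees (symbol, children) of height <= depth."""
    if symbol not in grammar:            # terminal: a leaf
        yield (symbol, [])
        return
    if depth == 0:                       # no budget left to expand
        return
    for rhs in grammar[symbol]:
        def expand(i):                   # Cartesian product of subtree choices
            if i == len(rhs):
                yield []
                return
            for sub in trees(grammar, rhs[i], depth - 1):
                for rest in expand(i + 1):
                    yield [sub] + rest
        for children in expand(0):
            yield (symbol, children)

g = {"S": [["a"], ["a", "S"]]}           # S -> a | a S
assert sum(1 for _ in trees(g, "S", 3)) == 3   # three trees of height <= 3
```

Even this toy version makes the combinatorial structure visible: the number of trees grows with the depth bound, which is why a principled enumeration order matters.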
arXiv Detail & Related papers (2023-04-30T16:40:54Z)
- Learning-Augmented B-Trees [11.542679443281411]
We study learning-augmented binary search trees (BSTs) and B-Trees via Treaps with composite priorities.
The result is a simple search tree where the depth of each item is determined by its predicted weight $w_x$.
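The core treap idea can be sketched as follows: use predicted weights as heap priorities, so heavier (predicted more frequent) items rotate toward the root. This is only the underlying mechanism, with invented names; the paper's composite priorities and guarantees are more involved.

```python
# Sketch of a treap keyed by search key, heap-ordered by predicted weight,
# so items with larger predicted weight w_x end up nearer the root.

class Node:
    def __init__(self, key, weight):
        self.key, self.weight = key, weight
        self.left = self.right = None

def rotate_right(n):
    l = n.left
    n.left, l.right = l.right, n
    return l

def rotate_left(n):
    r = n.right
    n.right, r.left = r.left, n
    return r

def insert(root, key, weight):
    if root is None:
        return Node(key, weight)
    if key < root.key:
        root.left = insert(root.left, key, weight)
        if root.left.weight > root.weight:    # restore max-heap order on weights
            root = rotate_right(root)
    else:
        root.right = insert(root.right, key, weight)
        if root.right.weight > root.weight:
            root = rotate_left(root)
    return root

root = None
for key, w in [(3, 0.1), (1, 0.5), (4, 0.9), (2, 0.2)]:
    root = insert(root, key, w)
assert root.key == 4    # the highest predicted weight sits at the root
```

Searching an item of weight $w_x$ then costs roughly its depth, which a good predictor keeps close to $\log(W/w_x)$ in the ideal static case.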
arXiv Detail & Related papers (2022-11-16T22:50:40Z)
- Structure-Unified M-Tree Coding Solver for Math Word Problem [57.825176412485504]
In previous work, models designed by taking into account the properties of the binary tree structure of mathematical expressions at the output side have achieved better performance.
In this paper, we propose the Structure-Unified M-Tree Coding Solver (SUMC-Solver), which applies a tree with any M branches (M-tree) to unify the output structures.
Experimental results on the widely used MAWPS and Math23K datasets have demonstrated that SUMC-Solver not only outperforms several state-of-the-art models but also performs much better under low-resource conditions.
arXiv Detail & Related papers (2022-10-22T12:20:36Z)
- Active-LATHE: An Active Learning Algorithm for Boosting the Error Exponent for Learning Homogeneous Ising Trees [75.93186954061943]
We design and analyze an algorithm that boosts the error exponent by at least 40% when $\rho$ is at least $0.8$.
Our analysis hinges on judiciously exploiting the minute but detectable statistical variation of the samples to allocate more data to parts of the graph.
arXiv Detail & Related papers (2021-10-27T10:45:21Z)
- On Learning and Testing Decision Tree [0.22843885788439797]
We give a new algorithm that achieves $\phi(s,1/\epsilon)=2^{O((\log^3 s)/\epsilon^3)}$.
We also study the testability of depth-$d$ decision trees and give a distribution-free tester.
arXiv Detail & Related papers (2021-08-10T11:04:06Z)
- On the Parameterized Complexity of Polytree Learning [4.060731229044571]
A fundamental task in data science is to learn a Bayesian network from observed data.
We show that \textsc{Polytree Learning} can be solved in $3^n \cdot |I|^{\mathcal{O}(1)}$ time.
arXiv Detail & Related papers (2021-05-20T11:29:12Z)
- Visualizing hierarchies in scRNA-seq data using a density tree-biased autoencoder [50.591267188664666]
We propose an approach for identifying a meaningful tree structure from high-dimensional scRNA-seq data.
We then introduce DTAE, a tree-biased autoencoder that emphasizes the tree structure of the data in low dimensional space.
arXiv Detail & Related papers (2021-02-11T08:48:48Z)
- Decision trees as partitioning machines to characterize their generalization properties [2.370481325034443]
We revisit binary decision trees on real-valued features from the perspective of partitions of the data.
We show that the VC dimension of a binary tree structure with $N$ internal nodes is of order $N \log(N\ell)$.
We elaborate a pruning algorithm based on these results that performs better than the CART algorithm on a number of datasets.
arXiv Detail & Related papers (2020-10-14T19:25:58Z)
- Recursive Top-Down Production for Sentence Generation with Latent Trees [77.56794870399288]
We model the production property of context-free grammars for natural and synthetic languages.
We present a dynamic programming algorithm that marginalises over latent binary tree structures with $N$ leaves.
We also present experimental results on German-English translation on the Multi30k dataset.
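Why a dynamic program is needed here: the number of binary trees over $N$ leaves is the Catalan number $C_{N-1}$, so explicit enumeration is infeasible, but a CKY-style chart can sum over all of them in $O(N^3)$. The toy sketch below only counts trees; the paper's algorithm marginalises probabilities over the same chart shape.

```python
# CKY-style chart DP over latent binary trees: chart[i][j] holds the number
# of binary trees whose leaves are positions i..j-1. Summing products over
# split points marginalises over all tree shapes in O(N^3) instead of
# enumerating the Catalan-many trees explicitly.

def count_binary_trees(n_leaves):
    chart = [[0] * (n_leaves + 1) for _ in range(n_leaves + 1)]
    for i in range(n_leaves):
        chart[i][i + 1] = 1                  # a single leaf spans one position
    for span in range(2, n_leaves + 1):
        for i in range(n_leaves - span + 1):
            j = i + span
            for k in range(i + 1, j):        # split into children [i,k) and [k,j)
                chart[i][j] += chart[i][k] * chart[k][j]
    return chart[0][n_leaves]

assert count_binary_trees(4) == 5            # Catalan number C_3
```

Replacing the counts with (log-)probabilities of spans turns this counting recursion into the marginalisation the abstract describes.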
arXiv Detail & Related papers (2020-10-09T17:47:16Z)
- On $\ell_p$-norm Robustness of Ensemble Stumps and Trees [83.81523991945018]
We develop an efficient dynamic programming based algorithm for sound verification of ensemble stumps.
We demonstrate the first certified defense method for training ensemble stumps and trees with respect to $\ell_p$ norm perturbations.
arXiv Detail & Related papers (2020-08-20T03:42:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.