Related papers: TreeGPT: Pure TreeFFN Encoder-Decoder Architecture for Structured Reasoning Without Attention Mechanisms


URL: http://arxiv.org/abs/2509.05550v2
Date: Thu, 11 Sep 2025 10:46:29 GMT
Title: TreeGPT: Pure TreeFFN Encoder-Decoder Architecture for Structured Reasoning Without Attention Mechanisms
Authors: Zixi Li
Abstract summary: TreeGPT is an attention-free neural architecture that explores the potential of pure TreeFFN encoder-decoder design for structured reasoning tasks. We evaluate our approach on the ARC Prize 2025 dataset, where TreeGPT achieves 99% accuracy using 3.16M parameters.
Score: 0.16244541005112745
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present TreeGPT, an attention-free neural architecture that explores the potential of pure TreeFFN encoder-decoder design for structured reasoning tasks. Unlike traditional transformer approaches that rely on attention mechanisms, TreeGPT employs bidirectional TreeFFN components that process sequences through adjacent connections in parallel, aiming to achieve computational efficiency while maintaining reasoning capabilities. Our approach centers on a TreeFFN Encoder-Decoder mechanism: $$\text{Encoder TreeFFN (L} \rightarrow \text{R)} + \text{Decoder TreeFFN (R} \leftarrow \text{L)} \rightarrow \text{Parallel Processing}$$ where the encoder processes left-to-right dependencies while the decoder handles right-to-left patterns, both using simple neighbor-to-neighbor connections. This design eliminates attention computation while maintaining sequence modeling capabilities. We evaluate our approach on the ARC Prize 2025 dataset, where TreeGPT achieves 99\% validation accuracy using 3.16M parameters. The model converges within 1500 training steps and demonstrates 100\% token-level accuracy on selected evaluation samples. Our preliminary results suggest that for certain structured reasoning tasks, specialized TreeFFN architectures may offer advantages over attention-based approaches. While these findings are encouraging, we acknowledge that further investigation across diverse tasks and datasets would be valuable to establish the broader applicability of attention-free designs.
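The neighbor-to-neighbor idea in the abstract can be illustrated with a toy sketch. This is not the paper's TreeFFN: the scalar states, the fixed mixing weights `w_self` and `w_nbr`, the `tanh` nonlinearity, and the function names are all assumptions chosen for illustration. It only shows how a left-to-right pass and a right-to-left pass can each update every position at once from a single adjacent neighbor, with no attention over the full sequence.

```python
import math

def neighbor_step(states, direction, w_self=0.5, w_nbr=0.5):
    """One parallel update: each position mixes its own state with one
    adjacent neighbor (hypothetical fixed weights, not learned)."""
    n = len(states)
    out = []
    for i in range(n):
        # "l2r" reads from the left neighbor, "r2l" from the right neighbor.
        j = i - 1 if direction == "l2r" else i + 1
        nbr = states[j] if 0 <= j < n else 0.0  # zero padding at the edges
        out.append(math.tanh(w_self * states[i] + w_nbr * nbr))
    return out

def bidirectional_pass(states, steps=1):
    """Run both directions independently for `steps` updates, then sum them."""
    l2r, r2l = list(states), list(states)
    for _ in range(steps):
        l2r = neighbor_step(l2r, "l2r")
        r2l = neighbor_step(r2l, "r2l")
    return [a + b for a, b in zip(l2r, r2l)]

if __name__ == "__main__":
    signal = [1.0, 0.0, 0.0]
    print(bidirectional_pass(signal, steps=1))
    print(bidirectional_pass(signal, steps=2))
```

In this sketch, information moves one position per step, so a signal at the first position reaches the third only after two steps. Each update costs time linear in sequence length, in contrast to the quadratic cost of full self-attention.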

Related papers

TorchLean: Formalizing Neural Networks in Lean [71.68907600404513]
We introduce TorchLean, a framework that treats learned models as first-class mathematical objects with a single, precise semantics shared by execution and verification. We validate TorchLean end-to-end on certified robustness, physics-informed residual bounds for PINNs, and Lyapunov-style neural controller verification.
arXiv Detail & Related papers (2026-02-26T05:11:44Z)
TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling [65.46347858249295]
TreePO is a self-guided rollout algorithm that views sequence generation as a tree-structured searching process. TreePO essentially reduces the per-update compute burden while preserving or enhancing exploration diversity.
arXiv Detail & Related papers (2025-08-24T16:52:37Z)
A Novel Architecture for Symbolic Reasoning with Decision Trees and LLM Agents [0.0]
We propose a hybrid architecture that integrates decision tree-based symbolic reasoning with the generative capabilities of large language models. Tree-based modules enable interpretable rule inference and causal logic, while LLM agents handle abductive reasoning, generalization, and interactive planning. The system achieves strong performance on reasoning benchmarks.
arXiv Detail & Related papers (2025-08-07T12:11:53Z)
TreeLoRA: Efficient Continual Learning via Layer-Wise LoRAs Guided by a Hierarchical Gradient-Similarity Tree [52.44403214958304]
In this paper, we introduce TreeLoRA, a novel approach that constructs layer-wise adapters by leveraging hierarchical gradient similarity. To reduce the computational burden of task similarity estimation, we employ bandit techniques to develop an algorithm based on lower confidence bounds. Experiments on both vision transformers (ViTs) and large language models (LLMs) demonstrate the effectiveness and efficiency of our approach.
arXiv Detail & Related papers (2025-06-12T05:25:35Z)
Birch SGD: A Tree Graph Framework for Local and Asynchronous SGD Methods [51.54704494242525]
We propose a new unifying framework, Birch SGD, for analyzing and designing distributed SGD methods. Using Birch SGD, we design eight new methods and analyze them alongside previously known ones, with at least six of the new methods shown to have optimal computational time complexity. Our research leads to two key insights: (i) all methods share the same "iteration rate" of $O\left(\frac{(R + 1) L \Delta}{\varepsilon} + \frac{\sigma^2 L \Delta}{\varepsilon^2}\right)$, where $R$
arXiv Detail & Related papers (2025-05-14T08:37:45Z)
ReTreever: Tree-based Coarse-to-Fine Representations for Retrieval [64.44265315244579]
We propose a tree-based method for organizing and representing reference documents at various granular levels. Our method, called ReTreever, jointly learns a routing function per internal node of a binary tree such that query and reference documents are assigned to similar tree branches. Our evaluations show that ReTreever generally preserves full representation accuracy.
arXiv Detail & Related papers (2025-02-11T21:35:13Z)
Tree-NET: Enhancing Medical Image Segmentation Through Efficient Low-Level Feature Training [0.0]
This paper introduces Tree-NET, a novel framework for medical image segmentation. Tree-NET uses bottleneck feature supervision to enhance both segmentation accuracy and computational efficiency. Experimental results demonstrate that Tree-NET reduces FLOPs by a factor of 4 to 13 and decreases memory usage, while achieving comparable or superior accuracy compared to the original.
arXiv Detail & Related papers (2025-01-03T23:17:01Z)
Heterogeneous Graph Neural Network on Semantic Tree [11.810900066591861]
HetTree is a novel HGNN that models both the graph structure and heterogeneous aspects in a scalable and effective manner. To effectively encode the semantic tree, HetTree uses a novel subtree attention mechanism to emphasize metapaths that are more helpful in encoding parent-child relationships. Our evaluation of HetTree on a variety of real-world datasets demonstrates that it outperforms all existing baselines on open benchmarks.
arXiv Detail & Related papers (2024-02-21T03:14:45Z)
Wasserstein Auto-Encoders of Merge Trees (and Persistence Diagrams) [5.384630221560809]
This paper presents a computational framework for the Wasserstein auto-encoding of merge trees (MT-WAE). In contrast to traditional auto-encoders which operate on vectorized data, our formulation explicitly manipulates merge trees on their associated metric space at each layer of the network. Experiments on public ensembles demonstrate the efficiency of our algorithms, with MT-WAE computations in the orders of minutes on average.
arXiv Detail & Related papers (2023-07-05T09:46:52Z)
Dynamic Perceiver for Efficient Visual Recognition [87.08210214417309]
We propose Dynamic Perceiver (Dyn-Perceiver) to decouple the feature extraction procedure and the early classification task. A feature branch serves to extract image features, while a classification branch processes a latent code assigned for classification tasks. Early exits are placed exclusively within the classification branch, thus eliminating the need for linear separability in low-level features.
arXiv Detail & Related papers (2023-06-20T03:00:22Z)
UNETR++: Delving into Efficient and Accurate 3D Medical Image Segmentation [93.88170217725805]
We propose a 3D medical image segmentation approach, named UNETR++, that offers both high-quality segmentation masks as well as efficiency in terms of parameters, compute cost, and inference speed. The core of our design is the introduction of a novel efficient paired attention (EPA) block that efficiently learns spatial and channel-wise discriminative features. Our evaluations on five benchmarks, Synapse, BTCV, ACDC, BRaTs, and Decathlon-Lung, reveal the effectiveness of our contributions in terms of both efficiency and accuracy.
arXiv Detail & Related papers (2022-12-08T18:59:57Z)
Individualized and Global Feature Attributions for Gradient Boosted Trees in the Presence of $\ell_2$ Regularization [0.0]
We propose Prediction Decomposition (PreDecomp), a novel individualized feature attribution for boosted trees when they are trained with $\ell_2$ regularization. We also propose TreeInner, a family of debiased global feature attributions defined in terms of the inner product between any individualized feature attribution and labels on out-sample data for each tree.
arXiv Detail & Related papers (2022-11-08T17:56:22Z)
Learning Tree Structures from Leaves For Particle Decay Reconstruction [0.0]
We present a neural approach to reconstructing rooted tree graphs describing hierarchical interactions, using a novel representation we term the Lowest Common Ancestor Generations (LCAG) matrix. We are able to correctly predict the LCAG purely from leaf features for a maximum tree-depth of $8$ in $92.5\%$ of cases for trees up to (and including) $6$ leaves and $59.7\%$ for trees up to $10$ in our simulated dataset.
arXiv Detail & Related papers (2022-08-31T15:36:47Z)
PointTree: Transformation-Robust Point Cloud Encoder with Relaxed K-D Trees [27.641101804012152]
We propose PointTree, a point cloud encoder that is robust to transformations based on relaxed K-D trees. Key to our approach is the design of the division rule in K-D trees by using principal component analysis (PCA). In addition to this novel architecture design, we further improve the robustness by introducing pre-alignment.
arXiv Detail & Related papers (2022-08-11T17:59:09Z)
Tree Decomposition Attention for AMR-to-Text Generation [12.342043849587613]
We use a graph's tree decomposition to constrain self-attention in a graph. We apply dynamic programming to derive a forest of tree decompositions, choosing the most structurally similar tree to the AMR. Our system outperforms a self-attentive baseline by 1.6 BLEU and 1.8 chrF++.
arXiv Detail & Related papers (2021-08-27T14:24:25Z)
Visualizing hierarchies in scRNA-seq data using a density tree-biased autoencoder [50.591267188664666]
We propose an approach for identifying a meaningful tree structure from high-dimensional scRNA-seq data. We then introduce DTAE, a tree-biased autoencoder that emphasizes the tree structure of the data in low dimensional space.
arXiv Detail & Related papers (2021-02-11T08:48:48Z)
SGA: A Robust Algorithm for Partial Recovery of Tree-Structured Graphical Models with Noisy Samples [75.32013242448151]
We consider learning Ising tree models when the observations from the nodes are corrupted by independent but non-identically distributed noise. Katiyar et al. (2020) showed that although the exact tree structure cannot be recovered, one can recover a partial tree structure. We propose Symmetrized Geometric Averaging (SGA), a more statistically robust algorithm for partial tree recovery.
arXiv Detail & Related papers (2021-01-22T01:57:35Z)
Growing Deep Forests Efficiently with Soft Routing and Learned Connectivity [79.83903179393164]
This paper further extends the deep forest idea in several important aspects. We employ a probabilistic tree whose nodes make probabilistic routing decisions, a.k.a., soft routing, rather than hard binary decisions. Experiments on the MNIST dataset demonstrate that our empowered deep forests can achieve better or comparable performance than [1],[3].
arXiv Detail & Related papers (2020-12-29T18:05:05Z)
Nested Named Entity Recognition with Partially-Observed TreeCRFs [23.992944831013013]
We view nested NER as constituency parsing with partially-observed trees and model it with partially-observed TreeCRFs. Our approach achieves state-of-the-art (SOTA) F1 scores on the ACE2004 and ACE2005 datasets, and shows comparable performance to SOTA models on the GENIA dataset.
arXiv Detail & Related papers (2020-12-15T18:20:36Z)
Rethinking Learnable Tree Filter for Generic Feature Transform [71.77463476808585]
Learnable Tree Filter presents a remarkable approach to model structure-preserving relations for semantic segmentation. To relax the geometric constraint, we give the analysis by reformulating it as a Markov Random Field and introduce a learnable unary term. For semantic segmentation, we achieve leading performance (82.1% mIoU) on the Cityscapes benchmark without bells-and-whistles.
arXiv Detail & Related papers (2020-12-07T07:16:47Z)

This list is automatically generated from the titles and abstracts of the papers in this site.