ORIGAMI: A generative transformer architecture for predictions from semi-structured data
- URL: http://arxiv.org/abs/2412.17348v1
- Date: Mon, 23 Dec 2024 07:21:17 GMT
- Title: ORIGAMI: A generative transformer architecture for predictions from semi-structured data
- Authors: Thomas Rückstieß, Alana Huang, Robin Vujanic
- Abstract summary: ORIGAMI is a transformer-based architecture that processes nested key/value pairs.
By reformulating classification as next-token prediction, ORIGAMI naturally handles both single-label and multi-label tasks.
- Score: 3.5639148953570836
- Abstract: Despite the popularity and widespread use of semi-structured data formats such as JSON, end-to-end supervised learning applied directly to such data remains underexplored. We present ORIGAMI (Object RepresentatIon via Generative Autoregressive ModellIng), a transformer-based architecture that directly processes nested key/value pairs while preserving their hierarchical semantics. Our key technical contributions include: (1) a structure-preserving tokenizer, (2) a novel key/value position encoding scheme, and (3) a grammar-constrained training and inference framework that ensures valid outputs and accelerates training convergence. These enhancements enable efficient end-to-end modeling of semi-structured data. By reformulating classification as next-token prediction, ORIGAMI naturally handles both single-label and multi-label tasks without architectural modifications. Empirical evaluation across diverse domains demonstrates ORIGAMI's effectiveness: On standard tabular benchmarks converted to JSON, ORIGAMI remains competitive with classical and state-of-the-art approaches. On native JSON datasets, we outperform baselines on multi-label classification and specialized models such as convolutional and graph neural networks on a code classification task. Through extensive ablation studies, we validate the impact of each architectural component and establish ORIGAMI as a robust framework for end-to-end learning on semi-structured data.
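To make the abstract's contributions more concrete, here is a minimal sketch (not the authors' released code) of two of the ideas it names: a structure-preserving tokenizer for nested JSON, and classification recast as next-token prediction with decoding constrained to valid label tokens. The structural token names (<obj>, <kv>, <label>, etc.), the label vocabulary, and the stand-in scoring function are illustrative assumptions; a trained ORIGAMI-style transformer would supply real next-token logits.

```python
import json

# Structural tokens that keep object/array boundaries explicit,
# so the nesting of key/value pairs is preserved in the sequence.
STRUCT_TOKENS = ["<obj>", "</obj>", "<arr>", "</arr>", "<kv>", "</kv>"]

def tokenize(value):
    """Flatten a parsed JSON value into a token sequence that retains
    its hierarchical structure (a toy structure-preserving tokenizer)."""
    if isinstance(value, dict):
        tokens = ["<obj>"]
        for key, child in value.items():
            tokens += ["<kv>", f"key:{key}"] + tokenize(child) + ["</kv>"]
        tokens.append("</obj>")
        return tokens
    if isinstance(value, list):
        return ["<arr>"] + [t for item in value for t in tokenize(item)] + ["</arr>"]
    return [f"val:{json.dumps(value)}"]

def predict_label(prefix_tokens, label_vocab, score_fn):
    """Classification as next-token prediction: score the next token,
    but only allow tokens from the label vocabulary (a toy form of
    grammar-constrained decoding that guarantees a valid output)."""
    scores = {label: score_fn(prefix_tokens, label) for label in label_vocab}
    return max(scores, key=scores.get)

if __name__ == "__main__":
    doc = {"user": {"age": 34, "tags": ["sports", "music"]}, "active": True}
    tokens = tokenize(doc) + ["<label>"]  # "<label>" marks the prediction slot
    print(tokens)

    # Stand-in scorer for illustration only; it has no learned knowledge.
    dummy_score = lambda prefix, label: float(len(label))
    print(predict_label(tokens, ["churn", "no_churn"], dummy_score))
```

Multi-label prediction fits the same sketch by repeatedly sampling constrained label tokens until an end-of-labels token is emitted, which is one way to read the abstract's claim that no architectural modification is needed.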
Related papers
- EBES: Easy Benchmarking for Event Sequences [17.277513178760348]
Event sequences are common data structures in various real-world domains such as healthcare, finance, and user interaction logs.
Despite advances in temporal data modeling techniques, there are no standardized benchmarks for evaluating their performance on event sequences.
We introduce EBES, a comprehensive benchmarking tool with standardized evaluation scenarios and protocols.
arXiv Detail & Related papers (2024-10-04T13:03:43Z) - A Pure Transformer Pretraining Framework on Text-attributed Graphs [50.833130854272774]
We introduce a feature-centric pretraining perspective by treating graph structure as a prior.
Our framework, Graph Sequence Pretraining with Transformer (GSPT), samples node contexts through random walks.
GSPT can be easily adapted to both node classification and link prediction, demonstrating promising empirical success on various datasets.
arXiv Detail & Related papers (2024-06-19T22:30:08Z) - Fake It Till Make It: Federated Learning with Consensus-Oriented Generation [52.82176415223988]
We propose federated learning with consensus-oriented generation (FedCOG).
FedCOG consists of two key components at the client side: complementary data generation and knowledge-distillation-based model training.
Experiments on classical and real-world FL datasets show that FedCOG consistently outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-12-10T18:49:59Z) - Learning from Mistakes: Self-Regularizing Hierarchical Representations in Point Cloud Semantic Segmentation [15.353256018248103]
LiDAR semantic segmentation has gained attention as a means of achieving fine-grained scene understanding.
We present a coarse-to-fine setup that LEArns from classification mistaKes (LEAK) derived from a standard model.
Our LEAK approach is very general and can be seamlessly applied on top of any segmentation architecture.
arXiv Detail & Related papers (2023-01-26T14:52:30Z) - AutoBERT-Zero: Evolving BERT Backbone from Scratch [94.89102524181986]
We propose an Operation-Priority Neural Architecture Search (OP-NAS) algorithm to automatically search for promising hybrid backbone architectures.
We optimize both the search algorithm and evaluation of candidate models to boost the efficiency of our proposed OP-NAS.
Experiments show that the searched architecture (named AutoBERT-Zero) significantly outperforms BERT and its variants of different model capacities in various downstream tasks.
arXiv Detail & Related papers (2021-07-15T16:46:01Z) - Understanding Dynamics of Nonlinear Representation Learning and Its Application [12.697842097171119]
We study the dynamics of implicit nonlinear representation learning.
We show that the data-architecture alignment condition is sufficient for global convergence.
We derive a new training framework, which satisfies the data-architecture alignment condition without assuming it.
arXiv Detail & Related papers (2021-06-28T16:31:30Z) - Rethinking Architecture Design for Tackling Data Heterogeneity in Federated Learning [53.73083199055093]
We show that attention-based architectures (e.g., Transformers) are fairly robust to distribution shifts.
Our experiments show that replacing convolutional networks with Transformers can greatly reduce catastrophic forgetting of previous devices.
arXiv Detail & Related papers (2021-06-10T21:04:18Z) - Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z) - Distribution-Based Invariant Deep Networks for Learning Meta-Features [2.179313476241343]
Recent advances in deep learning from probability distributions successfully achieve classification or regression from distribution samples, and are thus invariant under permutation of the samples.
The proposed architecture, called Dida, inherits the NN properties of universal approximation, and its robustness w.r.t. Lipschitz-bounded transformations of the input distribution is established.
The paper empirically and comparatively demonstrates the merits of the approach on two tasks defined at the dataset level.
arXiv Detail & Related papers (2020-06-24T13:15:50Z) - Document Ranking with a Pretrained Sequence-to-Sequence Model [56.44269917346376]
We show how a sequence-to-sequence model can be trained to generate relevance labels as "target words".
Our approach significantly outperforms an encoder-only model in a data-poor regime.
arXiv Detail & Related papers (2020-03-14T22:29:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.