Related papers: On Linearizing Structured Data in Encoder-Decoder Language Models: Insights from Text-to-SQL

URL: http://arxiv.org/abs/2404.02389v1
Date: Wed, 3 Apr 2024 01:16:20 GMT
Title: On Linearizing Structured Data in Encoder-Decoder Language Models: Insights from Text-to-SQL
Authors: Yutong Shao, Ndapa Nakashole
Abstract summary: This work investigates the linear handling of structured data in encoder-decoder language models, specifically T5. Our findings reveal the model's ability to mimic human-designed processes such as schema linking and syntax prediction. We also uncover insights into the model's internal mechanisms, including the ego-centric nature of structure node encodings.
Score: 8.57550491437633
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Structured data, prevalent in tables, databases, and knowledge graphs, poses a significant challenge in its representation. With the advent of large language models (LLMs), there has been a shift towards linearization-based methods, which process structured data as sequential token streams, diverging from approaches that explicitly model structure, often as a graph. Crucially, there remains a gap in our understanding of how these linearization-based methods handle structured data, which is inherently non-linear. This work investigates the linear handling of structured data in encoder-decoder language models, specifically T5. Our findings reveal the model's ability to mimic human-designed processes such as schema linking and syntax prediction, indicating a deep, meaningful learning of structure beyond simple token sequencing. We also uncover insights into the model's internal mechanisms, including the ego-centric nature of structure node encodings and the potential for model compression due to modality fusion redundancy. Overall, this work sheds light on the inner workings of linearization-based methods and could potentially provide guidance for future research.
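As a rough illustration of what "linearization" means in text-to-SQL, the sketch below flattens a natural-language question and a database schema into one string that a T5-style encoder could consume as an ordinary token sequence. The ` | ` and ` : ` separators, the `Table` class, and the `linearize` helper are assumptions made for illustration (the layout resembles serializations used in some prior text-to-SQL work) and are not necessarily the exact input format used in this paper.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical minimal schema node: a table name plus its column names."""
    name: str
    columns: list[str] = field(default_factory=list)


def linearize(question: str, db_id: str, tables: list[Table]) -> str:
    """Flatten a question and a database schema into a single input string.

    Assumed (hypothetical) layout:
        "question | db_id | table : col , col | table : col , col"
    The graph structure of the schema (which column belongs to which table)
    survives only implicitly, through separator tokens and ordering.
    """
    parts = [question.strip(), db_id]
    for table in tables:
        # Each table becomes "name : col1 , col2 , ..."
        parts.append(f"{table.name} : {' , '.join(table.columns)}")
    return " | ".join(parts)


if __name__ == "__main__":
    schema = [
        Table("singer", ["singer_id", "name", "age"]),
        Table("concert", ["concert_id", "singer_id", "year"]),
    ]
    print(linearize("How many singers are there?", "concert_singer", schema))
```

Run as a script, this prints `How many singers are there? | concert_singer | singer : singer_id , name , age | concert : concert_id , singer_id , year`. Because the encoder sees only this flat sequence, any schema-linking behaviour (matching "singers" in the question to the `singer` table) has to be learned from token context rather than from explicit graph edges, which is the setting the paper analyzes.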

Related papers

Patterning: The Dual of Interpretability [2.3443925855637073]
We show that patterning can select which algorithm the model learns by targeting the local learning coefficient of each solution. Results establish that the same mathematical framework used to read internal structure can be inverted to write it.
arXiv (2026-01-20T03:15:27Z)
Innovative tokenisation of structured data for LLM training [0.0]
This paper introduces a novel, hybrid tokenisation methodology to convert structured data into a sequential format suitable for training Large Language Models (LLMs). We show that our method is highly efficient, processing over 31 million network flows in under five hours and achieving a significant data compression ratio of 6.18:1. This process resulted in a computationally manageable corpus of over one billion tokens, establishing a viable and generalisable pathway for training foundation models on structured data.
arXiv (2025-08-03T09:29:50Z)
Relational Deep Learning: Challenges, Foundations and Next-Generation Architectures [50.46688111973999]
Graph machine learning has led to a significant increase in the capabilities of models that learn on arbitrary graph-structured data. We present a new blueprint that enables end-to-end representation of 'relational entity graphs' without traditional feature engineering. We discuss key challenges, including large-scale multi-table integration and the complexities of modeling temporal dynamics and heterogeneous data.
arXiv (2025-06-19T23:51:38Z)
Scaling Laws and Representation Learning in Simple Hierarchical Languages: Transformers vs. Convolutional Architectures [49.19753720526998]
We derive theoretical scaling laws for neural network performance on synthetic datasets. We validate that convolutional networks, whose structure aligns with that of the generative process through locality and weight sharing, enjoy a faster scaling of performance. This finding clarifies the architectural biases underlying neural scaling laws and highlights how representation learning is shaped by the interaction between model architecture and the statistical properties of data.
arXiv (2025-05-11T17:44:14Z)
Concept Factorization via Self-Representation and Adaptive Graph Structure Learning [8.990462532663871]
We propose a Concept Factorization Based on Self-Representation and Adaptive Graph Structure Learning (CFSRAG) model. CFSRAG learns the affinity relationship between data through a self-representation method and uses the learned affinity matrix to implement dynamic graph regularization constraints. The results show that our model outperforms other state-of-the-art models.
arXiv (2025-05-06T10:12:59Z)
Knowledge prompt chaining for semantic modeling [0.0]
We propose a novel automatic semantic modeling framework: Knowledge Prompt Chaining. It can serialize graph-structured knowledge and inject it into LLMs properly. Based on experimental results, our method achieves better performance than existing leading techniques.
arXiv (2025-01-15T03:00:57Z)
Dissecting embedding method: learning higher-order structures from data [0.0]
Geometric deep learning methods for data learning often include a set of assumptions about the geometry of the feature space. These assumptions, together with the data being discrete and finite, can lead to generalisations that are likely to produce wrong interpretations of the data and of model outputs.
arXiv (2024-10-14T08:19:39Z)
Induced Covariance for Causal Discovery in Linear Sparse Structures [55.2480439325792]
Causal models seek to unravel the cause-effect relationships among variables from observed data. This paper introduces a novel causal discovery algorithm designed for settings in which variables exhibit linearly sparse relationships.
arXiv (2024-10-02T04:01:38Z)
Structured Language Generation Model for Robust Structure Prediction [6.4736137270915215]
We propose a framework, the Structured Language Generation Model (SLGM), that reduces sequence-to-sequence problems to classification problems via methodologies in loss calibration and decoding. Our experimental results show that SLGM can maintain performance without explicit dataset information, following and potentially replacing dataset-specific fine-tuning.
arXiv (2024-02-14T06:33:22Z)
DiSK: A Diffusion Model for Structured Knowledge [12.472921856815942]
Diffusion Models of Structured Knowledge (DiSK) is a new architecture and training approach specialized for structured data. DiSK handles text, categorical, and continuous numerical data using a Gaussian mixture model approach.
arXiv (2023-12-08T18:59:14Z)
Disentanglement via Latent Quantization [60.37109712033694]
In this work, we construct an inductive bias towards encoding to and decoding from an organized latent space. We demonstrate the broad applicability of this approach by adding it to both basic data-reconstructing (vanilla autoencoder) and latent-reconstructing (InfoGAN) generative models.
arXiv (2023-05-28T06:30:29Z)
Physics of Language Models: Part 1, Learning Hierarchical Language Structures [51.68385617116854]
Transformer-based language models are effective but complex, and understanding their inner workings is a significant challenge. We introduce a family of synthetic CFGs that produce hierarchical rules, capable of generating lengthy sentences. We demonstrate that generative models like GPT can accurately learn this CFG language and generate sentences based on it.
arXiv (2023-05-23T04:28:16Z)
Autoregressive Structured Prediction with Language Models [73.11519625765301]
We describe an approach to model structures as sequences of actions in an autoregressive manner with pretrained language models (PLMs). Our approach achieves new state-of-the-art results on all the structured prediction tasks we looked at.
arXiv (2022-10-26T13:27:26Z)
Schema-aware Reference as Prompt Improves Data-Efficient Knowledge Graph Construction [57.854498238624366]
We propose a retrieval-augmented approach, which retrieves schema-aware Reference As Prompt (RAP) for data-efficient knowledge graph construction. RAP can dynamically leverage schema and knowledge inherited from human-annotated and weakly supervised data as a prompt for each sample.
arXiv (2022-10-19T16:40:28Z)
Improving Compositional Generalization with Self-Training for Data-to-Text Generation [36.973617793800315]
We study the compositional generalization of current generation models in data-to-text tasks. By simulating structural shifts in the compositional Weather dataset, we show that T5 models fail to generalize to unseen structures. We propose an approach based on self-training, using fine-tuned BLEURT for pseudo-response selection.
arXiv (2021-10-16T04:26:56Z)
Structural Adapters in Pretrained Language Models for AMR-to-text Generation [59.50420985074769]
Previous work on text generation from graph-structured data relies on pretrained language models (PLMs). We propose StructAdapt, an adapter method to encode graph structure into PLMs.
arXiv (2021-03-16T15:06:50Z)
Variational Autoencoder with Learned Latent Structure [4.41370484305827]
We introduce the Variational Autoencoder with Learned Latent Structure (VAELLS). VAELLS incorporates a learnable manifold model into the latent space of a VAE. We validate our model on examples with known latent structure and also demonstrate its capabilities on a real-world dataset.
arXiv (2020-06-18T14:59:06Z)

This list is automatically generated from the titles and abstracts of the papers on this site.