DiSK: A Diffusion Model for Structured Knowledge
- URL: http://arxiv.org/abs/2312.05253v2
- Date: Wed, 7 Feb 2024 18:59:55 GMT
- Title: DiSK: A Diffusion Model for Structured Knowledge
- Authors: Ouail Kitouni, Niklas Nolte, James Hensman, Bhaskar Mitra
- Abstract summary: Diffusion Models of Structured Knowledge (DiSK) is a new architecture and training approach specialized for structured data.
DiSK handles text, categorical, and continuous numerical data using a Gaussian mixture model approach.
- Score: 12.472921856815942
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Structured (dictionary-like) data presents challenges for left-to-right
language models, as they can struggle with structured entities for a wide
variety of reasons such as formatting and sensitivity to the order in which
attributes are presented. Tabular generative models suffer from a different set
of limitations such as their lack of flexibility. We introduce Diffusion Models
of Structured Knowledge (DiSK) - a new architecture and training approach
specialized for structured data. DiSK handles text, categorical, and continuous
numerical data using a Gaussian mixture model approach, which allows for
improved precision when dealing with numbers. It employs diffusion training to
model relationships between properties. Experiments demonstrate DiSK's
state-of-the-art performance on tabular data modeling, synthesis, and
imputation on over 15 datasets across diverse domains. DiSK provides an
effective inductive bias for generative modeling and manipulation of structured
data. The techniques we propose could open the door to improved knowledge
manipulation in future language models.
Related papers
- Influence Functions for Scalable Data Attribution in Diffusion Models [52.92223039302037]
Diffusion models have led to significant advancements in generative modelling.
Yet their widespread adoption poses challenges regarding data attribution and interpretability.
In this paper, we aim to help address such challenges by developing an influence functions framework.
arXiv Detail & Related papers (2024-10-17T17:59:02Z)
- An improved tabular data generator with VAE-GMM integration [9.4491536689161]
We propose a novel Variational Autoencoder (VAE)-based model that addresses limitations of current approaches.
Inspired by the TVAE model, our approach incorporates a Bayesian Gaussian Mixture model (BGM) within the VAE architecture.
We thoroughly validate our model on three real-world datasets with mixed data types, including two medically relevant ones.
arXiv Detail & Related papers (2024-04-12T12:31:06Z)
- On Linearizing Structured Data in Encoder-Decoder Language Models: Insights from Text-to-SQL [8.57550491437633]
This work investigates the linear handling of structured data in encoder-decoder language models, specifically T5.
Our findings reveal the model's ability to mimic human-designed processes such as schema linking and syntax prediction.
We also uncover insights into the model's internal mechanisms, including the ego-centric nature of structure node encodings.
arXiv Detail & Related papers (2024-04-03T01:16:20Z)
- Unbiased Learning of Deep Generative Models with Structured Discrete Representations [7.9057320008285945]
We propose novel algorithms for learning structured variational autoencoders (SVAEs).
We are the first to demonstrate the SVAE's ability to handle multimodal uncertainty when data is missing by incorporating discrete latent variables.
Our memory-efficient implicit differentiation scheme makes the SVAE tractable to learn via gradient descent, while demonstrating robustness to incomplete optimization.
arXiv Detail & Related papers (2023-06-14T03:59:21Z)
- Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
- Dynamic Latent Separation for Deep Learning [67.62190501599176]
A core problem in machine learning is to learn expressive latent variables for model prediction on complex data.
Here, we develop an approach that improves expressiveness, provides partial interpretation, and is not restricted to specific applications.
arXiv Detail & Related papers (2022-10-07T17:56:53Z)
- Language Model Cascades [72.18809575261498]
Repeated interactions at test-time with a single model, or the composition of multiple models together, further expands capabilities.
Cases with control flow and dynamic structure require techniques from probabilistic programming.
We formalize several existing techniques from this perspective, including scratchpads / chain of thought, verifiers, STaR, selection-inference, and tool use.
arXiv Detail & Related papers (2022-07-21T07:35:18Z)
- Amortized Inference for Causal Structure Learning [72.84105256353801]
Learning causal structure poses a search problem that typically involves evaluating structures using a score or independence test.
We train a variational inference model to predict the causal structure from observational/interventional data.
Our models exhibit robust generalization capabilities under substantial distribution shift.
arXiv Detail & Related papers (2022-05-25T17:37:08Z)
- Compositionality as Lexical Symmetry [42.37422271002712]
In tasks like semantic parsing, instruction following, and question answering, standard deep networks fail to generalize compositionally from small datasets.
We present a domain-general and model-agnostic formulation of compositionality as a constraint on symmetries of data distributions rather than models.
We describe a procedure called LEXSYM that discovers these transformations automatically, then applies them to training data for ordinary neural sequence models.
arXiv Detail & Related papers (2022-01-30T21:44:46Z)
- Improving Compositional Generalization with Self-Training for Data-to-Text Generation [36.973617793800315]
We study the compositional generalization of current generation models in data-to-text tasks.
By simulating structural shifts in the compositional Weather dataset, we show that T5 models fail to generalize to unseen structures.
We propose an approach based on self-training using finetuned BLEURT for pseudo-response selection.
arXiv Detail & Related papers (2021-10-16T04:26:56Z)
- Closed-form Continuous-Depth Models [99.40335716948101]
Continuous-depth neural models rely on advanced numerical differential equation solvers.
We present a new family of models, termed Closed-form Continuous-depth (CfC) networks, that are simple to describe and at least one order of magnitude faster.
arXiv Detail & Related papers (2021-06-25T22:08:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.