Structure-Aware Path Inference for Neural Finite State Transducers
- URL: http://arxiv.org/abs/2312.13614v1
- Date: Thu, 21 Dec 2023 07:03:15 GMT
- Title: Structure-Aware Path Inference for Neural Finite State Transducers
- Authors: Weiting Tan, Chu-cheng Lin, Jason Eisner
- Abstract summary: Neural finite-state transducers (NFSTs) form an expressive family of neurosymbolic sequence models.
We focus on the resulting challenge of imputing the latent alignment path that explains a given pair of input and output strings.
- Score: 22.385573671312475
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural finite-state transducers (NFSTs) form an expressive family of
neurosymbolic sequence transduction models. An NFST models each string pair as
having been generated by a latent path in a finite-state transducer. As they
are deep generative models, both training and inference of NFSTs require
inference networks that approximate posterior distributions over such latent
variables. In this paper, we focus on the resulting challenge of imputing the
latent alignment path that explains a given pair of input and output strings
(e.g., during training). We train three autoregressive approximate models for
amortized inference of the path, which can then be used as proposal
distributions for importance sampling. All three models perform lookahead. Our
most sophisticated (and novel) model leverages the FST structure to consider
the graph of future paths; unfortunately, we find that it loses out to the
simpler approaches -- except on an artificial task that we concocted to confuse
the simpler approaches.
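To make the setup above concrete, here is a minimal sketch of importance sampling over latent FST paths, assuming a toy weighted transducer and a uniform autoregressive proposal. All names and numbers are illustrative, and unlike the paper's learned proposals, this proposal conditions only on the input and performs no lookahead.
```python
import random

# Toy transducer: arcs are (src_state, in_sym, out_sym, dst_state, weight).
ARCS = [
    (0, "a", "x", 0, 0.6),
    (0, "a", "y", 1, 0.4),
    (1, "a", "y", 1, 1.0),
]

def model_weight(path, out_str):
    """Unnormalized model score: the product of arc weights if the path
    emits the observed output string, else zero."""
    if "".join(arc[2] for arc in path) != out_str:
        return 0.0
    w = 1.0
    for arc in path:
        w *= arc[4]
    return w

def propose(in_str):
    """Autoregressive proposal q(path | input): at each input symbol, pick
    uniformly among arcs leaving the current state on that symbol."""
    state, q, path = 0, 1.0, []
    for sym in in_str:
        options = [a for a in ARCS if a[0] == state and a[1] == sym]
        arc = random.choice(options)
        q *= 1.0 / len(options)
        path.append(arc)
        state = arc[3]
    return path, q

def estimate(in_str, out_str, n=10_000):
    """Importance sampling: averaging model_weight(path, out_str) / q over
    draws from q estimates the total weight of explaining paths."""
    total = 0.0
    for _ in range(n):
        path, q = propose(in_str)
        total += model_weight(path, out_str) / q
    return total / n

print(estimate("aaa", "xxy"))  # true total path weight is 0.6*0.6*0.4 = 0.144
```
The paper's contribution lies precisely in replacing such a blind proposal with trained autoregressive models (including the structure-aware one) so that sampled paths explain the observed output far more often.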
Related papers
- Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion [61.03681839276652]
Diffusion Forcing is a new training paradigm where a diffusion model is trained to denoise a set of tokens with independent per-token noise levels.
We apply Diffusion Forcing to sequence generative modeling by training a causal next-token prediction model to generate one or several future tokens.
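As a rough illustration of the per-token noise idea, the sketch below corrupts each token embedding at its own independently sampled noise level; the schedule, dimensions, and the placeholder denoiser call are invented for illustration and are not from the paper.
```python
import numpy as np

rng = np.random.default_rng(0)
T = 100                                # number of diffusion levels
alpha_bar = np.linspace(1.0, 0.01, T)  # toy cumulative signal schedule

def corrupt(tokens):
    """Noise each token at its own independently sampled level t_i."""
    L, d = tokens.shape
    t = rng.integers(0, T, size=L)          # one noise level per token
    eps = rng.standard_normal((L, d))
    a = alpha_bar[t][:, None]
    return np.sqrt(a) * tokens + np.sqrt(1.0 - a) * eps, t, eps

seq = rng.standard_normal((8, 16))     # 8 token embeddings of dim 16
noisy, levels, eps = corrupt(seq)
# Training step (schematic): the denoiser sees the noisy sequence and the
# per-token levels and is trained to predict the per-token noise:
# loss = mse(denoiser(noisy, levels), eps)   # hypothetical model call
```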
arXiv Detail & Related papers (2024-07-01T15:43:25Z)
- Tilt your Head: Activating the Hidden Spatial-Invariance of Classifiers [0.7704032792820767]
Deep neural networks are applied in more and more areas of everyday life.
They still lack essential abilities, such as robustly dealing with spatially transformed input signals.
We propose a novel technique to emulate such an inference process for neural nets.
arXiv Detail & Related papers (2024-05-06T09:47:29Z)
- SIP: Injecting a Structural Inductive Bias into a Seq2Seq Model by Simulation [75.14793516745374]
We show how a structural inductive bias can be efficiently injected into a seq2seq model by pre-training it to simulate structural transformations on synthetic data.
Our experiments show that our method imparts the desired inductive bias, resulting in better few-shot learning for FST-like tasks.
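A rough sketch of how such pre-training data might be generated under the simulation idea: sample a random deterministic FST, run it on random strings, and pair a token encoding of the transducer with each input/output example. The encoding and sizes here are invented for illustration, not taken from the paper.
```python
import random

SIGMA = "ab"

def random_fst(n_states=3):
    """Deterministic FST: trans[(state, in_sym)] = (out_sym, next_state)."""
    return {(q, s): (random.choice(SIGMA), random.randrange(n_states))
            for q in range(n_states) for s in SIGMA}

def apply_fst(trans, s):
    q, out = 0, []
    for ch in s:
        o, q = trans[(q, ch)]
        out.append(o)
    return "".join(out)

def encode(trans):
    """Flatten the transition table into tokens the seq2seq model can
    read as a 'program' to simulate."""
    return " ".join(f"{q},{s}->{o},{q2}"
                    for (q, s), (o, q2) in sorted(trans.items()))

trans = random_fst()
x = "".join(random.choices(SIGMA, k=6))
print(encode(trans), "|", x, "=>", apply_fst(trans, x))  # one training pair
```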
arXiv Detail & Related papers (2023-10-01T21:19:12Z)
- Bayesian Flow Networks [4.585102332532472]
This paper introduces Bayesian Flow Networks (BFNs), a new class of generative model in which the parameters of a set of independent distributions are modified with Bayesian inference.
Starting from a simple prior and iteratively updating the two distributions yields a generative procedure similar to the reverse process of diffusion models.
BFNs achieve competitive log-likelihoods for image modelling on dynamically binarized MNIST and CIFAR-10, and outperform all known discrete diffusion models on the text8 character-level language modelling task.
arXiv Detail & Related papers (2023-08-14T09:56:35Z)
- Minimally-Supervised Speech Synthesis with Conditional Diffusion Model and Language Model: A Comparative Study of Semantic Coding [57.42429912884543]
We propose Diff-LM-Speech, Tetra-Diff-Speech and Tri-Diff-Speech to solve high dimensionality and waveform distortion problems.
We also introduce a prompt encoder structure based on a variational autoencoder and a prosody bottleneck to improve prompt representation ability.
Experimental results show that our proposed methods outperform baseline methods.
arXiv Detail & Related papers (2023-07-28T11:20:23Z)
- Reflected Diffusion Models [93.26107023470979]
We present Reflected Diffusion Models, which reverse a reflected differential equation evolving on the support of the data.
Our approach learns the score function through a generalized score matching loss and extends key components of standard diffusion models.
arXiv Detail & Related papers (2023-04-10T17:54:38Z)
- DiffSTG: Probabilistic Spatio-Temporal Graph Forecasting with Denoising Diffusion Models [53.67562579184457]
This paper focuses on probabilistic STG forecasting, which is challenging due to the difficulty in modeling uncertainties and complex dependencies.
We present the first attempt to generalize the popular denoising diffusion models to STGs, leading to a novel non-autoregressive framework called DiffSTG.
Our approach combines the spatio-temporal learning capabilities of STNNs with the uncertainty measurements of diffusion models.
arXiv Detail & Related papers (2023-01-31T13:42:36Z)
- Truncated tensor Schatten p-norm based approach for spatiotemporal traffic data imputation with complicated missing patterns [77.34726150561087]
We introduce four complicated missing patterns, including random missing and three fiber-like missing cases according to the mode-driven fibers.
Despite the nonconvexity of the objective function in our model, we derive the optimal solutions by integrating the alternating direction method of multipliers (ADMM) with a data-imputation procedure.
arXiv Detail & Related papers (2022-05-19T08:37:56Z)
- Generative Text Modeling through Short Run Inference [47.73892773331617]
The present work proposes a short-run dynamics for inference. It is initialized from the prior distribution of the latent variable and then runs a small number of Langevin dynamics steps guided by its posterior distribution.
We show that the models trained with short run dynamics more accurately model the data, compared to strong language model and VAE baselines, and exhibit no sign of posterior collapse.
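A minimal sketch of such short-run Langevin inference, assuming a toy Gaussian posterior so that grad log p(z | x) has a closed form; the step count and step size are illustrative.
```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 2.0, 0.5            # toy posterior p(z | x) = N(mu, sigma^2)

def grad_log_posterior(z):
    return -(z - mu) / sigma**2

def short_run_inference(n_steps=20, step=0.01):
    z = rng.standard_normal()   # initialize from the prior N(0, 1)
    for _ in range(n_steps):
        z = (z + step * grad_log_posterior(z)
               + np.sqrt(2.0 * step) * rng.standard_normal())
    return z

samples = [short_run_inference() for _ in range(2000)]
print(np.mean(samples), np.std(samples))  # drifts toward mu = 2.0 but stops
# short of it, since a short run is deliberately not run to convergence
```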
arXiv Detail & Related papers (2021-05-27T09:14:35Z)
- Explainable Deep RDFS Reasoner [6.795420391707213]
We introduce a new neural network model that gets the input graph--as a sequence of graph words--as well as the encoding of the inferred triple and outputs the derivation for the inferred triple.
We evaluate our justification model on two datasets: a synthetic dataset and a real-world dataset.
arXiv Detail & Related papers (2020-02-10T03:20:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.