Neural Induction of Finite-State Transducers
- URL: http://arxiv.org/abs/2601.10918v2
- Date: Tue, 20 Jan 2026 00:30:38 GMT
- Title: Neural Induction of Finite-State Transducers
- Authors: Michael Ginn, Alexis Palmer, Mans Hulden
- Abstract summary: We propose a novel method for automatically constructing unweighted Finite-State Transducers (FSTs) following the hidden state geometry learned by a recurrent neural network. We evaluate our methods on real-world datasets for morphological inflection, grapheme-to-phoneme prediction, and historical normalization, showing that the constructed FSTs are highly accurate and robust for many datasets.
- Score: 13.274838371184432
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Finite-State Transducers (FSTs) are effective models for string-to-string rewriting tasks, often providing the efficiency necessary for high-performance applications, but constructing transducers by hand is difficult. In this work, we propose a novel method for automatically constructing unweighted FSTs following the hidden state geometry learned by a recurrent neural network. We evaluate our methods on real-world datasets for morphological inflection, grapheme-to-phoneme prediction, and historical normalization, showing that the constructed FSTs are highly accurate and robust for many datasets, substantially outperforming classical transducer learning algorithms by up to 87% accuracy on held-out test sets.
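Although the paper's exact construction is more involved, a minimal sketch in Python conveys the general recipe; the k-means clustering and every name below are our illustrative assumptions, not the authors' implementation:

```python
# A minimal sketch of the general idea, NOT the authors' implementation:
# cluster the RNN's hidden states into a finite set of FST states, then
# record each observed (state, input symbol) -> (next state, output symbol)
# move as a transition. k-means and all names here are our assumptions.
import numpy as np
from sklearn.cluster import KMeans

def induce_fst(hidden_seqs, input_seqs, output_seqs, n_states=20, seed=0):
    """hidden_seqs: list of (T_i, d) arrays of RNN hidden states;
    input_seqs / output_seqs: aligned symbol sequences of length T_i."""
    km = KMeans(n_clusters=n_states, n_init=10, random_state=seed)
    km.fit(np.vstack(hidden_seqs))
    arcs = {}  # (state, in_symbol) -> (next_state, out_symbol)
    for h, xs, ys in zip(hidden_seqs, input_seqs, output_seqs):
        states = km.predict(h)  # FST state reached after each symbol
        for t in range(len(xs)):
            prev = states[t - 1] if t > 0 else -1  # -1 is the start state
            # Conflicting observations simply overwrite here; a fuller
            # version might resolve them by majority vote.
            arcs[(prev, xs[t])] = (states[t], ys[t])
    return arcs

def transduce(arcs, xs):
    """Run the induced FST on a new string; None if no matching arc."""
    state, out = -1, []
    for x in xs:
        if (state, x) not in arcs:
            return None
        state, y = arcs[(state, x)]
        out.append(y)
    return "".join(out)
```

In a sketch like this, strings that reach a missing arc are simply rejected, which is one way the coverage/robustness trade-off of an induced FST can surface in practice.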
Related papers
- Self-Supervised Learning via Flow-Guided Neural Operator on Time-Series Data [57.85958428020496]
Flow-Guided Neural Operator (FGNO) is a novel framework combining operator learning with flow matching for SSL training. FGNO learns mappings in functional spaces by using the Short-Time Fourier Transform to unify different time resolutions. Unlike prior generative SSL methods that use noisy inputs during inference, we propose using clean inputs for representation extraction while learning representations with noise.
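As a hedged aside on the STFT step mentioned above (the window size and SciPy setup are our assumptions, not FGNO's configuration), a short example shows how an STFT gives series of different lengths and sampling rates the same number of frequency bins:

```python
# Hedged illustration of the STFT step (window size and SciPy setup are our
# assumptions, not FGNO's configuration): an STFT yields the same number of
# frequency bins for any input, whatever its length or sampling rate.
import numpy as np
from scipy.signal import stft

def tf_representation(x, fs, nperseg=64):
    """Magnitude STFT of a 1-D series sampled at rate fs (Hz)."""
    _, _, Z = stft(x, fs=fs, nperseg=nperseg)
    return np.abs(Z)  # shape: (nperseg // 2 + 1 freq bins, time frames)

fast = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 1000))  # 1 s at 1000 Hz
slow = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 250))   # 1 s at 250 Hz
print(tf_representation(fast, fs=1000).shape)  # 33 frequency bins
print(tf_representation(slow, fs=250).shape)   # 33 bins again
```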
arXiv Detail & Related papers (2026-02-12T18:54:57Z) - A Comparative Study of Adaptation Strategies for Time Series Foundation Models in Anomaly Detection [0.0]
Time series foundation models (TSFMs) are pretrained on large heterogeneous data. We compare zero-shot inference, full model adaptation, and parameter-efficient fine-tuning strategies. These findings position TSFMs as promising general-purpose models for scalable and efficient time series anomaly detection.
arXiv Detail & Related papers (2026-01-01T19:11:33Z) - WST: Weakly Supervised Transducer for Automatic Speech Recognition [26.373816643181843]
Weakly Supervised Transducer (WST) is designed to robustly handle errors in the transcripts without requiring additional confidence estimation or auxiliary pre-trained models. Empirical evaluations on synthetic and industrial datasets reveal that WST effectively maintains performance even with transcription error rates of up to 70%.
arXiv Detail & Related papers (2025-11-06T04:14:07Z) - Learning on Transformers is Provable Low-Rank and Sparse: A One-layer Analysis [63.66763657191476]
We show that efficient numerical training and inference algorithms, such as low-rank computation, perform well for learning Transformer-based adaptation.
We analyze how magnitude-based pruning affects generalization while improving adaptation.
We conclude that proper magnitude-based pruning has only a slight effect on testing performance.
arXiv Detail & Related papers (2024-06-24T23:00:58Z) - Enhancing Sequential Model Performance with Squared Sigmoid TanH (SST) Activation Under Data Constraints [1.8624274002327752]
We propose the Squared Sigmoid TanH (SST) activation, specifically tailored to enhance the learning capability of sequential models under data constraints. SST applies mathematical squaring to amplify differences between strong and weak activations as signals propagate over time. We evaluate SST-powered LSTMs and GRUs for diverse applications, such as sign language recognition, regression, and time-series classification tasks.
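The squaring idea can be made concrete with a toy snippet; the functional form below is one plausible reading on our part, not necessarily the paper's exact definition:

```python
# One plausible reading of SST, an assumption on our part rather than the
# paper's exact definition: square the sigmoid-tanh product so that weak
# activations shrink toward zero while strong ones remain prominent.
import numpy as np

def sst(x):
    s = 1.0 / (1.0 + np.exp(-x))      # sigmoid gate
    return np.square(s * np.tanh(x))  # squared Sigmoid-TanH

print(sst(np.array([-2.0, -0.5, 0.0, 0.5, 2.0])))
```

Since squaring a value in (-1, 1) compresses small magnitudes far more than large ones, a form like this matches the amplification the summary describes.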
arXiv Detail & Related papers (2024-02-14T09:20:13Z) - State Sequences Prediction via Fourier Transform for Representation Learning [111.82376793413746]
We propose State Sequences Prediction via Fourier Transform (SPF), a novel method for learning expressive representations efficiently.
We theoretically analyze the existence of structural information in state sequences, which is closely related to policy performance and signal regularity.
Experiments demonstrate that the proposed method outperforms several state-of-the-art algorithms in terms of both sample efficiency and performance.
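To make the idea tangible, here is a hypothetical auxiliary target built from the Fourier transform of a state rollout; the truncation to k coefficients and all names are our assumptions, not SPF's formulation:

```python
# Illustrative only; the truncation to k coefficients and every name below
# are our assumptions, not SPF's formulation. The FFT of a state rollout
# compactly summarizes its periodic structure and can serve as a
# prediction target for representation learning.
import numpy as np

def fourier_target(states, k=8):
    """states: (T, d) rollout of d-dimensional states; returns the first
    k rFFT coefficients per dimension, split into real and imaginary."""
    coeffs = np.fft.rfft(states, axis=0)[:k]           # (k, d), complex
    return np.concatenate([coeffs.real, coeffs.imag])  # (2k, d), real

rollout = np.random.randn(100, 4)     # toy rollout: 100 steps, 4 dims
print(fourier_target(rollout).shape)  # (16, 4)
```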
arXiv Detail & Related papers (2023-10-24T14:47:02Z) - SIP: Injecting a Structural Inductive Bias into a Seq2Seq Model by Simulation [75.14793516745374]
We show how a structural inductive bias can be efficiently injected into a seq2seq model by pre-training it to simulate structural transformations on synthetic data.
Our experiments show that our method imparts the desired inductive bias, resulting in better few-shot learning for FST-like tasks.
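A simplified sketch of the synthetic pre-training data (our toy recipe, not SIP's exact one): sample a random deterministic FST and run it over random strings to obtain input-output pairs.

```python
# A simplified sketch of the synthetic-data idea (our toy recipe, not
# SIP's exact one): sample a random deterministic FST and run it over
# random strings to get (input, output) pairs for seq2seq pre-training.
import random

def random_fst(n_states=4, alphabet="abc", seed=0):
    rng = random.Random(seed)
    return {(q, a): (rng.randrange(n_states), rng.choice(alphabet))
            for q in range(n_states) for a in alphabet}

def apply_fst(fst, s):
    q, out = 0, []                # state 0 is the start state
    for ch in s:
        q, y = fst[(q, ch)]       # deterministic transition + emission
        out.append(y)
    return "".join(out)

fst, rng = random_fst(), random.Random(1)
for _ in range(3):
    x = "".join(rng.choice("abc") for _ in range(8))
    print(x, "->", apply_fst(fst, x))  # one synthetic training pair
```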
arXiv Detail & Related papers (2023-10-01T21:19:12Z) - Label-free timing analysis of SiPM-based modularized detectors with physics-constrained deep learning [9.234802409391111]
We propose a novel method based on deep learning for timing analysis of modularized detectors.
We mathematically demonstrate the existence of the optimal function desired by the method, and give a systematic algorithm for training and calibration of the model.
arXiv Detail & Related papers (2023-04-24T09:16:31Z) - Self-Supervised Training with Autoencoders for Visual Anomaly Detection [61.62861063776813]
We focus on a specific use case in anomaly detection where the distribution of normal samples is supported by a lower-dimensional manifold.
We adapt a self-supervised learning regime that exploits discriminative information during training but focuses on the submanifold of normal examples.
We achieve a new state-of-the-art result on the MVTec AD dataset -- a challenging benchmark for visual anomaly detection in the manufacturing domain.
arXiv Detail & Related papers (2022-06-23T14:16:30Z) - Semantic Perturbations with Normalizing Flows for Improved Generalization [62.998818375912506]
We show that perturbations in the latent space can be used to define fully unsupervised data augmentations.
We find that latent adversarial perturbations that adapt to the classifier throughout its training are most effective.
arXiv Detail & Related papers (2021-08-18T03:20:00Z) - Latent Template Induction with Gumbel-CRFs [107.17408593510372]
We explore the use of structured variational autoencoders to infer latent templates for sentence generation.
As a structured inference network, we show that it learns interpretable templates during training.
arXiv Detail & Related papers (2020-11-29T01:00:57Z)