Pairing Orthographically Variant Literary Words to Standard Equivalents
Using Neural Edit Distance Models
- URL: http://arxiv.org/abs/2401.15068v1
- Date: Fri, 26 Jan 2024 18:49:34 GMT
- Authors: Craig Messner and Tom Lippincott
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel corpus consisting of orthographically variant words found
in works of 19th century U.S. literature annotated with their corresponding
"standard" word pair. We train a set of neural edit distance models to pair
these variants with their standard forms, and compare the performance of these
models to the performance of a set of neural edit distance models trained on a
corpus of orthographic errors made by L2 English learners. Finally, we analyze
the relative performance of these models in the light of different negative
training sample generation strategies, and offer concluding remarks on the
unique challenge literary orthographic variation poses to string pairing
methodologies.
Related papers
- Modeling Orthographic Variation in Occitan's Dialects [3.038642416291856]
Our findings suggest that large multilingual models minimize the need for spelling normalization during pre-processing.
arXiv Detail & Related papers (2024-04-30T07:33:51Z)
- The Scenario Refiner: Grounding subjects in images at the morphological level [2.401993998791928]
We ask whether Vision and Language (V&L) models capture such distinctions at the morphological level.
We compare the results from V&L models to human judgements and find that models' predictions differ from those of human participants.
arXiv Detail & Related papers (2023-09-20T12:23:06Z)
- Visual Comparison of Language Model Adaptation [55.92129223662381]
Adapters are lightweight alternatives for model adaptation.
In this paper, we discuss several designs and alternatives for interactive, comparative visual explanation methods.
We show that, for instance, an adapter trained on the language debiasing task according to context-0 embeddings introduces a new type of bias.
arXiv Detail & Related papers (2022-08-17T09:25:28Z)
- Comparative Error Analysis in Neural and Finite-state Models for Unsupervised Character-level Transduction [34.1177259741046]
We compare the two model classes side by side and find that they tend to make different types of errors even when achieving comparable performance.
We investigate how combining finite-state and sequence-to-sequence models at decoding time affects the output quantitatively and qualitatively.
arXiv Detail & Related papers (2021-06-24T00:09:24Z)
- SLUA: A Super Lightweight Unsupervised Word Alignment Model via Cross-Lingual Contrastive Learning [79.91678610678885]
We propose a super lightweight unsupervised word alignment model (SLUA).
Experimental results on several public benchmarks demonstrate that our model achieves competitive, if not better, performance.
Notably, we recognize our model as a pioneering attempt to unify bilingual word embeddings and word alignments.
arXiv Detail & Related papers (2021-02-08T05:54:11Z)
- Morphologically Aware Word-Level Translation [82.59379608647147]
We propose a novel morphologically aware probability model for bilingual lexicon induction.
Our model exploits the basic linguistic intuition that the lexeme is the key lexical unit of meaning.
arXiv Detail & Related papers (2020-11-15T17:54:49Z)
- SLM: Learning a Discourse Language Representation with Sentence Unshuffling [53.42814722621715]
We introduce Sentence-level Language Modeling, a new pre-training objective for learning a discourse language representation.
We show that this feature of our model improves the performance of the original BERT by large margins.
arXiv Detail & Related papers (2020-10-30T13:33:41Z)
- Word Shape Matters: Robust Machine Translation with Visual Embedding [78.96234298075389]
We introduce a new encoding of the input symbols for character-level NLP models.
It encodes the shape of each character through the images depicting the letters when printed.
We name this new strategy visual embedding and it is expected to improve the robustness of NLP models.
arXiv Detail & Related papers (2020-10-20T04:08:03Z)
- Neural Baselines for Word Alignment [0.0]
We study and evaluate neural models for unsupervised word alignment for four language pairs.
We show that neural versions of the IBM-1 and hidden Markov models vastly outperform their discrete counterparts.
arXiv Detail & Related papers (2020-09-28T07:51:03Z)
- Grounded Compositional Outputs for Adaptive Language Modeling [59.02706635250856]
A language model's vocabulary (typically selected before training and permanently fixed later) affects its size.
We propose a fully compositional output embedding layer for language models.
To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary.
arXiv Detail & Related papers (2020-09-24T07:21:14Z)
- Overestimation of Syntactic Representation in Neural Language Models [16.765097098482286]
One popular method for determining a model's ability to induce syntactic structure trains a model on strings generated according to a template, then tests the model's ability to distinguish such strings from superficially similar ones with different syntax.
We illustrate a fundamental problem with this approach by reproducing positive results from a recent paper with two non-syntactic baseline language models.
arXiv Detail & Related papers (2020-04-10T15:13:03Z)