The NYU-CUBoulder Systems for SIGMORPHON 2020 Task 0 and Task 2
- URL: http://arxiv.org/abs/2006.11830v1
- Date: Sun, 21 Jun 2020 15:41:58 GMT
- Title: The NYU-CUBoulder Systems for SIGMORPHON 2020 Task 0 and Task 2
- Authors: Assaf Singer and Katharina Kann
- Abstract summary: We describe the NYU-CUBoulder systems for the SIGMORPHON 2020 Task 0 on typologically diverse morphological inflection and Task 2 on unsupervised morphological paradigm completion.
The former consists of generating morphological inflections from a lemma and a set of morphosyntactic features describing the target form.
The latter requires generating entire paradigms for a set of given lemmas from raw text alone.
- Score: 25.234256237085336
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We describe the NYU-CUBoulder systems for the SIGMORPHON 2020 Task 0 on
typologically diverse morphological inflection and Task 2 on unsupervised
morphological paradigm completion. The former consists of generating
morphological inflections from a lemma and a set of morphosyntactic features
describing the target form. The latter requires generating entire paradigms for
a set of given lemmas from raw text alone. We model morphological inflection as
a sequence-to-sequence problem, where the input is the sequence of the lemma's
characters with morphological tags, and the output is the sequence of the
inflected form's characters. First, we apply a transformer model to the task.
Second, as inflected forms share most characters with the lemma, we further
propose a pointer-generator transformer model to allow easy copying of input
characters. Our best-performing system for Task 0 placed 6th out of 23
systems. We further use our inflection systems as subcomponents of approaches
for Task 2. Our best-performing system for Task 2 placed 2nd out of 7
submissions.
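The copy mechanism described in the abstract can be illustrated with a minimal sketch. The snippet below is not the authors' implementation; it only shows the standard pointer-generator mixing step, where the final distribution over output characters blends the decoder's vocabulary distribution with attention mass copied from source positions. The function name, toy vocabulary, and probabilities are all illustrative assumptions.

```python
def pointer_generator_dist(p_vocab, attention, src_ids, p_gen):
    """Mix a generation distribution with a copy distribution.

    p_vocab   -- decoder's distribution over the vocabulary (sums to 1)
    attention -- attention weights over source positions (sums to 1)
    src_ids   -- vocabulary id of the token at each source position
    p_gen     -- probability of generating rather than copying
    """
    # scale the generation distribution by p_gen
    p_final = [p_gen * p for p in p_vocab]
    # route the remaining (1 - p_gen) mass to the source tokens,
    # weighted by how much attention each source position receives
    for pos, tok_id in enumerate(src_ids):
        p_final[tok_id] += (1.0 - p_gen) * attention[pos]
    return p_final

# Toy example: 5-symbol vocabulary, source "a b a" encoded as ids [0, 1, 0].
# Because the lemma shares most characters with the inflected form,
# copying (the second term) lets the model reuse input characters cheaply.
dist = pointer_generator_dist(
    p_vocab=[0.1, 0.2, 0.3, 0.2, 0.2],
    attention=[0.5, 0.3, 0.2],
    src_ids=[0, 1, 0],
    p_gen=0.6,
)
# dist is still a valid probability distribution over the vocabulary
```

Note how symbol 0 ends up with the largest mass: it appears twice in the source, so the copy term concentrates attention mass on it even though the generation distribution favored symbol 2.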
Related papers
- Text-Driven Diffusion Model for Sign Language Production [13.671593137551268]
We introduce the hfut-lmc team's solution to the SLRTP Sign Production Challenge.
The challenge aims to generate semantically aligned sign language pose sequences from text inputs.
Our solution achieves a BLEU-1 score of 20.17, placing second in the challenge.
arXiv Detail & Related papers (2025-03-20T07:45:27Z)
- A Framework for Bidirectional Decoding: Case Study in Morphological Inflection [4.602447284133507]
We propose a framework for decoding sequences from the "outside-in."
At each step, the model chooses to generate a token on the left, generate a token on the right, or join the left and right sequences.
Our model sets state-of-the-art (SOTA) on the 2022 and 2023 shared tasks, beating the next best systems by over 4.7 and 2.7 points in average accuracy respectively.
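The outside-in decoding loop summarized above can be sketched in a few lines. This is a hedged illustration, not the paper's model: the action sequence is fixed by hand, whereas the actual system scores left/right/join actions with a neural network at each step.

```python
def decode_outside_in(actions):
    """Build a string from both ends toward the middle.

    actions -- list of ("L", token), ("R", token), or ("JOIN", None):
    "L" appends the next character of the prefix (left to right),
    "R" prepends the next character of the suffix (right to left),
    "JOIN" concatenates the two halves and stops.
    """
    left, right = [], []
    for op, tok in actions:
        if op == "L":
            left.append(tok)        # grow the prefix rightward
        elif op == "R":
            right.insert(0, tok)    # grow the suffix leftward
        elif op == "JOIN":
            break                   # the halves meet in the middle
    return "".join(left + right)

# Toy inflection "sing" -> "sang", generated from the outside in:
# prefix "s", suffix "g", prefix "a", suffix "n", then join.
form = decode_outside_in(
    [("L", "s"), ("R", "g"), ("L", "a"), ("R", "n"), ("JOIN", None)]
)
```

The appeal for inflection is that stems and affixes often sit at the word's edges, so the model can commit to the stable outer material first and defer the stem-internal change to the end.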
arXiv Detail & Related papers (2023-05-21T22:08:31Z)
- Hierarchical Phrase-based Sequence-to-Sequence Learning [94.10257313923478]
We describe a neural transducer that maintains the flexibility of standard sequence-to-sequence (seq2seq) models while incorporating hierarchical phrases as a source of inductive bias during training and as explicit constraints during inference.
Our approach trains two models: a discriminative derivation based on a bracketing grammar whose tree hierarchically aligns source and target phrases, and a neural seq2seq model that learns to translate the aligned phrases one-by-one.
arXiv Detail & Related papers (2022-11-15T05:22:40Z)
- G2P-DDM: Generating Sign Pose Sequence from Gloss Sequence with Discrete Diffusion Model [8.047896755805981]
The Sign Language Production project aims to automatically translate spoken languages into sign sequences.
We present a novel solution by converting the continuous pose space generation problem into a discrete sequence generation problem.
Our results show that our model outperforms state-of-the-art G2P models on the public SLP evaluation benchmark.
arXiv Detail & Related papers (2022-08-19T03:49:13Z)
- Modeling Target-Side Morphology in Neural Machine Translation: A Comparison of Strategies [72.56158036639707]
Morphologically rich languages pose difficulties to machine translation.
A large amount of differently inflected word surface forms entails a larger vocabulary.
Some inflected forms of infrequent terms typically do not appear in the training corpus.
Linguistic agreement requires the system to correctly match the grammatical categories between inflected word forms in the output sentence.
arXiv Detail & Related papers (2022-03-25T10:13:20Z)
- Unsupervised Motion Representation Learning with Capsule Autoencoders [54.81628825371412]
Motion Capsule Autoencoder (MCAE) models motion in a two-level hierarchy.
MCAE is evaluated on a novel Trajectory20 motion dataset and various real-world skeleton-based human action datasets.
arXiv Detail & Related papers (2021-10-01T16:52:03Z)
- Inducing Transformer's Compositional Generalization Ability via Auxiliary Sequence Prediction Tasks [86.10875837475783]
Systematic compositionality is an essential mechanism in human language, allowing the recombination of known parts to create novel expressions.
Existing neural models have been shown to lack this basic ability in learning symbolic structures.
We propose two auxiliary sequence prediction tasks that track the progress of function and argument semantics.
arXiv Detail & Related papers (2021-09-30T16:41:19Z)
- The SIGMORPHON 2020 Shared Task on Unsupervised Morphological Paradigm Completion [28.728844366333185]
In this paper, we describe the findings of the SIGMORPHON 2020 shared task on unsupervised morphological paradigm completion.
Participants were asked to submit systems which take raw text and a list of lemmas as input, and output all inflected forms.
We also present an analysis of the results, so that this shared task can ground further research on the topic.
arXiv Detail & Related papers (2020-05-28T03:09:58Z)
- The IMS-CUBoulder System for the SIGMORPHON 2020 Shared Task on Unsupervised Morphological Paradigm Completion [27.37360427124081]
We present the systems of the University of Stuttgart IMS and the University of Colorado Boulder for SIGMORPHON 2020 Task 2 on unsupervised morphological paradigm completion.
The task consists of generating the morphological paradigms of a set of lemmas, given only the lemmas themselves and unlabeled text.
Our pointer-generator system obtains the best score of all seven submitted systems on average over all languages, and outperforms the official baseline, which was best overall, on Bulgarian and Kannada.
arXiv Detail & Related papers (2020-05-25T21:23:52Z)
- Towards Reasonably-Sized Character-Level Transformer NMT by Finetuning Subword Systems [78.80826533405019]
We show that we can obtain a neural machine translation model that works at the character level without requiring token segmentation.
Our study is a significant step towards high-performance and easy to train character-based models that are not extremely large.
arXiv Detail & Related papers (2020-04-29T15:56:02Z)
- A Simple Joint Model for Improved Contextual Neural Lemmatization [60.802451210656805]
We present a simple joint neural model for lemmatization and morphological tagging that achieves state-of-the-art results on 20 languages.
Our paper describes the model in addition to training and decoding procedures.
arXiv Detail & Related papers (2019-04-04T02:03:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including the summaries above) and is not responsible for any consequences arising from its use.