Labeled Morphological Segmentation with Semi-Markov Models
- URL: http://arxiv.org/abs/2404.08997v1
- Date: Sat, 13 Apr 2024 12:51:53 GMT
- Title: Labeled Morphological Segmentation with Semi-Markov Models
- Authors: Ryan Cotterell, Thomas Müller, Alexander Fraser, Hinrich Schütze,
- Abstract summary: We present labeled morphological segmentation, an alternative view of morphological processing that unifies several tasks.
We additionally introduce a new hierarchy of morphotactic tagsets.
We develop modelname, a discriminative morphological segmentation system that explicitly models morphotactics.
- Score: 127.69031138022534
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present labeled morphological segmentation, an alternative view of morphological processing that unifies several tasks. From an annotation standpoint, we additionally introduce a new hierarchy of morphotactic tagsets. Finally, we develop \modelname, a discriminative morphological segmentation system that, contrary to previous work, explicitly models morphotactics. We show that \textsc{chipmunk} yields improved performance on three tasks for all six languages: (i) morphological segmentation, (ii) stemming and (iii) morphological tag classification. On morphological segmentation, our method shows absolute improvements of 2--6 points $F_1$ over the baseline.
Related papers
- Lexically Grounded Subword Segmentation [0.0]
We present three innovations in tokenization and subword segmentation.
First, we propose to use unsupervised morphological analysis with Morfessor as pre-tokenization.
Second, we present an method for obtaining subword embeddings grounded in a word embedding space.
Third, we introduce an efficient segmentation algorithm based on a subword bigram model.
arXiv Detail & Related papers (2024-06-19T13:48:19Z) - Exploring Linguistic Probes for Morphological Generalization [11.568042812213712]
Testing these probes on three morphologically distinct languages, we find evidence that three leading morphological inflection systems employ distinct generalization strategies over conjugational classes and feature sets on both orthographic and phonologically transcribed inputs.
arXiv Detail & Related papers (2023-10-20T17:45:30Z) - Morphological Inflection with Phonological Features [7.245355976804435]
This work explores effects on performance obtained through various ways in which morphological models get access to subcharacter phonological features.
We elicit phonemic data from standard graphemic data using language-specific grammars for languages with shallow grapheme-to-phoneme mapping.
arXiv Detail & Related papers (2023-06-21T21:34:39Z) - Part-guided Relational Transformers for Fine-grained Visual Recognition [59.20531172172135]
We propose a framework to learn the discriminative part features and explore correlations with a feature transformation module.
Our proposed approach does not rely on additional part branches and reaches state-the-of-art performance on 3-of-the-level object recognition.
arXiv Detail & Related papers (2022-12-28T03:45:56Z) - SemAffiNet: Semantic-Affine Transformation for Point Cloud Segmentation [94.11915008006483]
We propose SemAffiNet for point cloud semantic segmentation.
We conduct extensive experiments on the ScanNetV2 and NYUv2 datasets.
arXiv Detail & Related papers (2022-05-26T17:00:23Z) - Graph Adaptive Semantic Transfer for Cross-domain Sentiment
Classification [68.06496970320595]
Cross-domain sentiment classification (CDSC) aims to use the transferable semantics learned from the source domain to predict the sentiment of reviews in the unlabeled target domain.
We present Graph Adaptive Semantic Transfer (GAST) model, an adaptive syntactic graph embedding method that is able to learn domain-invariant semantics from both word sequences and syntactic graphs.
arXiv Detail & Related papers (2022-05-18T07:47:01Z) - UniMorph 4.0: Universal Morphology [104.69846084893298]
This paper presents the expansions and improvements made on several fronts over the last couple of years.
Collaborative efforts by numerous linguists have added 67 new languages, including 30 endangered languages.
In light of the last UniMorph release, we also augmented the database with morpheme segmentation for 16 languages.
arXiv Detail & Related papers (2022-05-07T09:19:02Z) - Word Segmentation and Morphological Parsing for Sanskrit [1.2929576948110548]
We describe our participation in the Word and Morphological Parsing (WSMP) hackathon for Sanskrit.
We approach the word segmentation task as a sequence labelling task by predicting edit operations from which segmentations are derived.
We approach the morphological analysis task by predicting morphological tags and rules that transform inflected words into their corresponding stems.
arXiv Detail & Related papers (2022-01-30T14:37:00Z) - A Systematic Analysis of Morphological Content in BERT Models for
Multiple Languages [2.345305607613153]
This work describes experiments which probe the hidden representations of several BERT-style models for morphological content.
The goal is to examine the extent to which discrete linguistic structure, in the form of morphological features and feature values, presents itself in the vector representations and attention distributions of pre-trained language models for five European languages.
arXiv Detail & Related papers (2020-04-06T22:50:27Z) - A Simple Joint Model for Improved Contextual Neural Lemmatization [60.802451210656805]
We present a simple joint neural model for lemmatization and morphological tagging that achieves state-of-the-art results on 20 languages.
Our paper describes the model in addition to training and decoding procedures.
arXiv Detail & Related papers (2019-04-04T02:03:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.