Word Segmentation and Morphological Parsing for Sanskrit
- URL: http://arxiv.org/abs/2201.12833v1
- Date: Sun, 30 Jan 2022 14:37:00 GMT
- Title: Word Segmentation and Morphological Parsing for Sanskrit
- Authors: Jingwen Li, Leander Girrbach
- Abstract summary: We describe our participation in the Word Segmentation and Morphological Parsing (WSMP) hackathon for Sanskrit.
We approach the word segmentation task as a sequence labelling task by predicting edit operations from which segmentations are derived.
We approach the morphological analysis task by predicting morphological tags and rules that transform inflected words into their corresponding stems.
- Score: 1.2929576948110548
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We describe our participation in the Word Segmentation and Morphological
Parsing (WSMP) for Sanskrit hackathon. We approach the word segmentation task
as a sequence labelling task by predicting edit operations from which
segmentations are derived. We approach the morphological analysis task by
predicting morphological tags and rules that transform inflected words into
their corresponding stems. Also, we propose an end-to-end trainable pipeline
model for joint segmentation and morphological analysis. Our model performed
best in the joint segmentation and analysis subtask (80.018 F1 score) and
performed second best in the individual subtasks (segmentation: 96.189 F1 score
/ analysis: 69.180 F1 score).
Finally, we analyse errors made by our models and suggest future work and
possible improvements regarding data and evaluation.
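The abstract frames segmentation as sequence labelling: one edit operation is predicted per input character, and the segmentation is derived from those operations. The sketch below shows only that final derivation step, under assumptions not taken from the paper. It uses a hypothetical label set in which each character gets either `KEEP` or a replacement string, and a space inside a replacement marks a word boundary. Replacements are needed because Sanskrit sandhi merges sounds across word junctions (e.g. `tatra` + `eva` is written `tatraiva`), so inserting boundaries alone cannot recover the words. The label format and the name `apply_edits` are illustrative, not the authors' implementation, and the tagger that predicts the labels is not shown.

```python
from typing import List, Sequence

# Hypothetical label meaning "copy this input character unchanged".
KEEP = "<KEEP>"


def apply_edits(text: str, labels: Sequence[str]) -> List[str]:
    """Derive a word segmentation from one edit label per input character.

    A label is either KEEP or a replacement string for that character
    (possibly empty). Spaces inside replacements become word boundaries.
    """
    if len(text) != len(labels):
        raise ValueError("expected exactly one label per input character")
    rewritten = "".join(ch if label == KEEP else label
                        for ch, label in zip(text, labels))
    return rewritten.split()


if __name__ == "__main__":
    # Undo the vowel sandhi a + e -> ai: 'a' becomes "a " (end of the first
    # word plus a boundary) and 'i' becomes "e" (start of the next word).
    labels = [KEEP, KEEP, KEEP, KEEP, "a ", "e", KEEP, KEEP]
    print(apply_edits("tatraiva", labels))  # ['tatra', 'eva']
```

Because every label is tied to exactly one input character, this framing fits an ordinary sequence tagger; the trade-off is that the label inventory grows with the number of distinct replacement strings observed in training.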
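For morphological analysis, the abstract describes predicting, alongside tags, rules that transform an inflected word into its stem. One common way to make such rules predictable as class labels is a suffix-rewrite rule: strip some characters from the end of the form and append a string. The sketch below assumes that rule format (the paper's actual rule representation may differ). It shows how a rule could be extracted from a (form, stem) training pair and applied at prediction time. `extract_rule` and `apply_rule` are hypothetical names.

```python
import os
from typing import Tuple

# Assumed rule format: (characters to strip from the end, string to append).
Rule = Tuple[int, str]


def extract_rule(form: str, stem: str) -> Rule:
    """Derive the suffix-rewrite rule that maps `form` to `stem`."""
    # Keep the longest shared prefix; everything after it is rewritten.
    shared = len(os.path.commonprefix([form, stem]))
    return len(form) - shared, stem[shared:]


def apply_rule(form: str, rule: Rule) -> str:
    """Apply a suffix-rewrite rule to an inflected form."""
    cut, suffix = rule
    if cut > len(form):
        raise ValueError("rule strips more characters than the form has")
    # Slice with len(form) - cut rather than -cut, so cut == 0 keeps the form.
    return form[: len(form) - cut] + suffix


if __name__ == "__main__":
    rule = extract_rule("devena", "deva")  # instrumental singular -> stem
    print(rule)                            # (3, 'a')
    print(apply_rule("phalena", rule))     # phala
```

In this toy example, the rule learned from `devena` also covers `phalena`, which is why treating rules as class labels can generalise to unseen words; irregular forms simply produce rarer rules.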
Related papers
- Lexically Grounded Subword Segmentation [0.0]
We present three innovations in tokenization and subword segmentation.
First, we propose to use unsupervised morphological analysis with Morfessor as pre-tokenization.
Second, we present a method for obtaining subword embeddings grounded in a word embedding space.
Third, we introduce an efficient segmentation algorithm based on a subword bigram model.
Date: 2024-06-19T13:48:19Z
- Labeled Morphological Segmentation with Semi-Markov Models [127.69031138022534]
We present labeled morphological segmentation, an alternative view of morphological processing that unifies several tasks.
We additionally introduce a new hierarchy of morphotactic tagsets.
We develop a discriminative morphological segmentation system that explicitly models morphotactics.
Date: 2024-04-13T12:51:53Z
- OMG-Seg: Is One Model Good Enough For All Segmentation? [83.17068644513144]
OMG-Seg is a transformer-based encoder-decoder architecture with task-specific queries and outputs.
We show that OMG-Seg can support over ten distinct segmentation tasks and yet significantly reduce computational and parameter overhead.
Date: 2024-01-18T18:59:34Z
- Where's the Point? Self-Supervised Multilingual Punctuation-Agnostic Sentence Segmentation [65.6736056006381]
We present a multilingual punctuation-agnostic sentence segmentation method covering 85 languages.
Our method outperforms the prior best sentence-segmentation tools by an average of 6.1 F1 points.
By using our method to match sentence segmentation to the segmentation used during training of MT models, we achieve an average improvement of 2.3 BLEU points.
Date: 2023-05-30T09:49:42Z
- Subword Segmental Machine Translation: Unifying Segmentation and Target Sentence Generation [7.252933737829635]
Subword segmental machine translation (SSMT) learns to segment target sentence words while jointly learning to generate target sentences.
Experiments across 6 translation directions show that SSMT improves chrF scores for morphologically rich agglutinative languages.
Date: 2023-05-11T17:44:29Z
- Ensembling Instance and Semantic Segmentation for Panoptic Segmentation [0.0]
The method first performs instance segmentation and semantic segmentation separately, then combines the two to generate panoptic segmentation results.
For instance segmentation, we add several expert Mask R-CNN models to tackle the data imbalance problem in the training data.
For semantic segmentation, we train several models with various backbones and use an ensemble strategy, which further boosts the segmentation results.
Date: 2023-04-20T14:02:01Z
- Exploring the State-of-the-Art Language Modeling Methods and Data Augmentation Techniques for Multilingual Clause-Level Morphology [3.8498574327875947]
We present our work on all three parts of the shared task: inflection, reinflection, and analysis.
We mainly explore two approaches: Transformer models in combination with data augmentation, and exploiting the state-of-the-art language modeling techniques for morphological analysis.
Our methods achieved first place in each of the three tasks and outperformed the mT5 baseline by 89% for inflection, 80% for reinflection and 12% for analysis.
Date: 2022-11-03T11:53:39Z
- Influence Functions for Sequence Tagging Models [49.81774968547377]
We extend influence functions to trace predictions back to the training points that informed them.
We show the practical utility of segment influence by using the method to identify systematic annotation errors.
Date: 2022-10-25T17:13:11Z
- Did the Cat Drink the Coffee? Challenging Transformers with Generalized Event Knowledge [59.22170796793179]
Transformer Language Models (TLMs) were tested on a benchmark for the dynamic estimation of thematic fit.
Our results show that TLMs can reach performances comparable to those achieved by SDM, a Structured Distributional Model of sentence comprehension.
However, additional analysis consistently suggests that TLMs do not capture important aspects of event knowledge.
Date: 2021-07-22T20:52:26Z
- A Differentiable Relaxation of Graph Segmentation and Alignment for AMR Parsing [75.36126971685034]
We treat alignment and segmentation as latent variables in our model and induce them as part of end-to-end training.
Our method also approaches the performance of a model that relies on the segmentation rules of Lyu and Titov (2018), which were hand-crafted to handle individual AMR constructions.
Date: 2020-10-23T21:22:50Z