NEMO: Frequentist Inference Approach to Constrained Linguistic Typology
Feature Prediction in SIGTYP 2020 Shared Task
- URL: http://arxiv.org/abs/2010.05985v1
- Date: Mon, 12 Oct 2020 19:25:43 GMT
- Authors: Alexander Gutkin and Richard Sproat
- Abstract summary: We employ frequentist inference to represent correlations between typological features and use this representation to train simple multi-class estimators that predict individual features.
Our best configuration achieved a micro-averaged accuracy of 0.66 on 149 test languages.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper describes the NEMO submission to the SIGTYP 2020 shared task,
which deals with the prediction of linguistic typological features for multiple
languages using data derived from the World Atlas of Language Structures (WALS).
We employ frequentist inference to represent correlations between typological
features and use this representation to train simple multi-class estimators
that predict individual features. We describe two submitted ridge
regression-based configurations, which ranked second and third overall in the
constrained task. Our best configuration achieved a micro-averaged accuracy of
0.66 on 149 test languages.
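The frequentist-counting idea in the abstract can be loosely illustrated as follows: estimate how often each known feature value co-occurs with each value of the target feature, then score candidate target values by their smoothed conditional relative frequencies. This is only a minimal sketch of that idea; the data, feature names, and smoothing below are invented for illustration (the actual submission trained ridge-regression-based multi-class estimators over such representations).

```python
# Illustrative sketch (not the authors' implementation): predict one
# typological feature from others via frequentist co-occurrence counts.
from collections import Counter

# Toy stand-in for WALS-style data; all feature names and values are invented.
train = [
    ({"word_order": "SOV", "adpositions": "Postp"}, "Adj-N"),
    ({"word_order": "SOV", "adpositions": "Postp"}, "Adj-N"),
    ({"word_order": "SVO", "adpositions": "Prep"},  "N-Adj"),
    ({"word_order": "SVO", "adpositions": "Prep"},  "N-Adj"),
    ({"word_order": "VSO", "adpositions": "Prep"},  "N-Adj"),
]

pair_counts = Counter()   # (feature, value, target_value) -> count
value_counts = Counter()  # (feature, value) -> count
classes = set()
for feats, target in train:
    classes.add(target)
    for f, v in feats.items():
        pair_counts[(f, v, target)] += 1
        value_counts[(f, v)] += 1

def predict(feats, alpha=1.0):
    """Score each candidate target value by summing add-alpha smoothed
    conditional relative frequencies P(target | feature=value) over the
    known features, and return the highest-scoring value."""
    best, best_score = None, float("-inf")
    for c in sorted(classes):
        score = 0.0
        for f, v in feats.items():
            num = pair_counts[(f, v, c)] + alpha
            den = value_counts[(f, v)] + alpha * len(classes)
            score += num / den
        if score > best_score:
            best, best_score = c, score
    return best

print(predict({"word_order": "SOV", "adpositions": "Postp"}))  # Adj-N
print(predict({"word_order": "SVO", "adpositions": "Prep"}))   # N-Adj
```

In the toy data, "Adj-N" only ever co-occurs with SOV/Postp, so the smoothed conditionals strongly favor it for an SOV, postpositional language; the same logic yields "N-Adj" for SVO with prepositions.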
Related papers
- Investigating Multilingual Coreference Resolution by Universal Annotations (2023-10-26)
We study coreference by examining the ground truth data at different linguistic levels.
We perform an error analysis of the most challenging cases that the SotA system fails to resolve.
We extract features from universal morphosyntactic annotations and integrate these features into a baseline system to assess their potential benefits.
- NU HLT at CMCL 2022 Shared Task: Multilingual and Crosslingual Prediction of Human Reading Behavior in Universal Language Space (2022-02-22)
The secret behind the success of this model is the preprocessing step, where all words are transformed to their universal language representation via the International Phonetic Alphabet (IPA).
A finetuned Random Forest model obtained the best performance for both tasks, with MAE scores of 3.8031 and 3.9065 for mean first fixation duration (FFDAve) and mean total reading time (TRTAve), respectively.
- CUGE: A Chinese Language Understanding and Generation Evaluation Benchmark (2021-12-27)
General-purpose language intelligence evaluation has been a longstanding goal for natural language processing.
We argue that for general-purpose language intelligence evaluation, the benchmark itself needs to be comprehensive and systematic.
We propose CUGE, a Chinese Language Understanding and Generation Evaluation benchmark with the following features.
- LCP-RIT at SemEval-2021 Task 1: Exploring Linguistic Features for Lexical Complexity Prediction (2021-05-18)
This paper describes team LCP-RIT's submission to SemEval-2021 Task 1: Lexical Complexity Prediction (LCP).
Our system uses logistic regression and a wide range of linguistic features to predict the complexity of single words in this dataset.
We evaluate the results in terms of mean absolute error, mean squared error, Pearson correlation, and Spearman correlation.
- Infusing Finetuning with Semantic Dependencies (2020-12-10)
We show that, unlike syntax, semantics is not brought to the surface by today's pretrained models.
We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning.
- SIGTYP 2020 Shared Task: Prediction of Typological Features (2020-10-16)
A major drawback hampering broader adoption of typological KBs is that they are sparsely populated.
As typological features often correlate with one another, it is possible to predict them and thus automatically populate typological KBs.
Overall, the task attracted 8 submissions from 5 teams, of which the most successful methods make use of such feature correlations.
- Predicting Typological Features in WALS using Language Embeddings and Conditional Probabilities: ÚFAL Submission to the SIGTYP 2020 Shared Task (2020-10-08)
We submit a constrained system, predicting typological features based only on the WALS database.
We reach an accuracy of 70.7% on the test data and rank first in the shared task.
- Cross-lingual Spoken Language Understanding with Regularized Representation Alignment (2020-09-30)
We propose a regularization approach to align word-level and sentence-level representations across languages without any external resource.
Experiments on the cross-lingual spoken language understanding task show that our model outperforms current state-of-the-art methods in both few-shot and zero-shot scenarios.
- Linguistic Typology Features from Text: Inferring the Sparse Features of the World Atlas of Language Structures (2020-04-30)
We construct a recurrent neural network predictor based on byte embeddings and convolutional layers.
We show that some features from various linguistic types can be predicted reliably.