Predicting Typological Features in WALS using Language Embeddings and
Conditional Probabilities: ÚFAL Submission to the SIGTYP 2020 Shared Task
- URL: http://arxiv.org/abs/2010.03920v1
- Date: Thu, 8 Oct 2020 12:05:48 GMT
- Title: Predicting Typological Features in WALS using Language Embeddings and
Conditional Probabilities: ÚFAL Submission to the SIGTYP 2020 Shared Task
- Authors: Martin Vastl, Daniel Zeman, Rudolf Rosa
- Abstract summary: We submit a constrained system that predicts typological features based only on the WALS database.
We reach an accuracy of 70.7% on the test data and rank first in the shared task.
- Score: 1.4848029858256582
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We present our submission to the SIGTYP 2020 Shared Task on the prediction of
typological features. We submit a constrained system, predicting typological
features based only on the WALS database. We investigate two approaches. The
simpler of the two is a system based on estimating the correlation of feature
values within languages by computing conditional probabilities and mutual
information. The second approach is to train a neural predictor operating on
precomputed language embeddings based on WALS features. Our submitted system
combines the two approaches based on their self-estimated confidence scores. We
reach an accuracy of 70.7% on the test data and rank first in the shared task.
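The abstract does not spell out how the conditional probabilities and mutual information are combined, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that, for a missing target feature, the predictor picks the known feature with the highest mutual information and returns the most probable target value given that feature's value, using the conditional probability as a confidence score. The toy data, feature names, and function names are all hypothetical.

```python
from collections import Counter
from math import log2

# Toy data (hypothetical): language -> {feature: value}.
LANGS = {
    "lang_a": {"word_order": "SOV", "adposition": "postposition"},
    "lang_b": {"word_order": "SOV", "adposition": "postposition"},
    "lang_c": {"word_order": "SVO", "adposition": "preposition"},
    "lang_d": {"word_order": "SVO", "adposition": "preposition"},
    "lang_e": {"word_order": "SOV", "adposition": "preposition"},
}


def joint_counts(langs, src, tgt):
    """Count co-occurring (src value, tgt value) pairs over languages that have both."""
    counts = Counter()
    for feats in langs.values():
        if src in feats and tgt in feats:
            counts[(feats[src], feats[tgt])] += 1
    return counts


def mutual_information(langs, src, tgt):
    """Empirical mutual information (in bits) between two features."""
    counts = joint_counts(langs, src, tgt)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    src_tot, tgt_tot = Counter(), Counter()
    for (s, t), n in counts.items():
        src_tot[s] += n
        tgt_tot[t] += n
    return sum(
        (n / total) * log2(n * total / (src_tot[s] * tgt_tot[t]))
        for (s, t), n in counts.items()
    )


def predict(langs, known, tgt):
    """Return (value, confidence) for `tgt` given a language's known features.

    Assumption: use the known feature with the highest mutual information
    with `tgt`, and take the argmax of P(tgt value | src value).
    """
    best = None  # (mutual information, predicted value, conditional probability)
    for src, src_val in known.items():
        counts = joint_counts(langs, src, tgt)
        cond = {t: n for (s, t), n in counts.items() if s == src_val}
        denom = sum(cond.values())
        if denom == 0:
            continue  # this source value was never seen together with tgt
        value, n = max(cond.items(), key=lambda kv: kv[1])
        candidate = (mutual_information(langs, src, tgt), value, n / denom)
        if best is None or candidate[0] > best[0]:
            best = candidate
    if best is None:
        return None, 0.0
    return best[1], best[2]
```

For example, `predict(LANGS, {"word_order": "SOV"}, "adposition")` picks `"postposition"` with confidence 2/3 on this toy data. A confidence score of this kind is the sort of quantity that could be compared against a second predictor's confidence when combining systems.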
Related papers
- data2lang2vec: Data Driven Typological Features Completion [8.28573483085828]
We introduce a multi-lingual Part-of-Speech (POS) tagger, achieving over 70% accuracy across 1,749 languages.
We also introduce a more realistic evaluation setup, focusing on typology features that are likely to be missing.
arXiv Detail & Related papers (2024-09-25T21:32:57Z)
- Link Prediction for Wikipedia Articles as a Natural Language Inference Task [1.1842520528140819]
This paper introduces an approach to link prediction in Wikipedia articles by formulating it as a natural language inference (NLI) task.
We implement our system based on the Sentence Pair Classification for Link Prediction for the Wikipedia Articles task.
Our system achieved 0.99996 Macro F1-score and 1.00000 Macro F1-score for the public and private test sets, respectively.
arXiv Detail & Related papers (2023-08-31T05:25:04Z)
- Parallel Reasoning Network for Human-Object Interaction Detection [53.422076419484945]
We propose a new transformer-based method named Parallel Reasoning Network (PR-Net).
PR-Net constructs two independent predictors for instance-level localization and relation-level understanding.
Our PR-Net has achieved competitive results on HICO-DET and V-COCO benchmarks.
arXiv Detail & Related papers (2023-01-09T17:00:34Z)
- Hybrid Rule-Neural Coreference Resolution System based on Actor-Critic Learning [53.73316523766183]
Coreference resolution systems need to tackle two main tasks.
One task is to detect all of the potential mentions, the other is to learn the linking of an antecedent for each possible mention.
We propose a hybrid rule-neural coreference resolution system based on actor-critic learning.
arXiv Detail & Related papers (2022-12-20T08:55:47Z)
- From Good to Best: Two-Stage Training for Cross-lingual Machine Reading Comprehension [51.953428342923885]
We develop a two-stage approach to enhance the model performance.
The first stage targets recall: we design a hard-learning (HL) algorithm to maximize the likelihood that the top-k predictions contain the accurate answer.
The second stage focuses on precision: an answer-aware contrastive learning mechanism is developed to learn the fine difference between the accurate answer and other candidates.
arXiv Detail & Related papers (2021-12-09T07:31:15Z)
- C-Learning: Learning to Achieve Goals via Recursive Classification [163.7610618571879]
We study the problem of predicting and controlling the future state distribution of an autonomous agent.
Our work lays a principled foundation for goal-conditioned RL as density estimation.
arXiv Detail & Related papers (2020-11-17T19:58:56Z)
- SIGTYP 2020 Shared Task: Prediction of Typological Features [78.95376120154083]
A major drawback hampering broader adoption of typological KBs is that they are sparsely populated.
As typological features often correlate with one another, it is possible to predict them and thus automatically populate typological KBs.
Overall, the task attracted 8 submissions from 5 teams, out of which the most successful methods make use of such feature correlations.
arXiv Detail & Related papers (2020-10-16T08:47:24Z)
- NEMO: Frequentist Inference Approach to Constrained Linguistic Typology Feature Prediction in SIGTYP 2020 Shared Task [83.43738174234053]
We employ frequentist inference to represent correlations between typological features and use this representation to train simple multi-class estimators that predict individual features.
Our best configuration achieved the micro-averaged accuracy score of 0.66 on 149 test languages.
arXiv Detail & Related papers (2020-10-12T19:25:43Z)
- Systematic Generalization on gSCAN with Language Conditioned Embedding [19.39687991647301]
Systematic Generalization refers to a learning algorithm's ability to extrapolate learned behavior to unseen situations.
We propose a novel method that learns objects' contextualized embeddings with dynamic message passing conditioned on the input natural language.
arXiv Detail & Related papers (2020-09-11T17:35:05Z)
- Yseop at SemEval-2020 Task 5: Cascaded BERT Language Model for Counterfactual Statement Analysis [0.0]
We use a BERT base model for the classification task and build a hybrid BERT Multi-Layer Perceptron system to handle the sequence identification task.
Our experiments show that while introducing syntactic and semantic features does little in improving the system in the classification task, using these types of features as cascaded linear inputs to fine-tune the sequence-delimiting ability of the model ensures it outperforms other similar-purpose complex systems like BiLSTM-CRF in the second task.
arXiv Detail & Related papers (2020-05-18T08:19:18Z)