Binary and Multitask Classification Model for Dutch Anaphora Resolution:
Die/Dat Prediction
- URL: http://arxiv.org/abs/2001.02943v2
- Date: Fri, 11 Sep 2020 14:17:59 GMT
- Title: Binary and Multitask Classification Model for Dutch Anaphora Resolution:
Die/Dat Prediction
- Authors: Liesbeth Allein, Artuur Leeuwenberg and Marie-Francine Moens
- Abstract summary: The correct use of the Dutch pronouns 'die' and 'dat' is a stumbling block for both native and non-native speakers of Dutch.
This study constructs the first neural network model for Dutch demonstrative and relative pronoun resolution.
- Score: 18.309099448064273
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The correct use of Dutch pronouns 'die' and 'dat' is a stumbling block for
both native and non-native speakers of Dutch due to the multiplicity of
syntactic functions and the dependency on the antecedent's gender and number.
Drawing on previous research conducted on neural context-dependent dt-mistake
correction models (Heyman et al. 2018), this study constructs the first neural
network model for Dutch demonstrative and relative pronoun resolution that
specifically focuses on the correction and part-of-speech prediction of these
two pronouns. Two separate datasets are built with sentences obtained from,
respectively, the Dutch Europarl corpus (Koehn 2015) - which contains the
proceedings of the European Parliament from 1996 to the present - and the SoNaR
corpus (Oostdijk et al. 2013) - which contains Dutch texts from a variety of
domains such as newspapers, blogs and legal texts. Firstly, a binary
classification model solely predicts the correct 'die' or 'dat'. The classifier
with a bidirectional long short-term memory architecture achieves 84.56%
accuracy. Secondly, a multitask classification model simultaneously predicts
the correct 'die' or 'dat' and its part-of-speech tag. The model combining a
sentence encoder and a context encoder, each with a bidirectional long
short-term memory architecture, achieves 88.63% accuracy for die/dat
prediction and 87.73% accuracy for part-of-speech prediction. More
evenly balanced data, larger word embeddings, an extra bidirectional long
short-term memory layer and integrated part-of-speech knowledge positively
affect die/dat prediction performance, while a context encoder architecture
raises part-of-speech prediction performance. This study shows promising
results and can serve as a starting point for future research on machine
learning models for Dutch anaphora resolution.
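The binary task described above — predicting the correct 'die' or 'dat' from the surrounding sentence — starts from sentence/label pairs in which the pronoun is removed from its context. A minimal sketch of that dataset construction, assuming a simple whitespace tokenizer and a `<pron>` mask token (both illustrative choices, not the paper's actual preprocessing):

```python
import re

# Illustrative label mapping for the binary die/dat task.
PRONOUNS = {"die": 0, "dat": 1}

def make_examples(sentence, mask_token="<pron>"):
    """Yield one (masked_sentence, label) pair per 'die'/'dat' occurrence.

    The occurrence is replaced by a mask token, and the removed
    pronoun becomes the classification target.
    """
    tokens = sentence.lower().split()
    for i, tok in enumerate(tokens):
        word = re.sub(r"\W", "", tok)  # strip punctuation before matching
        if word in PRONOUNS:
            masked = tokens[:i] + [mask_token] + tokens[i + 1:]
            yield " ".join(masked), PRONOUNS[word]

sent = "Ik denk dat de man die daar loopt mijn buurman is."
examples = list(make_examples(sent))
# two occurrences -> two training pairs, labels 1 ('dat') and 0 ('die')
```

Each masked sentence would then be fed to the classifier's encoder (a bidirectional LSTM in the study), with the removed pronoun as the prediction target; the multitask variant additionally predicts the pronoun's part-of-speech tag from the same encoded context.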
Related papers
- How Language Models Prioritize Contextual Grammatical Cues? [3.9790222241649587]
We investigate how language models handle gender agreement when multiple gender cue words are present.
Our findings reveal striking differences in how encoder-based and decoder-based models prioritize and use contextual information for their predictions.
arXiv Detail & Related papers (2024-10-04T14:09:05Z)
- End-to-End Lip Reading in Romanian with Cross-Lingual Domain Adaptation and Lateral Inhibition [2.839471733237535]
We analyze several architectures and optimizations on the underrepresented, short-scale Romanian language dataset called Wild LRRo.
We obtain state-of-the-art results using our proposed method, namely cross-lingual domain adaptation and unlabeled videos.
We also assess the performance of adding a layer inspired by the neural inhibition mechanism.
arXiv Detail & Related papers (2023-10-07T15:36:58Z)
- Analyzing Vietnamese Legal Questions Using Deep Neural Networks with Biaffine Classifiers [3.116035935327534]
We propose using deep neural networks to extract important information from Vietnamese legal questions.
Given a legal question in natural language, the goal is to extract all the segments that contain the needed information to answer the question.
arXiv Detail & Related papers (2023-04-27T18:19:24Z)
- Ensemble Transfer Learning for Multilingual Coreference Resolution [60.409789753164944]
A problem that frequently occurs when working with a non-English language is the scarcity of annotated training data.
We design a simple but effective ensemble-based framework that combines various transfer learning techniques.
We also propose a low-cost TL method that bootstraps coreference resolution models by utilizing Wikipedia anchor texts.
arXiv Detail & Related papers (2023-01-22T18:22:55Z)
- LongFNT: Long-form Speech Recognition with Factorized Neural Transducer [64.75547712366784]
We propose the LongFNT-Text architecture, which fuses the sentence-level long-form features directly with the output of the vocabulary predictor.
The effectiveness of our LongFNT approach is validated on the LibriSpeech and GigaSpeech corpora, with 19% and 12% relative word error rate (WER) reductions, respectively.
arXiv Detail & Related papers (2022-11-17T08:48:27Z)
- Speaker Embedding-aware Neural Diarization: a Novel Framework for Overlapped Speech Diarization in the Meeting Scenario [51.5031673695118]
We reformulate overlapped speech diarization as a single-label prediction problem.
We propose the speaker embedding-aware neural diarization (SEND) system.
arXiv Detail & Related papers (2022-03-18T06:40:39Z)
- From Good to Best: Two-Stage Training for Cross-lingual Machine Reading Comprehension [51.953428342923885]
We develop a two-stage approach to enhance the model performance.
The first stage targets at recall: we design a hard-learning (HL) algorithm to maximize the likelihood that the top-k predictions contain the accurate answer.
The second stage focuses on precision: an answer-aware contrastive learning mechanism is developed to learn the fine difference between the accurate answer and other candidates.
arXiv Detail & Related papers (2021-12-09T07:31:15Z)
- NLP-CIC @ DIACR-Ita: POS and Neighbor Based Distributional Models for Lexical Semantic Change in Diachronic Italian Corpora [62.997667081978825]
We present our systems and findings on unsupervised lexical semantic change for the Italian language.
The task is to determine whether a target word has evolved its meaning with time, only relying on raw-text from two time-specific datasets.
We propose two models representing the target words across the periods to predict the changing words using threshold and voting schemes.
arXiv Detail & Related papers (2020-11-07T11:27:18Z)
- Combining Deep Learning and String Kernels for the Localization of Swiss German Tweets [28.497747521078647]
We address the second subtask, which targets a data set composed of nearly 30 thousand Swiss German Jodels.
We frame the task as a double regression problem, employing a variety of machine learning approaches to predict both latitude and longitude.
Our empirical results indicate that the handcrafted model based on string kernels outperforms the deep learning approaches.
arXiv Detail & Related papers (2020-10-07T19:16:45Z)
- Analysis of Predictive Coding Models for Phonemic Representation Learning in Small Datasets [0.0]
The present study investigates the behaviour of two predictive coding models, Autoregressive Predictive Coding and Contrastive Predictive Coding, in a phoneme discrimination task.
Our experiments show a strong correlation between the autoregressive loss and the phoneme discrimination scores with the two datasets.
The CPC model shows rapid convergence already after one pass over the training data, and, on average, its representations outperform those of APC on both languages.
arXiv Detail & Related papers (2020-07-08T15:46:13Z)
- Parameter Space Factorization for Zero-Shot Learning across Tasks and Languages [112.65994041398481]
We propose a Bayesian generative model for the space of neural parameters.
We infer the posteriors over such latent variables based on data from seen task-language combinations.
Our model yields comparable or better results than state-of-the-art, zero-shot cross-lingual transfer methods.
arXiv Detail & Related papers (2020-01-30T16:58:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.