NU HLT at CMCL 2022 Shared Task: Multilingual and Crosslingual
Prediction of Human Reading Behavior in Universal Language Space
- URL: http://arxiv.org/abs/2202.10855v1
- Date: Tue, 22 Feb 2022 12:39:16 GMT
- Title: NU HLT at CMCL 2022 Shared Task: Multilingual and Crosslingual
Prediction of Human Reading Behavior in Universal Language Space
- Authors: Joseph Marvin Imperial
- Abstract summary: The secret behind the success of this model is in the preprocessing step where all words are transformed to their universal language representation via the International Phonetic Alphabet (IPA).
A fine-tuned Random Forest model obtained the best performance on both tasks, with MAE scores of 3.8031 and 3.9065 for mean first fixation duration (FFDAve) and mean total reading time (TRTAve), respectively.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we present a unified model that works for both multilingual
and crosslingual prediction of reading times of words in various languages. The
secret behind the success of this model is in the preprocessing step where all
words are transformed to their universal language representation via the
International Phonetic Alphabet (IPA). To the best of our knowledge, this is
the first study to favorably exploit this phonological property of language for
the two tasks. Various feature types were extracted, covering basic frequencies,
n-grams, information-theoretic, and psycholinguistically-motivated predictors
for model training. A fine-tuned Random Forest model obtained the best
performance on both tasks, with MAE scores of 3.8031 and 3.9065 for mean first
fixation duration (FFDAve) and mean total reading time (TRTAve), respectively.
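To make the described pipeline concrete, below is a minimal sketch of one possible realization: words are transliterated to IPA with the epitran G2P library, a few illustrative surface features are derived, and a scikit-learn Random Forest is fit on per-word reading times. The paper does not name its tooling, so the library choice, feature set, example data, and hyperparameters here are assumptions for illustration only, not the authors' implementation.

```python
# Minimal sketch (not the authors' released code): map words to IPA with the
# epitran G2P library, derive a few toy predictors, and fit a scikit-learn
# Random Forest on per-word reading times (e.g., FFDAve). All choices below
# (library, features, hyperparameters, data) are illustrative assumptions.
import math
from collections import Counter

import epitran
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

def to_ipa(words, lang_code="spa-Latn"):
    """Transliterate orthographic words into the shared IPA space."""
    epi = epitran.Epitran(lang_code)
    return [epi.transliterate(w) for w in words]

def featurize(ipa_word, counts, total):
    """Toy predictors: length, distinct character bigrams, unigram surprisal."""
    bigrams = {ipa_word[i:i + 2] for i in range(len(ipa_word) - 1)}
    surprisal = -math.log2(counts.get(ipa_word, 1) / total)
    return [len(ipa_word), len(bigrams), surprisal]

# Hypothetical (word, mean first fixation duration) training pairs
words = ["la", "casa", "universidad", "leer"]
ffd = np.array([150.0, 190.0, 260.0, 210.0])

ipa_words = to_ipa(words)
counts = Counter(ipa_words)
total = sum(counts.values())
X = np.array([featurize(w, counts, total) for w in ipa_words])

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, ffd)
print("train MAE:", mean_absolute_error(ffd, model.predict(X)))
```

Because every language is first mapped into the same IPA space, the same feature extractor and regressor can, in principle, be reused unchanged across languages, which is what makes a single multilingual/crosslingual model possible in this setup.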
Related papers
- Multilingual Turn-taking Prediction Using Voice Activity Projection [25.094622033971643]
This paper investigates the application of voice activity projection (VAP), a predictive turn-taking model for spoken dialogue, on multilingual data.
The results show that a monolingual VAP model trained on one language does not make good predictions when applied to other languages.
A multilingual model, trained on all three languages, demonstrates predictive performance on par with monolingual models across all languages.
arXiv Detail & Related papers (2024-03-11T07:50:29Z) - Learning Cross-lingual Visual Speech Representations [108.68531445641769]
Cross-lingual self-supervised visual representation learning has been a growing research topic in the last few years.
We use the recently proposed RAVEn (Raw Audio-Visual Speech Encoders) framework to pre-train an audio-visual model with unlabelled data.
Our experiments show that multi-lingual models with more data outperform monolingual ones, but, when the amount of data is kept fixed, monolingual models tend to reach better performance.
arXiv Detail & Related papers (2023-03-14T17:05:08Z) - M-SpeechCLIP: Leveraging Large-Scale, Pre-Trained Models for
Multilingual Speech to Image Retrieval [56.49878599920353]
This work investigates the use of large-scale, English-only pre-trained models (CLIP and HuBERT) for multilingual image-speech retrieval.
For non-English image-speech retrieval, we outperform the current state-of-the-art performance by a wide margin both when training separate models for each language, and with a single model which processes speech in all three languages.
arXiv Detail & Related papers (2022-11-02T14:54:45Z) - Analyzing the Mono- and Cross-Lingual Pretraining Dynamics of
Multilingual Language Models [73.11488464916668]
This study investigates the dynamics of the multilingual pretraining process.
We probe checkpoints taken from throughout XLM-R pretraining, using a suite of linguistic tasks.
Our analysis shows that the model achieves high in-language performance early on, with lower-level linguistic skills acquired before more complex ones.
arXiv Detail & Related papers (2022-05-24T03:35:00Z) - Exploring Teacher-Student Learning Approach for Multi-lingual
Speech-to-Intent Classification [73.5497360800395]
We develop an end-to-end system that supports multiple languages.
We exploit knowledge from a pre-trained multi-lingual natural language processing model.
arXiv Detail & Related papers (2021-09-28T04:43:11Z) - Specializing Multilingual Language Models: An Empirical Study [50.7526245872855]
Contextualized word representations from pretrained multilingual language models have become the de facto standard for addressing natural language tasks.
For languages rarely or never seen by these models, directly using such models often results in suboptimal representation or use of data.
arXiv Detail & Related papers (2021-06-16T18:13:55Z) - NEMO: Frequentist Inference Approach to Constrained Linguistic Typology
Feature Prediction in SIGTYP 2020 Shared Task [83.43738174234053]
We employ frequentist inference to represent correlations between typological features and use this representation to train simple multi-class estimators that predict individual features.
Our best configuration achieved a micro-averaged accuracy of 0.66 on 149 test languages.
arXiv Detail & Related papers (2020-10-12T19:25:43Z)