A frame semantics based approach to comparative study of digitized
corpus
- URL: http://arxiv.org/abs/2006.00113v1
- Date: Fri, 29 May 2020 22:56:25 GMT
- Authors: Abdelaziz Lakhfif and Mohamed Tayeb Laskri
- Abstract summary: The paper focuses on the morphological, syntactic, and semantic annotation of an English-Arabic aligned corpus created from digitized novels.
The present study argues that differences in motion events conceptualization across languages can be described with frame structure and frame-to-frame relations.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we present a corpus-linguistics-based approach to
analyzing digitized classical multilingual novels and narrative texts from a
semantic point of view. Digitized novels such as "The Hobbit" (Tolkien J. R. R.,
1937) and "The Hound of the Baskervilles" (Doyle A. C., 1901-1902), which have
been translated into dozens of languages, provide rich material for analyzing
language differences from several perspectives and within a number of
disciplines, such as linguistics, philosophy, and cognitive science. Taking the
conceptualization of motion events as a case study, this paper focuses on the
morphological, syntactic, and semantic annotation of an English-Arabic aligned
corpus created from digitized novels, in order to re-examine the linguistic
encodings of motion events in English and Arabic in terms of Frame Semantics.
The present study argues that differences in the conceptualization of motion
events across languages can be described with frame structure and
frame-to-frame relations.
Related papers
- Understanding Cross-Lingual Alignment -- A Survey [52.572071017877704]
Cross-lingual alignment is the meaningful similarity of representations across languages in multilingual language models.
We survey the literature of techniques to improve cross-lingual alignment, providing a taxonomy of methods and summarising insights from throughout the field.
arXiv Detail & Related papers (2024-04-09T11:39:53Z)
- Finding Pragmatic Differences Between Disciplines [14.587150614245123]
We learn a fixed set of domain-agnostic descriptors for document sections and "retrofit" the corpus to these descriptors.
We analyze the position and ordering of these descriptors across documents to understand the relationship between discipline and structure.
Our findings lay the foundation for future work in assessing research quality, domain style transfer, and further pragmatic analysis.
arXiv Detail & Related papers (2023-09-30T00:46:14Z)
- A Corpus for Sentence-level Subjectivity Detection on English News Articles [49.49218203204942]
We use our guidelines to collect NewsSD-ENG, a corpus of 638 objective and 411 subjective sentences extracted from English news articles on controversial topics.
Our corpus paves the way for subjectivity detection in English without relying on language-specific tools, such as lexicons or machine translation.
arXiv Detail & Related papers (2023-05-29T11:54:50Z)
- Zero-shot Cross-Linguistic Learning of Event Semantics [27.997873309702225]
We look at captions of images across Arabic, Chinese, Farsi, German, Russian, and Turkish.
We show that lexical aspects can be predicted for a given language despite not having observed any annotated data for this language at all.
arXiv Detail & Related papers (2022-07-05T23:18:36Z)
- The Open corpus of the Veps and Karelian languages: overview and applications [52.77024349608834]
The Open Corpus of the Veps and Karelian Languages (VepKar) is an extension of the Veps Corpus created in 2009.
The VepKar corpus comprises texts in Karelian and Veps, multifunctional dictionaries linked to them, and software with an advanced system of search.
Future plans include developing a speech module for working with audio recordings and a syntactic tagging module using morphological analysis outputs.
arXiv Detail & Related papers (2022-06-08T13:05:50Z)
- A Novel Corpus of Discourse Structure in Humans and Computers [55.74664144248097]
We present a novel corpus of 445 human- and computer-generated documents, comprising about 27,000 clauses.
The corpus covers both formal and informal discourse, and contains documents generated using fine-tuned GPT-2.
arXiv Detail & Related papers (2021-11-10T20:56:08Z)
- A Massively Multilingual Analysis of Cross-linguality in Shared Embedding Space [61.18554842370824]
In cross-lingual language models, representations for many different languages live in the same space.
We compute a task-based measure of cross-lingual alignment in the form of bitext retrieval performance.
We examine a range of linguistic, quasi-linguistic, and training-related features as potential predictors of these alignment metrics.
arXiv Detail & Related papers (2021-09-13T21:05:37Z)
- The interplay between morphological typology and script on a novel multi-layer Algerian dialect corpus [4.974890682815778]
We introduce a newly annotated corpus of Algerian user-generated comments comprising parallel annotations of Algerian written in Latin, Arabic, and code-switched scripts.
We find there is a delicate relationship between script and typology for part-of-speech, while sentiment analysis is less sensitive.
arXiv Detail & Related papers (2021-05-16T10:22:21Z)
- Character Entropy in Modern and Historical Texts: Comparison Metrics for an Undeciphered Manuscript [0.4061135251278187]
This paper outlines the creation of three corpora for multilingual comparison and analysis of the Voynich manuscript.
The three corpora are a corpus of Voynich texts partitioned by Currier language, scribal hand, and transcription system; a corpus of 294 language samples compiled from Wikipedia; and a corpus of eighteen transcribed historical texts in eight languages.
arXiv Detail & Related papers (2020-10-28T01:53:59Z)
- Fine-Grained Analysis of Cross-Linguistic Syntactic Divergences [18.19093600136057]
We propose a framework for extracting divergence patterns for any language pair from a parallel corpus.
We show that our framework provides a detailed picture of cross-language divergences, generalizes previous approaches, and lends itself to full automation.
arXiv Detail & Related papers (2020-05-07T13:05:03Z)
- Evaluating Transformer-Based Multilingual Text Classification [55.53547556060537]
We argue that NLP tools perform unequally across languages with different syntactic and morphological structures.
We calculate word order and morphological similarity indices to aid our empirical study.
arXiv Detail & Related papers (2020-04-29T03:34:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.