Related papers: A frame semantics based approach to comparative study of digitized corpus

URL: http://arxiv.org/abs/2006.00113v1
Date: Fri, 29 May 2020 22:56:25 GMT
Title: A frame semantics based approach to comparative study of digitized corpus
Authors: Abdelaziz Lakhfif and Mohamed Tayeb Laskri
Abstract summary: The paper focuses on the morphological, syntactic, and semantic annotation of an English-Arabic aligned corpus created from digitized novels. The present study argues that differences in motion-event conceptualization across languages can be described with frame structure and frame-to-frame relations.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this paper, we present a corpus-linguistics-based approach to analyzing digitized classical multilingual novels and narrative texts from a semantic point of view. Digitized novels such as "The Hobbit" (Tolkien J. R. R., 1937) and "The Hound of the Baskervilles" (Doyle A. C., 1901-1902), which have been translated into dozens of languages, provide rich material for analyzing language differences from several perspectives and within a number of disciplines, such as linguistics, philosophy, and cognitive science. Taking motion-event conceptualization as a case study, this paper focuses on the morphological, syntactic, and semantic annotation of an English-Arabic aligned corpus created from digitized novels, in order to re-examine the linguistic encoding of motion events in English and Arabic in terms of Frame Semantics. The present study argues that differences in motion-event conceptualization across languages can be described with frame structure and frame-to-frame relations.

Related papers

Developing a Comprehensive Framework for Sentiment Analysis in Turkish [0.0]
This thesis can be considered the most detailed and comprehensive study made on sentiment analysis in Turkish as of July 2020. We developed a comprehensive framework for sentiment analysis that takes its many aspects into account, mainly for Turkish. We built novel word embeddings that exploit sentiment, syntactic, semantic, and lexical characteristics for both Turkish and English.
arXiv Detail & Related papers (2025-11-29T15:14:57Z)
NAZM: Network Analysis of Zonal Metrics in Persian Poetic Tradition [0.0]
This study formalizes a computational model to simulate classical Persian poets' dynamics of influence. We draw upon semantic, lexical, stylistic, thematic, and metrical features to demarcate each poet's corpus. For typological insight, we use the Louvain community detection algorithm to demarcate clusters of poets sharing both style and theme coherence.
arXiv Detail & Related papers (2025-05-12T20:39:53Z)
A scale of conceptual orality and literacy: Automatic text categorization in the tradition of "Nähe und Distanz" [0.0]
It is stipulated that written texts can be rated on a scale of conceptual orality and literacy by linguistic features. This article establishes such a scale based on PCA and combines it with automatic analysis. The scale is also discussed with a view to its use in corpus compilation and as a guide for analyses in larger corpora.
arXiv Detail & Related papers (2025-02-05T15:08:37Z)
Entropy and type-token ratio in gigaword corpora [0.0]
We investigate entropy and type-token ratio, two metrics for lexical diversity, in six massive linguistic datasets in English, Spanish, and Turkish. We find a functional relation between entropy and type-token ratio that holds across the corpora under consideration. Our results contribute to the theoretical understanding of text structure and offer practical implications for fields like natural language processing.
arXiv Detail & Related papers (2024-11-15T14:40:59Z)
Understanding Cross-Lingual Alignment -- A Survey [52.572071017877704]
Cross-lingual alignment is the meaningful similarity of representations across languages in multilingual language models. We survey the literature of techniques to improve cross-lingual alignment, providing a taxonomy of methods and summarising insights from throughout the field.
arXiv Detail & Related papers (2024-04-09T11:39:53Z)
Finding Pragmatic Differences Between Disciplines [14.587150614245123]
We learn a fixed set of domain-agnostic descriptors for document sections and "retrofit" the corpus to these descriptors. We analyze the position and ordering of these descriptors across documents to understand the relationship between discipline and structure. Our findings lay the foundation for future work in assessing research quality, domain style transfer, and further pragmatic analysis.
arXiv Detail & Related papers (2023-09-30T00:46:14Z)
A Corpus for Sentence-level Subjectivity Detection on English News Articles [49.49218203204942]
We use our guidelines to collect NewsSD-ENG, a corpus of 638 objective and 411 subjective sentences extracted from English news articles on controversial topics. Our corpus paves the way for subjectivity detection in English without relying on language-specific tools, such as lexicons or machine translation.
arXiv Detail & Related papers (2023-05-29T11:54:50Z)
Zero-shot Cross-Linguistic Learning of Event Semantics [27.997873309702225]
We look at captions of images across Arabic, Chinese, Farsi, German, Russian, and Turkish. We show that lexical aspects can be predicted for a given language despite not having observed any annotated data for this language at all.
arXiv Detail & Related papers (2022-07-05T23:18:36Z)
The Open corpus of the Veps and Karelian languages: overview and applications [52.77024349608834]
The Open Corpus of the Veps and Karelian Languages (VepKar) is an extension of the Veps Corpus created in 2009. The VepKar corpus comprises texts in Karelian and Veps, multifunctional dictionaries linked to them, and software with an advanced search system. Future plans include developing a speech module for working with audio recordings and a syntactic tagging module using morphological analysis outputs.
arXiv Detail & Related papers (2022-06-08T13:05:50Z)
A Novel Corpus of Discourse Structure in Humans and Computers [55.74664144248097]
We present a novel corpus of 445 human- and computer-generated documents, comprising about 27,000 clauses. The corpus covers both formal and informal discourse, and contains documents generated using fine-tuned GPT-2.
arXiv Detail & Related papers (2021-11-10T20:56:08Z)
A Massively Multilingual Analysis of Cross-linguality in Shared Embedding Space [61.18554842370824]
In cross-lingual language models, representations for many different languages live in the same space. We compute a task-based measure of cross-lingual alignment in the form of bitext retrieval performance. We examine a range of linguistic, quasi-linguistic, and training-related features as potential predictors of these alignment metrics.
arXiv Detail & Related papers (2021-09-13T21:05:37Z)
The interplay between morphological typology and script on a novel multi-layer Algerian dialect corpus [4.974890682815778]
We introduce a newly annotated corpus of Algerian user-generated comments comprising parallel annotations of Algerian written in Latin, Arabic, and code-switched scripts. We find there is a delicate relationship between script and typology for part-of-speech, while sentiment analysis is less sensitive.
arXiv Detail & Related papers (2021-05-16T10:22:21Z)
Fine-Grained Analysis of Cross-Linguistic Syntactic Divergences [18.19093600136057]
We propose a framework for extracting divergence patterns for any language pair from a parallel corpus. We show that our framework provides a detailed picture of cross-language divergences, generalizes previous approaches, and lends itself to full automation.
arXiv Detail & Related papers (2020-05-07T13:05:03Z)
Evaluating Transformer-Based Multilingual Text Classification [55.53547556060537]
We argue that NLP tools perform unequally across languages with different syntactic and morphological structures. We calculate word order and morphological similarity indices to aid our empirical study.
arXiv Detail & Related papers (2020-04-29T03:34:53Z)

This list is automatically generated from the titles and abstracts of the papers on this site.