Detecting Turkish Synonyms Used in Different Time Periods
- URL: http://arxiv.org/abs/2411.15768v1
- Date: Sun, 24 Nov 2024 09:31:38 GMT
- Title: Detecting Turkish Synonyms Used in Different Time Periods
- Authors: Umur Togay Yazar, Mucahid Kutlu
- Abstract summary: Turkish is a prominent example of rapid linguistic transformation due to the language reform in the 20th century.
We propose two methods for detecting synonyms used in different time periods, focusing on Turkish.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The dynamic structure of languages poses significant challenges in applying natural language processing models to historical texts, causing decreased performance in various downstream tasks. Turkish is a prominent example of rapid linguistic transformation due to the language reform in the 20th century. In this paper, we propose two methods for detecting synonyms used in different time periods, focusing on Turkish. In our first method, we use the Orthogonal Procrustes method to align the embedding spaces created from documents written in the corresponding time periods. In our second method, we extend the first one by incorporating Spearman's correlation between frequencies of words throughout the years. In our experiments, we show that our proposed methods outperform the baseline method. Furthermore, we observe that the efficacy of our methods remains consistent when the target time period shifts from the 1960s to the 1980s. However, their performance slightly decreases for subsequent time periods.
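The two methods in the abstract can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: the embeddings and frequency series are synthetic, and how the alignment score and the correlation term are combined (the 0.5/0.5 weighting) is an assumption.

```python
# Sketch of the abstract's two ideas:
# (1) align two diachronic embedding spaces with Orthogonal Procrustes,
# (2) re-score candidate synonyms with Spearman's correlation between
#     the words' yearly frequency series.
import numpy as np
from scipy.linalg import orthogonal_procrustes
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# Toy embedding matrices for the same anchor vocabulary in two periods
# (rows = words, columns = embedding dimensions).
X_old = rng.normal(size=(5, 4))   # e.g. embeddings from a 1930s corpus
X_new = rng.normal(size=(5, 4))   # e.g. embeddings from a 1960s corpus

# Method 1: find the orthogonal map R minimizing ||X_old @ R - X_new||_F,
# then compare words across periods in the aligned space.
R, _ = orthogonal_procrustes(X_old, X_new)
X_old_aligned = X_old @ R

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Similarity of word 0 (old period) to every word in the new period.
sims = [cosine(X_old_aligned[0], X_new[j]) for j in range(X_new.shape[0])]

# Method 2: an old word and the newer synonym replacing it should show
# opposite frequency trajectories, i.e. strongly negative Spearman rho.
freq_old_word = [90, 70, 40, 20, 10]   # toy counts per year, old word
freq_candidate = [5, 20, 45, 70, 95]   # toy counts per year, candidate
rho, _ = spearmanr(freq_old_word, freq_candidate)

# Hypothetical combination of the two signals (weights are illustrative).
combined_score = 0.5 * sims[0] + 0.5 * (-rho)
```

Candidates would then be ranked by the combined score, with higher values indicating a more plausible cross-period synonym under these assumptions.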
Related papers
- Language Detection by Means of the Minkowski Norm: Identification Through Character Bigrams and Frequency Analysis [0.0]
This research explores a mathematical implementation of an algorithm for language identification by leveraging monogram and bigram frequency rankings.
The method achieves over 80% accuracy on texts shorter than 150 characters and reaches 100% accuracy for longer texts.
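A rank-profile approach of this kind can be sketched as follows. This is a hypothetical reconstruction, not the cited paper's code: the profile size, the out-of-profile penalty, and the toy training strings are all assumptions.

```python
# Hypothetical sketch of rank-based language ID with a Minkowski norm:
# compare a text's character-bigram frequency ranking against per-language
# reference rankings and pick the language with the smallest distance.
from collections import Counter

def bigram_ranks(text, top_k=50):
    """Map each of the top_k most frequent bigrams to its rank (0 = most frequent)."""
    counts = Counter(text[i:i + 2] for i in range(len(text) - 1))
    return {bg: rank for rank, (bg, _) in enumerate(counts.most_common(top_k))}

def minkowski_rank_distance(ranks_a, ranks_b, p=1, penalty=100):
    # Bigrams missing from a profile get a fixed penalty rank,
    # as in rank-order profile methods.
    bigrams = set(ranks_a) | set(ranks_b)
    return sum(abs(ranks_a.get(bg, penalty) - ranks_b.get(bg, penalty)) ** p
               for bg in bigrams) ** (1.0 / p)

def detect(text, profiles):
    ranks = bigram_ranks(text)
    return min(profiles, key=lambda lang: minkowski_rank_distance(ranks, profiles[lang]))

# Toy profiles built from tiny samples (real profiles use large corpora).
profiles = {
    "en": bigram_ranks("the quick brown fox jumps over the lazy dog " * 10),
    "tr": bigram_ranks("hizli kahverengi tilki tembel kopegin ustunden atlar " * 10),
}
print(detect("the quick brown fox", profiles))  # -> en
```

With p=1 this is the Manhattan special case of the Minkowski norm; larger p would penalize big rank disagreements more heavily.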
arXiv Detail & Related papers (2025-07-22T07:11:01Z)
- Fine-grained Controllable Text Generation through In-context Learning with Feedback [57.396980277089135]
We present a method for rewriting an input sentence to match specific values of nontrivial linguistic features, such as dependency depth.
In contrast to earlier work, our method uses in-context learning rather than finetuning, making it applicable in use cases where data is sparse.
arXiv Detail & Related papers (2024-06-17T08:55:48Z)
- Reliable Detection and Quantification of Selective Forces in Language Change [3.55026004901472]
We apply a recently-introduced method to corpus data to quantify the strength of selection in specific instances of historical language change.
We show that this method is more reliable and interpretable than similar methods that have previously been applied.
arXiv Detail & Related papers (2023-05-25T10:20:15Z)
- Beyond Contrastive Learning: A Variational Generative Model for Multilingual Retrieval [109.62363167257664]
We propose a generative model for learning multilingual text embeddings.
Our model operates on parallel data in $N$ languages.
We evaluate this method on a suite of tasks including semantic similarity, bitext mining, and cross-lingual question retrieval.
arXiv Detail & Related papers (2022-12-21T02:41:40Z)
- Improving Temporal Generalization of Pre-trained Language Models with Lexical Semantic Change [28.106524698188675]
Recent research has revealed that neural language models at scale suffer from poor temporal generalization capability.
We propose a simple yet effective lexical-level masking strategy to post-train a converged language model.
arXiv Detail & Related papers (2022-10-31T08:12:41Z)
- Fake it Till You Make it: Self-Supervised Semantic Shifts for Monolingual Word Embedding Tasks [58.87961226278285]
We propose a self-supervised approach to model lexical semantic change.
We show that our method can be used for the detection of semantic change with any alignment method.
We illustrate the utility of our techniques using experimental results on three different datasets.
arXiv Detail & Related papers (2021-01-30T18:59:43Z)
- A Probabilistic Approach in Historical Linguistics Word Order Change in Infinitival Clauses: from Latin to Old French [0.0]
This thesis investigates word order change in infinitival clauses in the history of Latin and Old French.
I examine a synchronic word order variation in each stage of language change, from which I infer the character, periodization and constraints of diachronic variation.
I present a three-stage probabilistic model of word order change, which also conforms to traditional language change patterns.
arXiv Detail & Related papers (2020-11-16T20:30:31Z)
- Automated Transcription of Non-Latin Script Periodicals: A Case Study in the Ottoman Turkish Print Archive [0.0]
Our study utilizes deep learning methods for the automated transcription of periodicals written in Arabic script Ottoman Turkish (OT) using the Transkribus platform.
We discuss the historical situation of OT text collections and how they were largely excluded from late twentieth-century corpus digitization efforts.
This exclusion has two basic reasons: the technical challenges of OCR for Arabic script languages, and the rapid abandonment of that very script in the Turkish historical context.
arXiv Detail & Related papers (2020-11-02T17:28:36Z)
- Learning Coupled Policies for Simultaneous Machine Translation using Imitation Learning [85.70547744787]
We present an approach to efficiently learn a simultaneous translation model with coupled programmer-interpreter policies.
Experiments on six language-pairs show our method outperforms strong baselines in terms of translation quality.
arXiv Detail & Related papers (2020-02-11T10:56:42Z)
- On the Importance of Word Order Information in Cross-lingual Sequence Labeling [80.65425412067464]
Cross-lingual models that fit the word order of the source language may fail to handle target languages with different word orders.
We investigate whether making models insensitive to the word order of the source language can improve the adaptation performance in target languages.
arXiv Detail & Related papers (2020-01-30T03:35:44Z)
- Robust Cross-lingual Embeddings from Parallel Sentences [65.85468628136927]
We propose a bilingual extension of the CBOW method which leverages sentence-aligned corpora to obtain robust cross-lingual word representations.
Our approach significantly improves cross-lingual sentence retrieval performance over all other approaches.
It also achieves parity with a deep RNN method on a zero-shot cross-lingual document classification task.
arXiv Detail & Related papers (2019-12-28T16:18:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.