Challenges Encountered in Turkish Natural Language Processing Studies
- URL: http://arxiv.org/abs/2101.11436v1
- Date: Thu, 21 Jan 2021 08:30:33 GMT
- Title: Challenges Encountered in Turkish Natural Language Processing Studies
- Authors: Kadir Tohma, Yakup Kutlu
- Abstract summary: Natural language processing is a branch of computer science that combines artificial intelligence with linguistics.
This study discusses the features of Turkish that make it an interesting case for natural language processing.
- Score: 1.52292571922932
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Natural language processing is a branch of computer science that
combines artificial intelligence with linguistics. It aims to analyze written
or spoken language with software and convert it into usable information.
Given that every language has its own grammatical rules and lexical
diversity, the complexity of work in this field is understandable. Turkish,
for instance, is an interesting language in many respects: it has an
agglutinative word structure, consonant/vowel harmony, a large number of
productive derivational morphemes (yielding a practically unbounded
vocabulary), rich derivational and syntactic relations, and complex word
stress and phonological rules. This study discusses the features of Turkish
that are of particular interest for natural language processing and
summarizes the techniques, systems, and various resources developed for
Turkish.
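To make the agglutination and vowel-harmony points concrete, here is a minimal Python sketch (an illustration, not code from the paper) that selects the Turkish plural suffix -ler/-lar by two-way vowel harmony:

```python
# A toy sketch (not from the paper): Turkish plural suffix selection
# under two-way (front/back) vowel harmony.
FRONT_VOWELS = set("eiöü")
BACK_VOWELS = set("aıou")

def add_plural(word):
    """Attach -ler after a front vowel, -lar after a back vowel."""
    for ch in reversed(word.lower()):
        if ch in FRONT_VOWELS:
            return word + "ler"
        if ch in BACK_VOWELS:
            return word + "lar"
    raise ValueError("no vowel found")

# Agglutination stacks such suffixes, so one word can encode a whole phrase.
print(add_plural("ev"))    # evler   ("houses")
print(add_plural("okul"))  # okullar ("schools")
```

Real Turkish morphology also involves four-way vowel harmony and consonant alternations, which is exactly why rule-based and statistical analyzers for the language are nontrivial.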
Related papers
- Analyzing The Language of Visual Tokens [48.62180485759458]
We take a natural-language-centric approach to analyzing discrete visual languages.
We show that higher token innovation drives greater entropy and lower compression, with tokens predominantly representing object parts.
We also show that visual languages lack cohesive grammatical structures, leading to higher perplexity and weaker hierarchical organization compared to natural languages.
arXiv Detail & Related papers (2024-11-07T18:59:28Z)
- Linguistic Structure from a Bottleneck on Sequential Information Processing [5.850665541267672]
We show that natural-language-like systematicity arises in codes that are constrained by predictive information.
We show that human languages are structured to have low predictive information at the levels of phonology, morphology, syntax, and semantics.
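As background for this summary: predictive information is the mutual information between a sequence's past and its future. The sketch below (my illustration, not the paper's code) computes it for the simplest case, a first-order Markov source, where it reduces to I(X_t; X_{t+1}):

```python
# Toy sketch: predictive information of a stationary first-order Markov chain,
# I(X_t; X_{t+1}) = H(X_{t+1}) - H(X_{t+1} | X_t).
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def predictive_information(P):
    """P is a row-stochastic transition matrix."""
    # Stationary distribution: left eigenvector of P with eigenvalue 1.
    vals, vecs = np.linalg.eig(P.T)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
    pi = pi / pi.sum()
    h_marginal = entropy(pi)
    h_conditional = np.sum(pi * np.array([entropy(row) for row in P]))
    return h_marginal - h_conditional  # bits per symbol

# A "sticky" chain carries more predictive information than a fair coin.
print(predictive_information(np.array([[0.9, 0.1], [0.1, 0.9]])))  # ~0.531
print(predictive_information(np.array([[0.5, 0.5], [0.5, 0.5]])))  # 0.0
```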
arXiv Detail & Related papers (2024-05-20T15:25:18Z)
- Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning [84.12154024070024]
We propose natural language embedded programs (NLEP) as a unifying framework for addressing math/symbolic reasoning, natural language understanding, and instruction following tasks.
Our approach prompts a language model to generate full Python programs that define functions over data structures which contain natural language representations of structured knowledge.
A Python interpreter then executes the generated code and prints the output.
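A minimal sketch of the kind of program this might produce (a hypothetical example; the actual prompts and generated programs are defined in the paper):

```python
# Hypothetical sketch of an NLEP-style generated program; the task, data,
# and function names here are invented for illustration, not the paper's.
# Structured knowledge lives in a data structure whose contents are
# natural-language strings.
country_facts = {
    "Turkey": {"capital": "Ankara", "official language": "Turkish"},
    "France": {"capital": "Paris", "official language": "French"},
}

def answer(country, attribute):
    """Look up a fact and phrase the result in natural language."""
    value = country_facts[country][attribute]
    return f"The {attribute} of {country} is {value}."

# The Python interpreter executes the program; whatever is printed
# becomes the model's final response.
print(answer("Turkey", "official language"))
```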
arXiv Detail & Related papers (2023-09-19T17:54:21Z)
- Linguistic Analysis using Paninian System of Sounds and Finite State Machines [0.0]
The study of spoken languages comprises phonology, morphology, and grammar.
Languages can be classified as root languages, inflectional languages, and stem languages.
All these factors shape vocabularies that share commonalities while also showing distinct and subtle differences across languages.
arXiv Detail & Related papers (2023-01-29T15:22:10Z)
- Categorical Tools for Natural Language Processing [0.0]
This thesis develops the translation between category theory and computational linguistics.
The three chapters deal with syntax, semantics and pragmatics.
The resulting functorial models can be composed to form games where equilibria are the solutions of language processing tasks.
arXiv Detail & Related papers (2022-12-13T15:12:37Z)
- Transparency Helps Reveal When Language Models Learn Meaning [71.96920839263457]
Our systematic experiments with synthetic data reveal that, with languages where all expressions have context-independent denotations, both autoregressive and masked language models learn to emulate semantic relations between expressions.
Turning to natural language, our experiments with a specific phenomenon -- referential opacity -- add to the growing body of evidence that current language models do not represent natural language semantics well.
arXiv Detail & Related papers (2022-10-14T02:35:19Z)
- AUTOLEX: An Automatic Framework for Linguistic Exploration [93.89709486642666]
We propose an automatic framework that aims to ease linguists' discovery and extraction of concise descriptions of linguistic phenomena.
Specifically, we apply this framework to extract descriptions for three phenomena: morphological agreement, case marking, and word order.
We evaluate the descriptions with the help of language experts and propose a method for automated evaluation when human evaluation is infeasible.
arXiv Detail & Related papers (2022-03-25T20:37:30Z)
- Linking Emergent and Natural Languages via Corpus Transfer [98.98724497178247]
We propose a novel way to establish a link by corpus transfer between emergent languages and natural languages.
Our approach showcases non-trivial transfer benefits for two different tasks -- language modeling and image captioning.
We also introduce a novel metric to predict the transferability of an emergent language by translating emergent messages to natural language captions grounded on the same images.
arXiv Detail & Related papers (2022-03-24T21:24:54Z)
- Machine learning approach of Japanese composition scoring and writing aided system's design [0.0]
A composition scoring system can greatly assist language learners.
It helps learners improve as they practice producing written output.
Foreign language learners in particular are usually most concerned with lexical and syntactic content.
arXiv Detail & Related papers (2020-08-26T11:01:13Z)
- Bridging Linguistic Typology and Multilingual Machine Translation with Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis to study what kind of information is induced from each source.
We observe that our representations embed typology and strengthen correlations with language relationships.
We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
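For readers unfamiliar with the method: singular vector CCA (SVCCA) first reduces each view with an SVD, then runs canonical correlation analysis on the reduced views. A rough sketch of that pipeline, assuming scikit-learn and toy data rather than the authors' actual language representations:

```python
# Sketch of an SVCCA-style similarity score between two views (my
# reconstruction of the general technique, not the authors' code).
import numpy as np
from sklearn.cross_decomposition import CCA

def svcca_similarity(X, Y, keep=20, n_components=10):
    """SVD-reduce each view, then average the canonical correlations."""
    def svd_reduce(M, k):
        # Center and keep the top singular directions (denoising step).
        M = M - M.mean(axis=0)
        U, s, _ = np.linalg.svd(M, full_matrices=False)
        return U[:, :k] * s[:k]
    Xr, Yr = svd_reduce(X, keep), svd_reduce(Y, keep)
    cca = CCA(n_components=n_components)
    Xc, Yc = cca.fit_transform(Xr, Yr)
    corrs = [np.corrcoef(Xc[:, i], Yc[:, i])[0, 1] for i in range(n_components)]
    return np.mean(corrs)  # mean canonical correlation as similarity

# Toy usage: two related "language vector" views over the same 100 languages.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))                                    # view 1
Y = X @ rng.normal(size=(50, 40)) + 0.1 * rng.normal(size=(100, 40))  # view 2
print(round(svcca_similarity(X, Y), 3))
```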
arXiv Detail & Related papers (2020-04-30T16:25:39Z)
- Learning Music Helps You Read: Using Transfer to Study Linguistic Structure in Language Models [27.91397366776451]
Training LSTMs on latent structure (MIDI music or Java code) improves test performance on natural language.
Experiments on transfer between natural languages controlling for vocabulary overlap show that zero-shot performance on a test language is highly correlated with typological similarity to the training language.
arXiv Detail & Related papers (2020-04-30T06:24:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.