Developing a Comprehensive Framework for Sentiment Analysis in Turkish
- URL: http://arxiv.org/abs/2512.00515v1
- Date: Sat, 29 Nov 2025 15:14:57 GMT
- Title: Developing a Comprehensive Framework for Sentiment Analysis in Turkish
- Authors: Cem Rifki Aydin,
- Abstract summary: This thesis can be considered the most detailed and comprehensive study made on sentiment analysis in Turkish as of July, 2020.<n>We developed a comprehensive framework for sentiment analysis that takes its many aspects into account mainly for Turkish.<n>We built novel word embeddings that exploit sentiment, syntactic, semantic, and lexical characteristics for both Turkish and English.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this thesis, we developed a comprehensive framework for sentiment analysis that takes its many aspects into account mainly for Turkish. We have also proposed several approaches specific to sentiment analysis in English only. We have accordingly made five major and three minor contributions. We generated a novel and effective feature set by combining unsupervised, semi-supervised, and supervised metrics. We then fed them as input into classical machine learning methods, and outperformed neural network models for datasets of different genres in both Turkish and English. We created a polarity lexicon with a semi-supervised domain-specific method, which has been the first approach applied for corpora in Turkish. We performed a fine morphological analysis for the sentiment classification task in Turkish by determining the polarities of morphemes. This can be adapted to other morphologically-rich or agglutinative languages as well. We have built a novel neural network architecture, which combines recurrent and recursive neural network models for English. We built novel word embeddings that exploit sentiment, syntactic, semantic, and lexical characteristics for both Turkish and English. We also redefined context windows as subclauses in modelling word representations in English. This can also be applied to other linguistic fields and natural language processing tasks. We have achieved state-of-the-art and significant results for all these original approaches. Our minor contributions include methods related to aspect-based sentiment in Turkish, parameter redefinition in the semi-supervised approach, and aspect term extraction techniques for English. This thesis can be considered the most detailed and comprehensive study made on sentiment analysis in Turkish as of July, 2020. Our work has also contributed to the opinion classification problem in English.
Related papers
- RusLICA: A Russian-Language Platform for Automated Linguistic Inquiry and Category Analysis [0.0]
The paper describes the process of mapping lemmas to 42 psycholinguistic categories and the implementation of the analyzer as part of RusLICA web service.
arXiv Detail & Related papers (2026-01-28T05:43:40Z) - Cetvel: A Unified Benchmark for Evaluating Language Understanding, Generation and Cultural Capacity of LLMs for Turkish [9.111556632499472]
Cetvel is a benchmark designed to evaluate large language models (LLMs) in Turkish.<n>It combines a broad range of both discriminative and generative tasks ensuring content that reflects the linguistic and cultural richness of Turkish language.
arXiv Detail & Related papers (2025-08-22T14:42:50Z) - Evaluating Contextualized Representations of (Spanish) Ambiguous Words: A New Lexical Resource and Empirical Analysis [2.2530496464901106]
We evaluate semantic representations of Spanish ambiguous nouns in context in a suite of Spanish-language monolingual and multilingual BERT-based models.<n>We find that various BERT-based LMs' contextualized semantic representations capture some variance in human judgments but fall short of the human benchmark.
arXiv Detail & Related papers (2024-06-20T18:58:11Z) - A Comprehensive Analysis of Static Word Embeddings for Turkish [0.058520770038704165]
There are basically two types of word embedding models which are non-contextual (static) models and contextual models.
We compare and evaluate the performance of several contextual and non-contextual models in both intrinsic and extrinsic evaluation settings for Turkish.
The results of the analyses provide insights about the suitability of different embedding models in different types of NLP tasks.
arXiv Detail & Related papers (2024-05-13T14:23:37Z) - Holmes: A Benchmark to Assess the Linguistic Competence of Language Models [59.627729608055006]
We introduce Holmes, a new benchmark designed to assess language models (LMs) linguistic competence.
We use computation-based probing to examine LMs' internal representations regarding distinct linguistic phenomena.
As a result, we meet recent calls to disentangle LMs' linguistic competence from other cognitive abilities.
arXiv Detail & Related papers (2024-04-29T17:58:36Z) - Understanding Cross-Lingual Alignment -- A Survey [52.572071017877704]
Cross-lingual alignment is the meaningful similarity of representations across languages in multilingual language models.
We survey the literature of techniques to improve cross-lingual alignment, providing a taxonomy of methods and summarising insights from throughout the field.
arXiv Detail & Related papers (2024-04-09T11:39:53Z) - A Corpus for Sentence-level Subjectivity Detection on English News Articles [49.49218203204942]
We use our guidelines to collect NewsSD-ENG, a corpus of 638 objective and 411 subjective sentences extracted from English news articles on controversial topics.
Our corpus paves the way for subjectivity detection in English without relying on language-specific tools, such as lexicons or machine translation.
arXiv Detail & Related papers (2023-05-29T11:54:50Z) - Variational Cross-Graph Reasoning and Adaptive Structured Semantics
Learning for Compositional Temporal Grounding [143.5927158318524]
Temporal grounding is the task of locating a specific segment from an untrimmed video according to a query sentence.
We introduce a new Compositional Temporal Grounding task and construct two new dataset splits.
We argue that the inherent structured semantics inside the videos and language is the crucial factor to achieve compositional generalization.
arXiv Detail & Related papers (2023-01-22T08:02:23Z) - Quantifying Synthesis and Fusion and their Impact on Machine Translation [79.61874492642691]
In Natural Language Processing (NLP) typically labels a whole language with a strict type of morphology, e.g. fusional or agglutinative.
In this work, we propose to reduce the rigidity of such claims, by quantifying morphological typology at the word and segment level.
For computing literature, we test unsupervised and supervised morphological segmentation methods for English, German and Turkish, whereas for fusion, we propose a semi-automatic method using Spanish as a case study.
Then, we analyse the relationship between machine translation quality and the degree of synthesis and fusion at word (nouns and verbs for English-Turkish,
arXiv Detail & Related papers (2022-05-06T17:04:58Z) - AUTOLEX: An Automatic Framework for Linguistic Exploration [93.89709486642666]
We propose an automatic framework that aims to ease linguists' discovery and extraction of concise descriptions of linguistic phenomena.
Specifically, we apply this framework to extract descriptions for three phenomena: morphological agreement, case marking, and word order.
We evaluate the descriptions with the help of language experts and propose a method for automated evaluation when human evaluation is infeasible.
arXiv Detail & Related papers (2022-03-25T20:37:30Z) - Infusing Finetuning with Semantic Dependencies [62.37697048781823]
We show that, unlike syntax, semantics is not brought to the surface by today's pretrained models.
We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning.
arXiv Detail & Related papers (2020-12-10T01:27:24Z) - Sentiment Analysis with Contextual Embeddings and Self-Attention [3.0079490585515343]
In natural language the intended meaning of a word or phrase is often implicit and depends on the context.
We propose a simple yet effective method for sentiment analysis using contextual embeddings and a self-attention mechanism.
The experimental results for three languages, including morphologically rich Polish and German, show that our model is comparable to or even outperforms state-of-the-art models.
arXiv Detail & Related papers (2020-03-12T02:19:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.