Comprehending Lexical and Affective Ontologies in the Demographically
Diverse Spatial Social Media Discourse
- URL: http://arxiv.org/abs/2311.06729v1
- Date: Sun, 12 Nov 2023 04:23:33 GMT
- Title: Comprehending Lexical and Affective Ontologies in the Demographically
Diverse Spatial Social Media Discourse
- Authors: Salim Sazzed
- Abstract summary: This study aims to comprehend linguistic and socio-demographic features, encompassing English language styles, conveyed sentiments, and lexical diversity within social media data.
Our analysis entails the extraction and examination of various statistical, grammatical, and sentimental features from two groups.
Our investigation unveils substantial disparities in certain linguistic attributes between the two groups; when integrated into ML classifiers, these attributes yield a macro F1 score of approximately 0.85.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This study aims to comprehend linguistic and socio-demographic features,
encompassing English language styles, conveyed sentiments, and lexical
diversity within spatial online social media review data. To this end, we
undertake a case study that scrutinizes reviews composed by two distinct and
demographically diverse groups. Our analysis entails the extraction and
examination of various statistical, grammatical, and sentimental features from
these two groups. Subsequently, we leverage these features with machine
learning (ML) classifiers to discern their potential in effectively
differentiating between the groups. Our investigation unveils substantial
disparities in certain linguistic attributes between the two groups. When
integrated into ML classifiers, these attributes exhibit a marked efficacy in
distinguishing the groups, yielding a macro F1 score of approximately 0.85.
Furthermore, we conduct a comparative evaluation of these linguistic features
with word n-gram-based lexical features in discerning demographically diverse
review data. As expected, the n-gram lexical features, coupled with fine-tuned
transformer-based models, show superior performance, attaining accuracies
surpassing 95% and macro F1 scores exceeding 0.96. Our meticulous analysis and
comprehensive evaluations substantiate the efficacy of linguistic and
sentimental features in effectively discerning demographically diverse review
data. The findings of this study provide valuable guidelines for future
research endeavors concerning the analysis of demographic patterns in textual
content across various social media platforms.
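The abstract does not list the exact features or the evaluation code, so the following is only a minimal Python sketch of the kind of pipeline it describes: a few statistical and lexical-diversity features per review, plus the macro F1 metric used to report results. The function names `lexical_features` and `macro_f1`, the tokenization rule, and the chosen feature set (word count, average word length, type-token ratio) are illustrative assumptions, not the author's implementation.

```python
import re


def lexical_features(text):
    """Compute a few simple statistical features for one review.

    The feature set here is an assumption chosen for illustration;
    the paper's actual features are not specified in the abstract.
    """
    # Naive tokenization: runs of letters and apostrophes, lowercased.
    words = re.findall(r"[A-Za-z']+", text.lower())
    n = len(words)
    if n == 0:
        return {"word_count": 0, "avg_word_len": 0.0, "type_token_ratio": 0.0}
    return {
        "word_count": n,
        "avg_word_len": sum(len(w) for w in words) / n,
        # Type-token ratio: distinct words divided by total words.
        "type_token_ratio": len(set(words)) / n,
    }


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    print(lexical_features("The food was good good"))
    print(macro_f1(["A", "A", "B", "B"], ["A", "B", "B", "B"]))
```

Because macro F1 averages the per-class scores without weighting by class size, it penalizes a classifier that performs well on only one of the two groups, which is presumably why it is reported alongside accuracy.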
Related papers
- A study of Vietnamese readability assessing through semantic and statistical features [0.0]
This paper introduces a new approach that integrates statistical and semantic approaches to assessing text readability.
Our research utilized three distinct datasets: the Vietnamese Text Readability dataset (ViRead), OneStopEnglish, and RACE.
We conducted experiments using various machine learning models, including Support Vector Machine (SVM), Random Forest, and Extra Trees.
arXiv Detail & Related papers (2024-11-07T14:54:42Z)
- Persian Homograph Disambiguation: Leveraging ParsBERT for Enhanced Sentence Understanding with a Novel Word Disambiguation Dataset [0.0]
We introduce a novel dataset tailored for Persian homograph disambiguation.
Our work encompasses a thorough exploration of various embeddings, evaluated through the cosine similarity method.
We scrutinize the models' performance in terms of Accuracy, Recall, and F1 Score.
arXiv Detail & Related papers (2024-05-24T14:56:36Z)
- Sensitivity, Performance, Robustness: Deconstructing the Effect of Sociodemographic Prompting [64.80538055623842]
Sociodemographic prompting is a technique that steers the output of prompt-based models towards answers that humans with specific sociodemographic profiles would give.
We show that sociodemographic information affects model predictions and can be beneficial for improving zero-shot learning in subjective NLP tasks.
arXiv Detail & Related papers (2023-09-13T15:42:06Z)
- Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language Modelling [70.23876429382969]
We propose a benchmark that can evaluate intra-sentence discourse properties across a diverse set of NLP tasks.
Disco-Bench consists of 9 document-level testsets in the literature domain, which contain rich discourse phenomena.
For linguistic analysis, we also design a diagnostic test suite that can examine whether the target models learn discourse knowledge.
arXiv Detail & Related papers (2023-07-16T15:18:25Z)
- Language identification as improvement for lip-based biometric visual systems [13.205817167773443]
We present a preliminary study in which we use linguistic information as a soft biometric trait to enhance the performance of a visual (auditory-free) identification system based on lip movement.
We report a significant improvement in the identification performance of the proposed visual system as a result of the integration of these data.
arXiv Detail & Related papers (2023-02-27T15:44:24Z)
- Investigating Fairness Disparities in Peer Review: A Language Model Enhanced Approach [77.61131357420201]
We conduct a thorough and rigorous study on fairness disparities in peer review with the help of large language models (LMs).
We collect, assemble, and maintain a comprehensive relational database for the International Conference on Learning Representations (ICLR) conference from 2017 to date.
We postulate and study fairness disparities on multiple protective attributes of interest, including author gender, geography, and author and institutional prestige.
arXiv Detail & Related papers (2022-11-07T16:19:42Z)
- Under the Morphosyntactic Lens: A Multifaceted Evaluation of Gender Bias in Speech Translation [20.39599469927542]
Gender bias is largely recognized as a problematic phenomenon affecting language technologies.
Most current evaluation practices adopt a word-level focus on a narrow set of occupational nouns under synthetic conditions.
Such protocols overlook key features of grammatical gender languages, which are characterized by morphosyntactic chains of gender agreement.
arXiv Detail & Related papers (2022-03-18T11:14:16Z)
- Automated Speech Scoring System Under The Lens: Evaluating and interpreting the linguistic cues for language proficiency [26.70127591966917]
We utilize classical machine learning models to formulate a speech scoring task as both a classification and a regression problem.
First, we extract linguistic features under five categories (fluency, pronunciation, content, grammar and vocabulary, and acoustic) and train models to grade responses.
In comparison, we find that the regression-based models perform on par with or better than the classification approach.
arXiv Detail & Related papers (2021-11-30T06:28:58Z)
- A Massively Multilingual Analysis of Cross-linguality in Shared Embedding Space [61.18554842370824]
In cross-lingual language models, representations for many different languages live in the same space.
We compute a task-based measure of cross-lingual alignment in the form of bitext retrieval performance.
We examine a range of linguistic, quasi-linguistic, and training-related features as potential predictors of these alignment metrics.
arXiv Detail & Related papers (2021-09-13T21:05:37Z)
- AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples [51.048234591165155]
We present AM2iCo, Adversarial and Multilingual Meaning in Context.
It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts.
Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z)
- Evaluating Transformer-Based Multilingual Text Classification [55.53547556060537]
We argue that NLP tools perform unequally across languages with different syntactic and morphological structures.
We calculate word order and morphological similarity indices to aid our empirical study.
arXiv Detail & Related papers (2020-04-29T03:34:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.