A Deep Learning Approach to Language-independent Gender Prediction on Twitter
- URL: http://arxiv.org/abs/2411.19733v1
- Date: Fri, 29 Nov 2024 14:26:34 GMT
- Title: A Deep Learning Approach to Language-independent Gender Prediction on Twitter
- Authors: Reyhaneh Hashempour, Barbara Plank, Aline Villavicencio, Renato Cordeiro de Amorim,
- Abstract summary: This work presents a set of experiments conducted to predict the gender of Twitter users based on language-independent features extracted from the text of the users' tweets.
The experiments were performed on a version of TwiSty dataset including tweets written by the users of six different languages: Portuguese, French, Dutch, English, German, and Italian.
- Score: 25.177698576477823
- License:
- Abstract: This work presents a set of experiments conducted to predict the gender of Twitter users based on language-independent features extracted from the text of the users' tweets. The experiments were performed on a version of TwiSty dataset including tweets written by the users of six different languages: Portuguese, French, Dutch, English, German, and Italian. Logistic regression (LR), and feed-forward neural networks (FFNN) with back-propagation were used to build models in two different settings: Inter-Lingual (IL) and Cross-Lingual (CL). In the IL setting, the training and testing were performed on the same language whereas in the CL, Italian and German datasets were set aside and only used as test sets and the rest were combined to compose training and development sets. In the IL, the highest accuracy score belongs to LR whereas in the CL, FFNN with three hidden layers yields the highest score. The results show that neural network based models underperform traditional models when the size of the training set is small; however, they beat traditional models by a non-trivial margin, when they are fed with large enough data. Finally, the feature analysis confirms that men and women have different writing styles independent of their language.
Related papers
- XFEVER: Exploring Fact Verification across Languages [40.1637899493061]
This paper introduces the Cross-lingual Fact Extraction and VERification dataset designed for benchmarking the fact verification models across different languages.
We constructed it by translating the claim and evidence texts of the Fact Extraction and VERification dataset into six languages.
The training and development sets were translated using machine translation, whereas the test set includes texts translated by professional translators and machine-translated texts.
arXiv Detail & Related papers (2023-10-25T01:20:17Z) - T3L: Translate-and-Test Transfer Learning for Cross-Lingual Text
Classification [50.675552118811]
Cross-lingual text classification is typically built on large-scale, multilingual language models (LMs) pretrained on a variety of languages of interest.
We propose revisiting the classic "translate-and-test" pipeline to neatly separate the translation and classification stages.
arXiv Detail & Related papers (2023-06-08T07:33:22Z) - Analyzing the Generalizability of Deep Contextualized Language
Representations For Text Classification [0.0]
This study evaluates the robustness of two state-of-the-art deep contextual language representations, ELMo and DistilBERT.
In the news classification task, the models are developed on local news from India and tested on the local news from China.
In the sentiment analysis task, the models are trained on movie reviews and tested on customer reviews.
arXiv Detail & Related papers (2023-03-22T22:31:09Z) - Ensemble Transfer Learning for Multilingual Coreference Resolution [60.409789753164944]
A problem that frequently occurs when working with a non-English language is the scarcity of annotated training data.
We design a simple but effective ensemble-based framework that combines various transfer learning techniques.
We also propose a low-cost TL method that bootstraps coreference resolution models by utilizing Wikipedia anchor texts.
arXiv Detail & Related papers (2023-01-22T18:22:55Z) - A Dual-Contrastive Framework for Low-Resource Cross-Lingual Named Entity
Recognition [5.030581940990434]
Cross-lingual Named Entity Recognition (NER) has recently become a research hotspot because it can alleviate the data-hungry problem for low-resource languages.
In this paper, we describe our novel dual-contrastive framework ConCNER for cross-lingual NER under the scenario of limited source-language labeled data.
arXiv Detail & Related papers (2022-04-02T07:59:13Z) - From Good to Best: Two-Stage Training for Cross-lingual Machine Reading
Comprehension [51.953428342923885]
We develop a two-stage approach to enhance the model performance.
The first stage targets at recall: we design a hard-learning (HL) algorithm to maximize the likelihood that the top-k predictions contain the accurate answer.
The second stage focuses on precision: an answer-aware contrastive learning mechanism is developed to learn the fine difference between the accurate answer and other candidates.
arXiv Detail & Related papers (2021-12-09T07:31:15Z) - Fake News Detection in Spanish Using Deep Learning Techniques [0.0]
This paper addresses the problem of fake news detection in Spanish using Machine Learning techniques.
It is fundamentally the same problem tackled for the English language.
There is not a significant amount of publicly available and adequately labeled fake news in Spanish to effectively train a Machine Learning model.
arXiv Detail & Related papers (2021-10-13T02:56:16Z) - Sentiment analysis in tweets: an assessment study from classical to
modern text representation models [59.107260266206445]
Short texts published on Twitter have earned significant attention as a rich source of information.
Their inherent characteristics, such as the informal, and noisy linguistic style, remain challenging to many natural language processing (NLP) tasks.
This study fulfils an assessment of existing language models in distinguishing the sentiment expressed in tweets by using a rich collection of 22 datasets.
arXiv Detail & Related papers (2021-05-29T21:05:28Z) - Read Like Humans: Autonomous, Bidirectional and Iterative Language
Modeling for Scene Text Recognition [80.446770909975]
Linguistic knowledge is of great benefit to scene text recognition.
How to effectively model linguistic rules in end-to-end deep networks remains a research challenge.
We propose an autonomous, bidirectional and iterative ABINet for scene text recognition.
arXiv Detail & Related papers (2021-03-11T06:47:45Z) - Explicit Alignment Objectives for Multilingual Bidirectional Encoders [111.65322283420805]
We present a new method for learning multilingual encoders, AMBER (Aligned Multilingual Bi-directional EncodeR)
AMBER is trained on additional parallel data using two explicit alignment objectives that align the multilingual representations at different granularities.
Experimental results show that AMBER obtains gains of up to 1.1 average F1 score on sequence tagging and up to 27.3 average accuracy on retrieval over the XLMR-large model.
arXiv Detail & Related papers (2020-10-15T18:34:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.