Stop! In the Name of Flaws: Disentangling Personal Names and Sociodemographic Attributes in NLP
- URL: http://arxiv.org/abs/2405.17159v2
- Date: Mon, 15 Jul 2024 13:57:56 GMT
- Title: Stop! In the Name of Flaws: Disentangling Personal Names and Sociodemographic Attributes in NLP
- Authors: Vagrant Gautam, Arjun Subramonian, Anne Lauscher, Os Keyes
- Abstract summary: We present an interdisciplinary background on names and naming.
We then survey the issues inherent to associating names with sociodemographic attributes.
We provide guiding questions along with normative recommendations to avoid validity and ethical pitfalls.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Personal names simultaneously differentiate individuals and categorize them in ways that are important in a given society. While the natural language processing community has thus associated personal names with sociodemographic characteristics in a variety of tasks, researchers have engaged to varying degrees with the established methodological problems in doing so. To guide future work that uses names and sociodemographic characteristics, we provide an overview of relevant research: first, we present an interdisciplinary background on names and naming. We then survey the issues inherent to associating names with sociodemographic attributes, covering problems of validity (e.g., systematic error, construct validity), as well as ethical concerns (e.g., harms, differential impact, cultural insensitivity). Finally, we provide guiding questions along with normative recommendations to avoid validity and ethical pitfalls when dealing with names and sociodemographic characteristics in natural language processing.
Related papers
- Revealing Personality Traits: A New Benchmark Dataset for Explainable Personality Recognition on Dialogues [63.936654900356004]
Personality recognition aims to identify the personality traits implied in user data such as dialogues and social media posts.
We propose a novel task named Explainable Personality Recognition, aiming to reveal the reasoning process as supporting evidence of the personality trait.
arXiv Detail & Related papers (2024-09-29T14:41:43Z)
- Multicultural Name Recognition For Previously Unseen Names [65.268245109828]
This paper attempts to improve recognition of person names, a diverse category that can grow any time someone is born or changes their name.
I look at names from 103 countries to compare how well the model performs on names from different cultures.
I find that a model with combined character and word input outperforms word-only models and may improve on accuracy compared to classical NER models.
arXiv Detail & Related papers (2024-01-23T17:58:38Z)
- PsyCoT: Psychological Questionnaire as Powerful Chain-of-Thought for Personality Detection [50.66968526809069]
We propose a novel personality detection method, called PsyCoT, which mimics the way individuals complete psychological questionnaires in a multi-turn dialogue manner.
Our experiments demonstrate that PsyCoT significantly improves the performance and robustness of GPT-3.5 in personality detection.
arXiv Detail & Related papers (2023-10-31T08:23:33Z)
- Editing Personality for Large Language Models [73.59001811199823]
This paper introduces an innovative task focused on editing the personality traits of Large Language Models (LLMs).
We construct PersonalityEdit, a new benchmark dataset to address this task.
arXiv Detail & Related papers (2023-10-03T16:02:36Z)
- Sensitivity, Performance, Robustness: Deconstructing the Effect of Sociodemographic Prompting [64.80538055623842]
Sociodemographic prompting is a technique that steers the output of prompt-based models towards answers that humans with specific sociodemographic profiles would give.
We show that sociodemographic information affects model predictions and can be beneficial for improving zero-shot learning in subjective NLP tasks.
arXiv Detail & Related papers (2023-09-13T15:42:06Z)
- How word semantics and phonology affect handwriting of Alzheimer's patients: a machine learning based analysis [20.36565712578267]
We investigated how word semantics and phonology affect the handwriting of people affected by Alzheimer's disease.
We used the data from six handwriting tasks, each requiring copying a word belonging to one of the following categories.
The experimental results showed that the feature selection allowed us to derive a different set of highly distinctive features for each word type.
arXiv Detail & Related papers (2023-07-06T13:35:06Z)
- Nichelle and Nancy: The Influence of Demographic Attributes and Tokenization Length on First Name Biases [12.459949725707315]
We find that demographic attributes of a name (race, ethnicity, and gender) and name tokenization length are both factors that systematically affect the behavior of social commonsense reasoning models.
arXiv Detail & Related papers (2023-05-26T01:57:42Z)
- In the Name of Fairness: Assessing the Bias in Clinical Record De-identification [11.794861201300826]
We investigate the bias of de-identification systems on names in clinical notes via a large-scale empirical analysis.
Our findings reveal that there are statistically significant performance gaps along a majority of the demographic dimensions in most methods.
To mitigate the identified gaps, we propose a simple and method-agnostic solution by fine-tuning de-identification methods with clinical context and diverse names.
arXiv Detail & Related papers (2023-05-18T23:47:00Z)
- Differentially Private and Fair Deep Learning: A Lagrangian Dual Approach [54.32266555843765]
This paper studies a model that protects the privacy of individuals' sensitive information while also allowing it to learn non-discriminatory predictors.
The method relies on the notion of differential privacy and the use of Lagrangian duality to design neural networks that can accommodate fairness constraints.
arXiv Detail & Related papers (2020-09-26T10:50:33Z)
- Assessing Demographic Bias in Named Entity Recognition [0.21485350418225244]
We assess the bias in Named Entity Recognition systems for English across different demographic groups with synthetically generated corpora.
Character-based contextualized word representation models such as ELMo result in the least bias across demographics.
arXiv Detail & Related papers (2020-08-08T02:01:25Z)