Exploring Robustness of LLMs to Paraphrasing Based on Sociodemographic Factors
- URL: http://arxiv.org/abs/2501.08276v2
- Date: Fri, 04 Jul 2025 15:35:01 GMT
- Title: Exploring Robustness of LLMs to Paraphrasing Based on Sociodemographic Factors
- Authors: Pulkit Arora, Akbar Karimi, Lucie Flek
- Abstract summary: We extend the SocialIQA dataset to create diverse paraphrased sets conditioned on sociodemographic factors. We find that demographic-based paraphrasing significantly impacts the performance of language models.
- Score: 7.312170216336085
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite their linguistic prowess, LLMs have been shown to be vulnerable to small input perturbations. While robustness to local adversarial changes has been studied, robustness to global modifications such as different linguistic styles remains underexplored. Therefore, we take a broader approach to explore a wider range of variations across sociodemographic dimensions. We extend the SocialIQA dataset to create diverse paraphrased sets conditioned on sociodemographic factors (age and gender). The assessment aims to provide a deeper understanding of LLMs in (a) their capability of generating demographic paraphrases with engineered prompts and (b) their capabilities in interpreting real-world, complex language scenarios. We also perform a reliability analysis of the generated paraphrases looking into linguistic diversity and perplexity as well as manual evaluation. We find that demographic-based paraphrasing significantly impacts the performance of language models, indicating that the subtleties of linguistic variation remain a significant challenge. We will make the code and dataset available for future research.
Related papers
- IMPACT: Inflectional Morphology Probes Across Complex Typologies [0.0]
IMPACT is a synthetically generated evaluation framework focused on inflectional morphology. It is designed to evaluate performance across five morphologically rich languages: Arabic, Russian, Finnish, Turkish, and Hebrew. We assess eight multilingual LLMs that, despite strong English performance, struggle with other languages and uncommon morphological patterns.
arXiv Detail & Related papers (2025-06-30T14:58:23Z)
- Neighbors and relatives: How do speech embeddings reflect linguistic connections across the world? [0.7168794329741259]
This study employs embeddings from the fine-tuned XLS-R self-supervised language identification model vox107-xls-r-300m-wav2vec to analyze relationships between 106 world languages. Using linear discriminant analysis (LDA), language embeddings are clustered and compared with genealogical, lexical, and geographical distances. The results demonstrate that embedding-based distances align closely with traditional measures, effectively capturing both global and local typological patterns.
arXiv Detail & Related papers (2025-06-10T08:33:34Z)
- An Empirical Study of Federated Prompt Learning for Vision Language Model [50.73746120012352]
This paper systematically investigates behavioral differences between language prompt learning and vision prompt learning. We conduct experiments to evaluate the impact of various FL and prompt configurations, such as client scale, aggregation strategies, and prompt length. We explore strategies for enhancing prompt learning in complex scenarios where label skew and domain shift coexist.
arXiv Detail & Related papers (2025-05-29T03:09:15Z)
- Mechanistic Understanding and Mitigation of Language Confusion in English-Centric Large Language Models [49.09746599881631]
We present the first mechanistic interpretability study of language confusion. We show that confusion points (CPs) are central to this phenomenon. We show that editing a small set of critical neurons, identified via comparative analysis with multilingual-tuned models, substantially mitigates confusion.
arXiv Detail & Related papers (2025-05-22T11:29:17Z)
- When Less Language is More: Language-Reasoning Disentanglement Makes LLMs Better Multilingual Reasoners [111.50503126693444]
We show that language-specific ablation consistently boosts multilingual reasoning performance. Compared to post-training, our training-free ablation achieves comparable or superior results with minimal computational overhead.
arXiv Detail & Related papers (2025-05-21T08:35:05Z)
- Disambiguation in Conversational Question Answering in the Era of LLM: A Survey [36.37587894344511]
Ambiguity remains a fundamental challenge in Natural Language Processing (NLP). With the advent of Large Language Models (LLMs), addressing ambiguity has become even more critical due to their expanded capabilities and applications. This paper explores the definition, forms, and implications of ambiguity for language driven systems.
arXiv Detail & Related papers (2025-05-18T20:53:41Z)
- LINGOLY-TOO: Disentangling Memorisation from Reasoning with Linguistic Templatisation and Orthographic Obfuscation [1.2576388595811496]
We introduce a framework for producing linguistic reasoning problems that reduces the effect of memorisation in model performance estimates.
We apply this framework to develop LINGOLY-TOO, a challenging benchmark for linguistic reasoning.
arXiv Detail & Related papers (2025-03-04T19:57:47Z)
- Sparse Auto-Encoder Interprets Linguistic Features in Large Language Models [40.12943080113246]
We present a systematic and comprehensive causal investigation using sparse auto-encoders (SAEs).
We extract a wide range of linguistic features from six dimensions.
We introduce two indices, Feature Representation Confidence (FRC) and Feature Intervention Confidence (FIC), to measure the ability of linguistic features to capture and control linguistic phenomena.
arXiv Detail & Related papers (2025-02-27T18:16:47Z)
- An Overview of Large Language Models for Statisticians [109.38601458831545]
Large Language Models (LLMs) have emerged as transformative tools in artificial intelligence (AI). This paper explores potential areas where statisticians can make important contributions to the development of LLMs. We focus on issues such as uncertainty quantification, interpretability, fairness, privacy, watermarking and model adaptation.
arXiv Detail & Related papers (2025-02-25T03:40:36Z)
- Benchmarking Linguistic Diversity of Large Language Models [14.824871604671467]
This paper emphasizes the importance of examining the preservation of human linguistic richness by language models. We propose a comprehensive framework for evaluating LLMs from various linguistic diversity perspectives.
arXiv Detail & Related papers (2024-12-13T16:46:03Z)
- Hate Personified: Investigating the role of LLMs in content moderation [64.26243779985393]
For subjective tasks such as hate detection, where people perceive hate differently, the Large Language Model's (LLM) ability to represent diverse groups is unclear.
By including additional context in prompts, we analyze LLM's sensitivity to geographical priming, persona attributes, and numerical information to assess how well the needs of various groups are reflected.
arXiv Detail & Related papers (2024-10-03T16:43:17Z)
- Boosting the Capabilities of Compact Models in Low-Data Contexts with Large Language Models and Retrieval-Augmented Generation [2.9921619703037274]
We propose a retrieval augmented generation (RAG) framework backed by a large language model (LLM) to correct the output of a smaller model for the linguistic task of morphological glossing.
We leverage linguistic information to make up for the lack of data and trainable parameters, while allowing for inputs from written descriptive grammars interpreted and distilled through an LLM.
We show that a compact, RAG-supported model is highly effective in data-scarce settings, achieving a new state-of-the-art for this task and our target languages.
arXiv Detail & Related papers (2024-10-01T04:20:14Z)
- LangSuitE: Planning, Controlling and Interacting with Large Language Models in Embodied Text Environments [70.91258869156353]
We introduce LangSuitE, a versatile and simulation-free testbed featuring 6 representative embodied tasks in textual embodied worlds.
Compared with previous LLM-based testbeds, LangSuitE offers adaptability to diverse environments without multiple simulation engines.
We devise a novel chain-of-thought (CoT) schema, EmMem, which summarizes embodied states w.r.t. history information.
arXiv Detail & Related papers (2024-06-24T03:36:29Z)
- A Survey on Large Language Models with Multilingualism: Recent Advances and New Frontiers [51.8203871494146]
The rapid development of Large Language Models (LLMs) demonstrates remarkable multilingual capabilities in natural language processing.
Despite the breakthroughs of LLMs, the investigation into the multilingual scenario remains insufficient.
This survey aims to help the research community address multilingual problems and provide a comprehensive understanding of the core concepts, key techniques, and latest developments in multilingual natural language processing based on LLMs.
arXiv Detail & Related papers (2024-05-17T17:47:39Z)
- The Power of Question Translation Training in Multilingual Reasoning: Broadened Scope and Deepened Insights [108.40766216456413]
We propose a question alignment framework to bridge the gap between large language models' English and non-English performance.
Experiment results show it can boost multilingual performance across diverse reasoning scenarios, model families, and sizes.
We analyze representation space, generated response and data scales, and reveal how question translation training strengthens language alignment within LLMs.
arXiv Detail & Related papers (2024-05-02T14:49:50Z)
- L2CEval: Evaluating Language-to-Code Generation Capabilities of Large Language Models [102.00201523306986]
We present L2CEval, a systematic evaluation of the language-to-code generation capabilities of large language models (LLMs).
We analyze the factors that potentially affect their performance, such as model size, pretraining data, instruction tuning, and different prompting methods.
In addition to assessing model performance, we measure confidence calibration for the models and conduct human evaluations of the output programs.
arXiv Detail & Related papers (2023-09-29T17:57:00Z)
- Improving Factuality and Reasoning in Language Models through Multiagent Debate [95.10641301155232]
We present a complementary approach to improve language responses where multiple language model instances propose and debate their individual responses and reasoning processes over multiple rounds to arrive at a common final answer.
Our findings indicate that this approach significantly enhances mathematical and strategic reasoning across a number of tasks.
Our approach may be directly applied to existing black-box models and uses identical procedure and prompts for all tasks we investigate.
arXiv Detail & Related papers (2023-05-23T17:55:11Z)
- A Survey of Large Language Models [81.06947636926638]
Language modeling has been widely studied for language understanding and generation in the past two decades.
Recently, pre-trained language models (PLMs) have been proposed by pre-training Transformer models over large-scale corpora.
To discriminate the difference in parameter scale, the research community has coined the term large language models (LLM) for the PLMs of significant size.
arXiv Detail & Related papers (2023-03-31T17:28:46Z)
- Competence-Based Analysis of Language Models [21.43498764977656]
CALM (Competence-based Analysis of Language Models) is designed to investigate LLM competence in the context of specific tasks. We develop a new approach for performing causal probing interventions using gradient-based adversarial attacks. We carry out a case study of CALM using these interventions to analyze and compare LLM competence across a variety of lexical inference tasks.
arXiv Detail & Related papers (2023-03-01T08:53:36Z)
- Emergent Linguistic Structures in Neural Networks are Fragile [20.692540987792732]
Large Language Models (LLMs) have been reported to have strong performance on natural language processing tasks.
We propose a framework to assess the consistency and robustness of linguistic representations.
arXiv Detail & Related papers (2022-10-31T15:43:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.