Standard Language Ideology in AI-Generated Language
- URL: http://arxiv.org/abs/2406.08726v1
- Date: Thu, 13 Jun 2024 01:08:40 GMT
- Title: Standard Language Ideology in AI-Generated Language
- Authors: Genevieve Smith, Eve Fleisig, Madeline Bossi, Ishita Rustagi, Xavier Yin, et al.
- Abstract summary: We explore standard language ideology in language generated by large language models (LLMs).
We introduce the concept of standard AI-generated language ideology, the process by which AI-generated language regards Standard American English (SAE) as a linguistic default and reinforces a linguistic bias that SAE is the most "appropriate" language.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this position paper, we explore standard language ideology in language generated by large language models (LLMs). First, we outline how standard language ideology is reflected and reinforced in LLMs. We then present a taxonomy of open problems regarding standard language ideology in AI-generated language with implications for minoritized language communities. We introduce the concept of standard AI-generated language ideology, the process by which AI-generated language regards Standard American English (SAE) as a linguistic default and reinforces a linguistic bias that SAE is the most "appropriate" language. Finally, we discuss tensions that remain, including reflecting on what desirable system behavior looks like, as well as advantages and drawbacks of generative AI tools imitating (or often not) different English language varieties. Throughout, we discuss standard language ideology as a manifestation of existing global power structures in and through AI-generated language before ending with questions to move towards alternative, more emancipatory digital futures.
Related papers
- Generative AI, Pragmatics, and Authenticity in Second Language Learning
There are obvious benefits to integrating generative AI (artificial intelligence) into language learning and teaching.
However, due to how AI systems understand human language, they lack the lived experience needed to use language with the same social awareness as humans.
Their training data, which is mostly in English and predominantly from Western sources, introduces built-in linguistic and cultural biases.
arXiv Detail & Related papers (2024-10-18T11:58:03Z)
- What Do Dialect Speakers Want? A Survey of Attitudes Towards Language Technology for German Dialects
We survey speakers of dialects and regional languages related to German.
We find that respondents are especially in favour of potential NLP tools that work with dialectal input.
arXiv Detail & Related papers (2024-02-19T09:15:28Z)
- Towards Bridging the Digital Language Divide
Multilingual language processing systems often exhibit a hardwired, yet usually involuntary and hidden, representational preference towards certain languages.
We show that biased technology is often the result of research and development methodologies that do not do justice to the complexity of the languages being represented.
We present a new initiative that aims at reducing linguistic bias through both technological design and methodology.
arXiv Detail & Related papers (2023-07-25T10:53:20Z)
- From Word Models to World Models: Translating from Natural Language to the Probabilistic Language of Thought
We propose rational meaning construction, a computational framework for language-informed thinking.
We frame linguistic meaning as a context-sensitive mapping from natural language into a probabilistic language of thought.
We show that LLMs can generate context-sensitive translations that capture pragmatically appropriate linguistic meanings.
We extend our framework to integrate cognitively-motivated symbolic modules.
arXiv Detail & Related papers (2023-06-22T05:14:00Z)
- Language Models as Inductive Reasoners
We propose a new paradigm (task) for inductive reasoning: inducing natural language rules from natural language facts.
We create a dataset termed DEER containing 1.2k rule-fact pairs for the task, where rules and facts are written in natural language.
We provide the first comprehensive analysis of how well pretrained language models can induce natural language rules from natural language facts.
arXiv Detail & Related papers (2022-12-21T11:12:14Z)
- CLSE: Corpus of Linguistically Significant Entities
We release a Corpus of Linguistically Significant Entities (CLSE) annotated by experts.
CLSE covers 74 different semantic types to support various applications from airline ticketing to video games.
We create a linguistically representative NLG evaluation benchmark in three languages: French, Marathi, and Russian.
arXiv Detail & Related papers (2022-11-04T12:56:12Z)
- Transparency Helps Reveal When Language Models Learn Meaning
Our systematic experiments with synthetic data reveal that, with languages where all expressions have context-independent denotations, both autoregressive and masked language models learn to emulate semantic relations between expressions.
Turning to natural language, our experiments with a specific phenomenon, referential opacity, add to the growing body of evidence that current language models do not represent natural language semantics well.
arXiv Detail & Related papers (2022-10-14T02:35:19Z)
- Norm Participation Grounds Language
I propose a different and more wide-ranging conception of how grounding should be understood: what grounds language is its normative nature.
There are standards for doing things right; these standards are public and authoritative, while at the same time their acceptance can be disputed and negotiated.
What grounds language, then, is the determined use that language users make of it, and what it is grounded in is the community of language users.
arXiv Detail & Related papers (2022-06-06T20:21:59Z)
- Towards Zero-shot Language Modeling
We construct a neural model that is inductively biased towards learning human languages.
We infer this inductive bias, expressed as a prior distribution, from a sample of typologically diverse training languages.
We harness additional language-specific side information as distant supervision for held-out languages.
arXiv Detail & Related papers (2021-08-06T23:49:18Z)
- ReferentialGym: A Nomenclature and Framework for Language Emergence & Grounding in (Visual) Referential Games
Natural languages are powerful tools wielded by human beings to communicate information and co-operate towards common goals.
Computational linguists have been researching the emergence of artificial languages induced by language games.
The AI community has started to investigate language emergence and grounding, working towards better human-machine interfaces.
arXiv Detail & Related papers (2020-12-17T10:22:15Z)
- Givenness Hierarchy Theoretic Cognitive Status Filtering
Humans use pronouns due to implicit assumptions about the cognitive statuses their referents have in the minds of their conversational partners.
We present two models of cognitive status: a rule-based Finite State Machine model and a Cognitive Status Filter.
The models are demonstrated and evaluated using a silver-standard English subset of the OFAI Multimodal Task Description Corpus.
arXiv Detail & Related papers (2020-05-22T16:44:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented here and is not responsible for any consequences of its use.