Fugu-MT Paper Translation (Summary): Are you talking to ['xem'] or ['x', 'em']? On Tokenization and Addressing Misgendering in LLMs with Pronoun Tokenization Parity

arxiv url: http://arxiv.org/abs/2312.11779v2
Date: Thu, 21 Dec 2023 11:45:55 GMT
Status: Translation complete
Last updated on site: 2023-12-22 17:35:30.944966
Title: Are you talking to ['xem'] or ['x', 'em']? On Tokenization and Addressing Misgendering in LLMs with Pronoun Tokenization Parity
Authors: Anaelia Ovalle, Ninareh Mehrabi, Palash Goyal, Jwala Dhamala, Kai-Wei Chang, Richard Zemel, Aram Galstyan, Rahul Gupta
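Abstract summary: Pronoun tokenization parity (PTP) is a novel approach that reduces neopronoun misgendering by preserving a token's functional structure. PTP's efficacy is evaluated using pronoun consistency-based metrics and a novel syntax-based metric.
Reference score (attention score computed by this site): 79.41081292703352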
License: http://creativecommons.org/licenses/by/4.0/
Abstract: A large body of NLP research has documented the ways gender biases manifest and amplify within large language models (LLMs), though this research has predominantly operated within a gender binary-centric context. A growing body of work has identified the harmful limitations of this gender-exclusive framing; many LLMs cannot correctly and consistently refer to persons outside the gender binary, especially if they use neopronouns. While data scarcity has been identified as a possible culprit, the precise mechanisms through which it influences LLM misgendering remain underexplored. Our work addresses this gap by studying data scarcity's role in subword tokenization and, consequently, the formation of LLM word representations. We uncover how the Byte-Pair Encoding (BPE) tokenizer, a backbone for many popular LLMs, contributes to neopronoun misgendering through out-of-vocabulary behavior. We introduce pronoun tokenization parity (PTP), a novel approach to reduce LLM neopronoun misgendering by preserving a token's functional structure. We evaluate PTP's efficacy using pronoun consistency-based metrics and a novel syntax-based metric. Through several controlled experiments, finetuning LLMs with PTP improves neopronoun consistency from 14.5% to 58.4%, highlighting the significant role tokenization plays in LLM pronoun consistency.
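The out-of-vocabulary behavior described in the abstract can be illustrated with a toy BPE encoder. This is a minimal sketch, not the paper's code. It applies a small, hypothetical ranked merge list (`TOY_MERGES`) to a word, repeatedly merging the best-ranked adjacent pair until no learned merge applies. The list contains the merges that build "them", but it has no merge joining "x" with "em". As a result, the neopronoun "xem" falls apart into ['x', 'em'] while "them" stays whole. The last line approximates tokenization parity by appending one extra merge so "xem" also becomes a single unit. In a real LLM tokenizer this would more likely mean adding a vocabulary token, and the paper's actual PTP procedure may differ.

```python
from typing import List, Sequence, Tuple

Merge = Tuple[str, str]

# Hypothetical merge list, ranked by priority (index 0 is applied first).
# Real BPE tokenizers learn tens of thousands of merges from corpus frequencies.
TOY_MERGES: List[Merge] = [
    ("t", "h"), ("e", "m"), ("th", "em"), ("th", "e"),
    ("h", "e"), ("h", "i"), ("hi", "m"),
]


def bpe_encode(word: str, merges: Sequence[Merge]) -> List[str]:
    """Split a word into BPE tokens using a ranked merge list."""
    ranks = {pair: rank for rank, pair in enumerate(merges)}
    tokens = list(word)  # start from single characters
    while len(tokens) > 1:
        # Ranks of all adjacent pairs that have a learned merge.
        present = [ranks[p] for p in zip(tokens, tokens[1:]) if p in ranks]
        if not present:
            break  # no merge applies: leftover pieces are the final tokens
        left, right = merges[min(present)]
        # Merge every occurrence of the best-ranked pair, scanning left to right.
        merged: List[str] = []
        i = 0
        while i < len(tokens):
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == (left, right):
                merged.append(left + right)
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens


if __name__ == "__main__":
    for w in ["he", "him", "them", "xem"]:
        print(w, bpe_encode(w, TOY_MERGES))
    # Parity-style fix (illustrative only): one extra merge keeps "xem" whole.
    print("xem", bpe_encode("xem", TOY_MERGES + [("x", "em")]))
```

Run as a script, this prints ['he'], ['him'] and ['them'] for the binary and plural pronouns, ['x', 'em'] for "xem" under the toy merges, and ['xem'] once the extra merge is added. In a trained tokenizer the merge list reflects corpus frequencies, so rare words such as neopronouns tend to get few dedicated merges. This is consistent with the data-scarcity mechanism the abstract describes.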
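The abstract evaluates PTP with pronoun consistency-based metrics. As a rough illustration only, the sketch below computes one hypothetical version: the share of model generations in which every detected third-person pronoun belongs to the expected pronoun family. Several details are assumptions rather than the paper's definitions: the pronoun lexicon (`PRONOUN_FAMILIES`), the regex word splitting, and the rule that a generation with no tracked pronoun counts as inconsistent.

```python
import re
from typing import Dict, Iterable, Set

# Hypothetical pronoun lexicon; an actual benchmark defines its own families.
PRONOUN_FAMILIES: Dict[str, Set[str]] = {
    "he": {"he", "him", "his", "himself"},
    "she": {"she", "her", "hers", "herself"},
    "they": {"they", "them", "their", "theirs", "themself", "themselves"},
    "xe": {"xe", "xem", "xyr", "xyrs", "xemself"},
}
ALL_PRONOUNS: Set[str] = set().union(*PRONOUN_FAMILIES.values())


def is_consistent(text: str, family: str) -> bool:
    """True if the text uses at least one tracked pronoun and all belong to `family`."""
    words = re.findall(r"[a-z]+", text.lower())
    used = [w for w in words if w in ALL_PRONOUNS]
    # Assumption: a generation with no tracked pronoun counts as inconsistent.
    return bool(used) and all(w in PRONOUN_FAMILIES[family] for w in used)


def consistency_rate(generations: Iterable[str], family: str) -> float:
    """Fraction of generations that are pronoun-consistent with `family`."""
    gens = list(generations)
    if not gens:
        return 0.0
    return sum(is_consistent(g, family) for g in gens) / len(gens)


if __name__ == "__main__":
    outputs = [
        "Xe said xem would call later.",
        "Xe forgot his keys at home.",  # switches to "his": inconsistent
        "Xe packed xyr bag.",
    ]
    print(consistency_rate(outputs, "xe"))
```

With the three sample generations, two stay within the xe/xem/xyr family, so the rate is 2/3. Reported figures such as 14.5% and 58.4% come from the paper's own metric and data, not from this sketch.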

Related papers

Do They Understand Them? An Updated Evaluation on Nonbinary Pronoun Handling in Large Language Models [13.89598383847666]
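Large language models (LLMs) are increasingly deployed in sensitive contexts where fairness and inclusivity are critical. Pronoun usage, especially gender-neutral pronouns and neopronouns, remains a key challenge for responsible AI. We introduce MISGENDERED+, an extended and updated benchmark for evaluating LLMs' pronoun fidelity.
Reference translation (metadata) (2025-08-01T17:11:42Z)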
Breaking Language Barriers or Reinforcing Bias? A Study of Gender and Racial Disparities in Multilingual Contrastive Vision Language Models [28.944990804599893]
Models studied: M-CLIP, NLLB-CLIP, CAPIVARA-CLIP, and the debiased SigLIP-2. The study quantifies racial and gender bias and measures stereotype amplification.
Reference translation (metadata) (2025-05-20T10:14:00Z)
A Multilingual, Culture-First Approach to Addressing Misgendering in LLM Applications [12.5856659067182]
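Misgendering is the act of referring to someone by a gender that does not match their chosen identity. English-based approaches have clear-cut ways of avoiding misgendering, such as through pronoun use.
Reference translation (metadata) (2025-03-26T08:01:35Z)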
Beyond Binary Gender: Evaluating Gender-Inclusive Machine Translation with Ambiguous Attitude Words [85.48043537327258]
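Existing gender-bias evaluations of machine translation focus mainly on male and female genders. This work presents AmbGIMT (Gender-Inclusive Machine Translation with Ambiguous attitude words), a benchmark, and proposes a gender-bias evaluation method based on the Emotional Attitude Score (EAS).
Reference translation (metadata) (2024-07-23T08:13:51Z)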
From 'Showgirls' to 'Performers': Fine-tuning with Gender-inclusive Language for Bias Reduction in LLMs [1.1049608786515839]
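We adapt linguistic structures within large language models to promote gender inclusivity. Our work focuses on gender-exclusive affixes in English, such as in 'show-girl' or 'man-cave'.
Reference translation (metadata) (2024-07-05T11:31:30Z)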
Why Not Transform Chat Large Language Models to Non-English? [57.16587777261422]
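The scarcity of non-English data limits the development of non-English large language models (LLMs). TransLLM divides the transfer problem into several common subtasks using a translation chain-of-thought. Using only single-turn data, the method outperforms strong baselines and ChatGPT on the multi-turn benchmark MT-bench.
Reference translation (metadata) (2024-05-22T18:53:25Z)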
Transforming Dutch: Debiasing Dutch Coreference Resolution Systems for Non-binary Pronouns [5.5514102920271196]
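Gender-neutral pronouns are increasingly being introduced in Western languages. Recent evaluations have shown that English NLP systems cannot correctly process gender-neutral pronouns. This paper examines the performance of Dutch coreference resolution systems on gender-neutral pronouns.
Reference translation (metadata) (2024-04-30T18:31:19Z)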
Evaluating Gender Bias in Large Language Models via Chain-of-Thought Prompting [87.30837365008931]
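Large language models (LLMs) equipped with Chain-of-Thought (CoT) prompting can make accurate incremental predictions even on unscalable tasks. This study examines how LLMs' step-by-step predictions affect gender bias.
Reference translation (metadata) (2024-01-28T06:50:10Z)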
MISGENDERED: Limits of Large Language Models in Understanding Pronouns [46.276320374441056]
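We evaluate popular language models on their ability to correctly use English gender-neutral pronouns. We introduce MISGENDERED, a framework for evaluating large language models' ability to correctly use preferred pronouns.
Reference translation (metadata) (2023-06-06T18:27:52Z)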
"I'm fully who I am": Towards Centering Transgender and Non-Binary Voices to Measure Biases in Open Language Generation [69.25368160338043]
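Transgender and non-binary (TGNB) individuals disproportionately experience discrimination and exclusion in daily life. We assess how the social reality surrounding the experienced marginalization of TGNB persons contributes to and persists within open language generation. We introduce TANGO, a dataset of template-based real-world text curated from a TGNB-oriented community.
Reference translation (metadata) (2023-05-17T04:21:45Z)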
Welcome to the Modern World of Pronouns: Identity-Inclusive Natural Language Processing beyond Gender [23.92148222207458]
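We give an overview of third-person pronoun issues in natural language processing and evaluate existing and novel modeling approaches. We quantify the impact of a more discrimination-free approach on established benchmark data.
Reference translation (metadata) (2022-02-24T06:42:11Z)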
First the worst: Finding better gender translations during beam search [19.921216907778447]
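We focus on gender bias arising from systematic errors in grammatical gender translation. We experiment with reranking n-best lists using gender features obtained automatically from the source sentence. Combining these techniques yields a large improvement in WinoMT accuracy without requiring additional bilingual data or an additional NMT model.
Reference translation (metadata) (2021-04-15T12:53:30Z)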

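The related-papers list is generated automatically from the titles and abstracts of papers on this site.

This is information about the specified paper.
The operators of this site do not guarantee the quality of the site (including all information and translations) and accept no responsibility for any consequences arising from its use.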