Fugu-MT Paper Translation (Abstract): Gender-Dependent Diagnostic Substitution in LLM Medical Triage: Same Symptoms, Unequal Urgency

arxiv url: http://arxiv.org/abs/2606.03641v1
Date: Tue, 02 Jun 2026 13:35:12 GMT
Status: Translation complete
Last updated in system: 2026-06-03 22:00:05.03075
Title: Gender-Dependent Diagnostic Substitution in LLM Medical Triage: Same Symptoms, Unequal Urgency
Authors: Qi Han Wong
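Abstract summary: We examine whether large language models recommend different medical triage for identical neurological symptoms when only the patient's stated gender and age differ. The study presents a standardized symptom profile across seven demographic conditions. The models show a gender-dependent triage disparity.
Reference score (independently computed attention score): 0.0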
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We investigate whether large language models produce different medical triage recommendations for identical neurological symptoms when only the patient's stated gender and age vary. Using three model families--Gemini 3.5 Flash, Claude Sonnet 4.6, and GPT-5.4-mini--we present a standardized symptom profile (persistent headache, blurred vision, morning nausea, visual disturbances) across seven demographic conditions: three age groups (25, 38, 65) x two genders (male, female), plus a gender-unspecified baseline (n = 30 per condition per model, 630 total trials). We find a stark, systemic gender-dependent triage disparity: young women receive significantly lower emergency room (ER) referral rates than age-matched men (Gemini: 0% vs. 23.3%; Claude: 6.7% vs. 96.7%; GPT: 6.7% vs. 66.7%, all p < 0.001). The disparity disappears at age 65 for all models. The primary mechanism is diagnostic substitution: the models anchor on a gender-associated diagnosis, preferentially classifying young women with Idiopathic Intracranial Hypertension (IIH)--a condition epidemiologically linked to women of childbearing age--while diagnosing men with generic increased intracranial pressure with space-occupying lesions in the differential. This diagnostic closure routes female patients to lower-urgency care (outpatient doctor appointments) despite comparable severity ratings (7-9/10). Our findings demonstrate that clinical LLMs replicate documented human clinical biases by using epidemiological priors to suppress triage urgency, suggesting that AI triage engines must decouple urgency assessment from probabilistic diagnostic priors. We release all code, prompts, and raw results.
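The trial arithmetic in the abstract can be checked with a short sketch. The condition grid, model names, and n = 30 come from the abstract; the function names are hypothetical, and the step that converts reported percentages into referral counts assumes each rate was computed over a single 30-trial cell, which the abstract does not state explicitly.

```python
from itertools import product

# Values taken from the abstract.
AGES = (25, 38, 65)
GENDERS = ("male", "female")
MODELS = ("Gemini 3.5 Flash", "Claude Sonnet 4.6", "GPT-5.4-mini")
TRIALS_PER_CELL = 30  # n = 30 per condition per model


def build_conditions():
    """Return the seven demographic conditions: 3 ages x 2 genders + 1 baseline."""
    conditions = [f"{age}-year-old {gender}" for age, gender in product(AGES, GENDERS)]
    conditions.append("gender-unspecified baseline")
    return conditions


def total_trials():
    """Conditions x models x trials per cell."""
    return len(build_conditions()) * len(MODELS) * TRIALS_PER_CELL


def implied_count(rate_percent, n=TRIALS_PER_CELL):
    """Nearest whole number of ER referrals for a reported percentage.

    ASSUMPTION: the rate was computed over n trials. The abstract does not
    say whether "young women" is one age cell (n = 30) or pools two (n = 60).
    """
    return round(rate_percent / 100 * n)


if __name__ == "__main__":
    print(len(build_conditions()), "conditions,", total_trials(), "trials")
    # Reported female vs. male ER referral rates per model family.
    reported = {"Gemini": (0.0, 23.3), "Claude": (6.7, 96.7), "GPT": (6.7, 66.7)}
    for model, (female, male) in reported.items():
        print(model, implied_count(female), "vs", implied_count(male), "of", TRIALS_PER_CELL)
```

Under the single-cell assumption, every reported percentage maps to a whole number of referrals out of 30, which is consistent with the stated design but does not confirm how the author grouped the age cells.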

Related papers

EQUITRIAGE: A Fairness Audit of Gender Bias in LLM-Based Emergency Department Triage [0.0]
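We introduce EQUITRIAGE, a fairness audit of large language models (LLMs) in emergency department triage. All five models produced flip rates exceeding the pre-registered 5% threshold. Group parity, counterfactual invariance, and calibration across gender are distinct fairness properties.
Reference translation of paper (metadata) (2026-05-05T17:20:55Z)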
PsychBench: Auditing Epidemiological Fidelity in Large Language Model Mental Health Simulations [0.0]
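Large language models are increasingly deployed to simulate patients for clinical training, research, and mental health tools. We introduce PsychBench, the first epidemiological audit of LLM patient simulation. The results suggest that models can generate clinically meaningful individuals while misrepresenting the population from which they are sampled.
Reference translation of paper (metadata) (2026-04-19T10:05:25Z)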
PanCanBench: A Comprehensive Benchmark for Evaluating Large Language Models in Pancreatic Oncology [48.732366302949515]
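Large language models (LLMs) have achieved expert-level performance on standardized examinations, but multiple-choice accuracy does not adequately reflect real-world clinical utility or safety. We developed a human-in-the-loop pipeline for creating expert rubrics for de-identified patient questions. Using an LLM-as-a-judge framework, we evaluated 22 proprietary and open-source LLMs on clinical completeness, factual accuracy, and web-search integration.
Reference translation of paper (metadata) (2026-03-02T00:50:39Z)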
Evaluating the Presence of Sex Bias in Clinical Reasoning by Large Language Models [0.5872014229110214]
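Large language models (LLMs) are increasingly embedded in healthcare for documentation, education, and clinical decision support. This study examined whether contemporary LLMs exhibit sex-based differences in clinical reasoning and how model configuration affects these behaviors.
Reference translation of paper (metadata) (2026-02-04T10:21:38Z)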
Epistemic-aware Vision-Language Foundation Model for Fetal Ultrasound Interpretation [83.02147613524032]
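We present FetalMind, a medical AI system. We propose Salient Epistemic Disentanglement (SED), which injects an expert-curated bipartite graph into the model to decouple view-disease associations. FetalMind outperforms open-source and closed-source baselines across all gestational stages, with an average gain of +14% and a +61.2% gain on critical conditions.
Reference translation of paper (metadata) (2025-10-14T19:57:03Z)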
Medical Hallucinations in Foundation Models and Their Impact on Healthcare [71.15392179084428]
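Hallucinations in foundation models arise from autoregressive training objectives. Top-performing models achieved 97% accuracy when augmented with chain-of-thought prompting.
Reference translation of paper (metadata) (2025-02-26T02:30:44Z)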
Sex-based Disparities in Brain Aging: A Focus on Parkinson's Disease [2.1506382989223782]
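Despite past research, a significant gap remains in understanding the role of sex in the brain aging process of PD patients. The T1-weighted MRI-driven brain-predicted age difference (brain-PAD) was calculated in a group of 373 PD patients from the PPMI database. Brain-PAD was associated with declines in general cognition, worse sleep behavior disorder, reduced visual function, and caudate atrophy.
Reference translation of paper (metadata) (2023-09-18T18:35:54Z)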
Prostate Age Gap (PAG): An MRI surrogate marker of aging for prostate cancer detection [0.15518894748362708]
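The Prostate Age Gap (PAG) was significantly associated with the risk of clinically significant PC (csPC) and outperformed other established PC risk factors.
Reference translation of paper (metadata) (2023-08-10T05:20:25Z)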
Generative models improve fairness of medical classifiers under distribution shifts [49.10233060774818]
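We show that realistic augmentations can be learned automatically from data in a label-efficient manner using generative models. We show that these learned augmentations can make models more robust and statistically fairer, both in and out of distribution.
Reference translation of paper (metadata) (2023-04-18T18:15:38Z)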
IA-GCN: Interpretable Attention based Graph Convolutional Network for Disease prediction [47.999621481852266]
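We propose an interpretable graph learning model that interprets the clinical relevance of input features to the task. In clinical scenarios, such a model can help clinical experts make better decisions for diagnosis and treatment planning. Our method outperforms the compared methods, with average accuracy gains of 3.2% on Tadpole, 1.6% on UKBB gender, and 2% on the UKBB age prediction task.
Reference translation of paper (metadata) (2021-03-29T13:04:02Z)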

The related papers list is generated automatically from the titles and abstracts of papers on this site.