Fugu-MT Paper Translation (Abstract): "Mirror" Language AI Models of Depression are Criterion-Contaminated

Paper summary: "Mirror" Language AI Models of Depression are Criterion-Contaminated

arxiv url: http://arxiv.org/abs/2508.05830v1
Date: Thu, 07 Aug 2025 20:13:00 GMT
Status: Translation complete
Last updated in system: 2025-08-11 20:39:06.000381
Title: "Mirror" Language AI Models of Depression are Criterion-Contaminated
Authors: Tong Li, Rasiq Hussain, Mehak Gupta, Joshua R. Oltmanns
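Abstract summary: "Mirror models" suffer from "criterion contamination", which arises when a predicted score depends in part on the predictors themselves. Mirror language AI models of depression showed artificially inflated effect sizes and reduced generalizability.
Reference score (attention score computed by this site): 2.9853748238660978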
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: A growing number of studies show near-perfect LLM language-based prediction of depression assessment scores (up to R² of .70). However, many develop these models directly from language responses to depression assessments. These "Mirror models" suffer from "criterion contamination", which arises when a predicted score depends in part on the predictors themselves. This causes artificial effect size inflation, which reduces model generalizability. The present study compares the performance of Mirror models versus "Non-Mirror models", which are developed from language that does not mirror the assessment they are developed to predict. N = 110 research participants completed two different interviews: a structured diagnostic interview and a life history interview. GPT-4, GPT-4o and LLaMA3-70B were then prompted to predict structured diagnostic interview depression scores from the two transcripts separately. Mirror models (using structured diagnostic data) showed very large effect sizes (e.g., R² = .80). As expected, Non-Mirror models (using life history data) demonstrated smaller effect sizes, but these were still relatively large (e.g., R² = .27). When Mirror and Non-Mirror model-predicted structured interview depression scores were correlated with self-reported depression symptoms, Mirror and Non-Mirror models performed the same (e.g., r ≈ .54), indicating that Mirror models contain bias, perhaps due to criterion contamination. Topic modeling identified clusters across Mirror and Non-Mirror models, as well as between true-positive and false-positive predictions. In this head-to-head comparison study, Mirror language AI models of depression showed artificially inflated effect sizes and less generalizability. As language AI models for depression continue to evolve, incorporating Non-Mirror models may identify interpretable and generalizable semantic features that have unique utility in real-world psychological assessment.
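The pattern the abstract describes (a much larger R² for Mirror models against the interview criterion, yet similar correlations with an independent self-report) can be illustrated with a small synthetic simulation. The sketch below is not the study's data or method. It assumes a hypothetical latent severity variable, an interview-specific variance component shared by the criterion and the Mirror prediction, arbitrary noise levels, and R² taken as the squared Pearson correlation.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate(n=20000, seed=0):
    """Simulate criterion contamination with purely synthetic scores.

    All variances below are illustrative assumptions, not estimates
    from the paper.
    """
    rng = random.Random(seed)
    criterion, mirror, non_mirror, self_report = [], [], [], []
    for _ in range(n):
        severity = rng.gauss(0, 1)         # latent depression severity
        interview_part = rng.gauss(0, 1)   # variance specific to the structured interview
        # Criterion: the structured diagnostic interview score.
        criterion.append(severity + interview_part)
        # Mirror prediction: derived from the same interview, so it
        # inherits the interview-specific component of the criterion.
        mirror.append(severity + interview_part + rng.gauss(0, 0.5))
        # Non-Mirror prediction: derived from a separate language sample.
        non_mirror.append(severity + rng.gauss(0, 1))
        # External validation measure: independent self-report.
        self_report.append(severity + rng.gauss(0, 1))
    return {
        "r2_mirror": pearson_r(mirror, criterion) ** 2,
        "r2_non_mirror": pearson_r(non_mirror, criterion) ** 2,
        "r_mirror_self_report": pearson_r(mirror, self_report),
        "r_non_mirror_self_report": pearson_r(non_mirror, self_report),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.3f}")
```

Under these assumed variances, the population values work out to roughly .89 and .25 for the two R² figures, and roughly .47 and .50 for the two correlations with self-report. The shared interview component raises agreement with the criterion without adding any information about the latent severity, which is why the advantage disappears against an external measure.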

Related papers

DepressLLM: Interpretable domain-adapted language model for depression detection from real-world narratives [6.1211540596331755]
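This study introduces DepressLLM, trained and evaluated on a corpus of 3,699 autobiographical narratives reflecting both happiness and distress. DepressLLM provides interpretable depression predictions and, through its Score-guided Token Probability Summation (SToPS) module, delivers improved classification performance and reliable confidence estimates.
Reference translation of paper (metadata) (2025-08-12T03:12:55Z)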
On the Validity of Head Motion Patterns as Generalisable Depression Biomarkers [5.251042759836316]
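This study examines the validity and generalisability of models that use elementary head motion units to estimate depression severity. Three depression datasets from different Western cultures are considered to examine the generalisability of kineme patterns. Head motion patterns are found to be effective biomarkers for estimating depression severity, achieving highly competitive performance on both classification and regression tasks.
Reference translation of paper (metadata) (2025-05-29T13:22:30Z)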
Bigger But Not Better: Small Neural Language Models Outperform Large Language Models in Detection of Thought Disorder [7.585589727435719]
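We investigate whether smaller neural language models are a viable option for detecting positive formal thought disorder. Surprisingly, the results suggest that smaller models are more sensitive to linguistic differences associated with formal thought disorder.
Reference translation of paper (metadata) (2025-03-25T22:55:58Z)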
LlaMADRS: Prompting Large Language Models for Interview-Based Depression Assessment [75.44934940580112]
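LlaMADRS is a novel framework that uses open-source large language models (LLMs) to automate depression severity assessment. The study uses a zero-shot prompting strategy with carefully designed cues to guide the model in interpreting and scoring clinical interviews. Tested on 236 real-world interviews, it showed strong correlations with clinician assessments.
Reference translation of paper (metadata) (2025-01-07T08:49:04Z)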
Robust Speech and Natural Language Processing Models for Depression Screening [0.0]
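Depression is a global health concern, and enhanced patient screening is essential. We describe two deep learning models developed for this purpose. One model is based on acoustics and the other on natural language processing.
Reference translation of paper (metadata) (2024-12-26T06:05:52Z)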
Generative causal testing to bridge data-driven models and scientific theories in language neuroscience [82.995061475971]
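We present generative causal testing (GCT), a framework for generating concise explanations of language selectivity in the brain. We show that GCT can identify fine-grained differences between brain regions with similar functional selectivity.
Reference translation of paper (metadata) (2024-10-01T15:57:48Z)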
Evaluating Model Bias Requires Characterizing its Mistakes [19.777130236160712]
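SkewSize is a principled and flexible metric that captures bias from mistakes in a model's predictions. It can be used in multi-class settings or generalised to the open-vocabulary setting of generative models. The settings considered include standard vision models trained on synthetic data, vision models trained on ImageNet, and large-scale vision-language models from the BLIP-2 family.
Reference translation of paper (metadata) (2024-07-15T11:46:21Z)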
LLM Questionnaire Completion for Automatic Psychiatric Assessment [49.1574468325115]
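Large language models (LLMs) are used to convert unstructured psychological interviews into structured questionnaires spanning various psychiatric and personality domains. The resulting answers are encoded as features used to predict standardized psychiatric measures of depression (PHQ-8) and PTSD (PCL-C).
Reference translation of paper (metadata) (2024-06-09T09:03:11Z)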
Development and Validation of a Deep-Learning Model for Differential Treatment Benefit Prediction for Adults with Major Depressive Disorder Deployed in the Artificial Intelligence in Depression Medication Enhancement (AIDME) Study [0.622895724042048]
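Pharmacological treatment of major depressive disorder (MDD) relies on a trial-and-error approach. We introduce an artificial intelligence (AI) model aimed at personalizing treatment outcomes.
Reference translation of paper (metadata) (2024-06-07T15:04:59Z)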
Using Pre-training and Interaction Modeling for ancestry-specific disease prediction in UK Biobank [69.90493129893112]
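Recent genome-wide association studies (GWAS) have uncovered the genetic basis of complex traits, but they under-represent individuals of non-European ancestry. This study assesses whether multi-omics data can improve disease prediction across diverse ancestries.
Reference translation of paper (metadata) (2024-04-26T16:39:50Z)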
Assessing ML Classification Algorithms and NLP Techniques for Depression Detection: An Experimental Case Study [0.6524460254566905]
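Depression affects millions of people worldwide and is one of the most common mental disorders. Recent research has shown that machine learning (ML) and natural language processing (NLP) tools and techniques are widely used to diagnose depression. However, several challenges remain in evaluating depression detection approaches when other conditions, such as post-traumatic stress disorder (PTSD), are present.
Reference translation of paper (metadata) (2024-04-03T19:45:40Z)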
The Relationship Between Speech Features Changes When You Get Depressed: Feature Correlations for Improving Speed and Performance of Depression Detection [69.88072583383085]
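This work shows that depression changes the correlations between features extracted from speech. Using this insight can improve the training speed and performance of depression detectors based on SVMs and LSTMs.
Reference translation of paper (metadata) (2023-07-06T09:54:35Z)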
Bayesian Networks for the robust and unbiased prediction of depression and its symptoms utilizing speech and multimodal data [65.28160163774274]
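We apply a Bayesian framework to capture the relationships between depression, depression symptoms, and features derived from speech, facial expression, and cognitive game data collected at thymia.
Reference translation of paper (metadata) (2022-11-09T14:48:13Z)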
Naturalistic Causal Probing for Morpho-Syntax [76.83735391276547]
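We propose a naturalistic strategy for input-level intervention on real-world data in Spanish. Using this approach, we isolate morpho-syntactic features from confounders in sentences. We apply the method to analyze the causal effects of gender and number on contextualized representations extracted from pre-trained models.
Reference translation of paper (metadata) (2022-05-14T11:47:58Z)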

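The related papers list is generated automatically from the titles and abstracts of papers on this site.

This page shows information for the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any outcome arising from its use.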