Fugu-MT Paper Translation (Abstract): Leveraging Multimodal Self-Consistency Reasoning in Coding Motivational Interviewing for Alcohol Use Reduction

Paper overview: Leveraging Multimodal Self-Consistency Reasoning in Coding Motivational Interviewing for Alcohol Use Reduction

arxiv url: http://arxiv.org/abs/2605.12987v2
Date: Sat, 16 May 2026 04:00:00 GMT
Status: Translation complete
Last updated in system: 2026-05-19 17:57:45.802204
Title: Leveraging Multimodal Self-Consistency Reasoning in Coding Motivational Interviewing for Alcohol Use Reduction
Title (reference translation): Leveraging Multimodal Self-Consistency Reasoning in Coding Motivational Interviewing for Alcohol Use Reduction
Authors: Guangzeng Han, James G. Murphy, Benjamin O. Ladd, Xiaolei Huang, Brian Borsari
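Abstract summary: Coding Motivational Interviewing (MI) sessions is essential for understanding client behaviors and predicting outcomes. Recent advances in audio-language models (ALMs) offer new opportunities to automate MI coding by capturing behavioral signals. This study aims to develop an automatic MI coding approach that integrates predictions from multiple reasoning trajectories.
Reference score (independently computed attention measure): 2.77200120166253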
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: BACKGROUND: Coding Motivational Interviewing (MI) sessions is essential for understanding client behaviors and predicting outcomes, but it requires substantial time and labor from trained MI professionals. Recent advances in audio-language models (ALMs) offer new opportunities to automate MI coding by capturing multimodal behavioral signals. OBJECTIVE: This study aims to develop an automatic MI coding approach based on ALMs that analyzes raw audio input and integrates predictions from multiple reasoning trajectories using self-consistency to improve coding robustness. METHODS: We experimented with five recorded sessions from de-identified MI audio tapes. We deployed ALMs with four complementary analytic prompts to support utterance-level reasoning: analytic prompting for verbal cues, prosody-aware prompting for acoustic cues, evidence-scoring prompting for quantitative hypothesis testing, and comparative prompting for contrastive reasoning. Three stochastic samples were drawn for each prompt, generating 12 independent reasoning trajectories per utterance. Final predictions were determined by majority voting across all trajectories. RESULTS: Performance was evaluated using accuracy, precision, recall, and macro-F1 scores. The proposed multimodal self-consistency approach achieved 52.56% accuracy, 54.03% precision, 47.45% recall, and a macro-F1 score of 46.40%, exceeding baseline methods. Systematic ablation experiments that removed individual modules consistently degraded performance on the primary metrics. CONCLUSIONS: Multimodal self-consistency outperforms single-pass baseline prompting approaches for MI coding. These findings suggest that incorporating both what clients say and how they say it can support more reliable automatic MI coding.
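Abstract (reference translation): BACKGROUND: Coding Motivational Interviewing (MI) sessions is essential for understanding client behaviors and predicting outcomes, but it demands considerable time and labor from trained MI professionals. Recent advances in audio-language models (ALMs) offer new opportunities to automate MI coding by capturing multimodal behavioral signals. OBJECTIVE: This study aims to develop an ALM-based automatic MI coding approach that analyzes raw audio input and uses self-consistency to integrate predictions from multiple reasoning trajectories, improving coding robustness. METHODS: We experimented with five sessions recorded from de-identified MI audio tapes. We deployed ALMs with four complementary analytic prompts to support utterance-level reasoning: analytic prompting for verbal cues, prosody-aware prompting for acoustic cues, evidence-scoring prompting for quantitative hypothesis testing, and comparative prompting for contrastive reasoning. Three stochastic samples were drawn for each prompt, yielding 12 independent reasoning trajectories per utterance. Final predictions were determined by majority voting across all trajectories. RESULTS: Performance was evaluated using accuracy, precision, recall, and macro-F1 scores. The proposed multimodal self-consistency approach achieved 52.56% accuracy, 54.03% precision, 47.45% recall, and a macro-F1 score of 46.40%, exceeding baseline methods. Systematic ablation experiments that removed individual modules consistently degraded performance on the primary metrics. CONCLUSIONS: Multimodal self-consistency outperforms single-pass baseline prompting approaches for MI coding. These results suggest that incorporating both what clients say and how they say it can support more reliable automatic MI coding.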
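
The voting step described in METHODS (four prompt styles, three stochastic samples each, majority vote over the resulting 12 trajectories) can be sketched as below. This is a minimal illustration, not the authors' implementation. The label names, the `fake_predict` stand-in for an ALM call, and the tie-breaking rule are all assumptions that do not appear in the abstract.

```python
from collections import Counter
from typing import Callable, List

# The four prompt styles named in the abstract (identifiers are illustrative).
PROMPT_STYLES = ["analytic", "prosody_aware", "evidence_scoring", "comparative"]
SAMPLES_PER_PROMPT = 3  # 4 prompts x 3 samples = 12 trajectories per utterance


def collect_trajectories(
    utterance_id: str, predict: Callable[[str, str, int], str]
) -> List[str]:
    """Gather one predicted code per (prompt style, sample index) pair."""
    return [
        predict(utterance_id, style, sample)
        for style in PROMPT_STYLES
        for sample in range(SAMPLES_PER_PROMPT)
    ]


def majority_vote(labels: List[str]) -> str:
    """Return the most frequent label.

    Assumption: ties go to the tied label that appears first in the list;
    the abstract does not describe a tie-breaking rule.
    """
    if not labels:
        raise ValueError("no labels to vote on")
    counts = Counter(labels)
    best = max(counts.values())
    return next(label for label in labels if counts[label] == best)


def fake_predict(utterance_id: str, style: str, sample: int) -> str:
    """Hypothetical stand-in for an ALM call; returns canned labels."""
    canned = {
        "analytic": "change_talk",
        "prosody_aware": "change_talk",
        "evidence_scoring": "sustain_talk",
        "comparative": "neutral",
    }
    return canned[style]


trajectories = collect_trajectories("utt-001", fake_predict)
final_code = majority_vote(trajectories)
print(len(trajectories), final_code)
```

In a real pipeline, `predict` would send the utterance audio and the chosen prompt to an ALM with sampling enabled, so that the three samples per prompt can differ from one another.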

Related papers