Fugu-MT Paper Translation (Abstract): DeepSeek performs better than other Large Language Models in Dental Cases

Paper overview: DeepSeek performs better than other Large Language Models in Dental Cases

arxiv url: http://arxiv.org/abs/2509.02036v1
Date: Tue, 02 Sep 2025 07:26:20 GMT
Status: Translation complete
Last updated in system: 2025-09-04 15:17:03.942852
Title: DeepSeek performs better than other Large Language Models in Dental Cases
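Title (reference translation): DeepSeek outperforms other large language models in dentistry
Authors: Hexian Zhang, Xinyu Yan, Yanqi Yang, Lijian Jin, Ping Yang, Junwen Wang
Abstract summary: Large language models (LLMs) hold transformative potential in healthcare, yet their capacity to interpret longitudinal patient narratives remains inadequately explored. This study evaluated four state-of-the-art LLMs on their ability to analyze longitudinal dental case vignettes. DeepSeek emerged as the top performer, demonstrating superior faithfulness and higher expert ratings.
Reference score (independently computed attention score): 3.7838709303967293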
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Large language models (LLMs) hold transformative potential in healthcare, yet their capacity to interpret longitudinal patient narratives remains inadequately explored. Dentistry, with its rich repository of structured clinical data, presents a unique opportunity to rigorously assess LLMs' reasoning abilities. While several commercial LLMs already exist, DeepSeek, a model that gained significant attention earlier this year, has also joined the competition. This study evaluated four state-of-the-art LLMs (GPT-4o, Gemini 2.0 Flash, Copilot, and DeepSeek V3) on their ability to analyze longitudinal dental case vignettes through open-ended clinical tasks. Using 34 standardized longitudinal periodontal cases (comprising 258 question-answer pairs), we assessed model performance via automated metrics and blinded evaluations by licensed dentists. DeepSeek emerged as the top performer, demonstrating superior faithfulness (median score = 0.528 vs. 0.367-0.457) and higher expert ratings (median = 4.5/5 vs. 4.0/5), without significantly compromising readability. Our study positions DeepSeek as the leading LLM for case analysis, endorses its integration as an adjunct tool in both medical education and research, and highlights its potential as a domain-specific agent.
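Abstract (reference translation): Large language models (LLMs) hold transformative potential in healthcare, yet their capacity to interpret longitudinal patient narratives remains inadequately explored. Dentistry, with its rich repository of structured clinical data, offers a unique opportunity to rigorously assess LLMs' reasoning abilities. Several commercial LLMs already exist, and DeepSeek, which gained significant attention earlier this year, has joined them. This study evaluated four state-of-the-art LLMs (GPT-4o, Gemini 2.0 Flash, Copilot, and DeepSeek V3) on their ability to analyze longitudinal dental case vignettes. Using 34 standardized longitudinal periodontal cases (258 question-answer pairs), model performance was assessed with automated metrics and blinded evaluations. DeepSeek emerged as the top performer, showing superior faithfulness (median score = 0.528 vs. 0.367-0.457) and higher expert ratings (median = 4.5/5 vs. 4.0/5). The study positions DeepSeek as the leading LLM for case analysis, supports its integration as an adjunct tool in both medical education and research, and highlights its potential as a domain-specific agent.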

Related papers