Fugu-MT Paper Translation (Abstract): BIT.UA-AAUBS at ArchEHR-QA 2026: Evaluating Open-Source and Proprietary LLMs via Prompting in Low-Resource QA

Paper overview: BIT.UA-AAUBS at ArchEHR-QA 2026: Evaluating Open-Source and Proprietary LLMs via Prompting in Low-Resource QA

arxiv url: http://arxiv.org/abs/2605.03618v1
Date: Tue, 05 May 2026 10:43:56 GMT
Status: Translation complete
Last updated in system: 2026-05-06 19:35:43.899737
Title: BIT.UA-AAUBS at ArchEHR-QA 2026: Evaluating Open-Source and Proprietary LLMs via Prompting in Low-Resource QA
Title (reference translation): BIT.UA-AAUBS at ArchEHR-QA 2026: Proprietary LLMs by Prompting in Low-Resource QA
Authors: Richard A. A. Jonker, Alexander Christiansen, Alexandros Maniatis, Rúben Garrido, Rogério Braunschweiger de Freitas Lima, Roman Jurowetzki, Sérgio Matos
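Abstract summary: This paper describes the joint participation of the BIT.UA and AAUBS groups in the ArchEHR-QA 2026 shared task. Because no training data is available and the healthcare domain imposes strict data privacy constraints, we investigate the capabilities of Large Language Models (LLMs) without weight updates. We evaluate several state-of-the-art proprietary models and locally deployable open-source alternatives using various prompt engineering strategies. Our results show that proprietary models exhibit strong resilience to prompt variations, while domain-adapted open-source models (such as MedGemma 3 27B) achieve highly competitive performance.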
Reference score (independently computed attention score): 65.22695574492265
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper presents the joint participation of the BIT.UA and AAUBS groups in the ArchEHR-QA 2026 shared task, which focuses on clinical question answering and evidence grounding in a low-resource setting. Due to the absence of training data and the strict data privacy constraints inherent to the healthcare domain (e.g. GDPR), we investigate the capabilities of Large Language Models (LLMs) without weight updates. We evaluate several state-of-the-art proprietary models and locally deployable open-source alternatives using various prompt engineering strategies, including task decomposition, Chain-of-Thought, and in-context learning. Furthermore, we explore majority voting and LLM-as-a-judge ensembling techniques to maximize predictive robustness. Our results demonstrate that while proprietary models exhibit strong resilience to prompt variations, domain-adapted open-source models (such as MedGemma 3 27B) achieve highly competitive performance when paired with the right prompt. Overall, our prompt-based approach proved highly effective, securing 1st place in Subtask 4 (evidence citation alignment) and 3rd place in Subtask 3 (patient-friendly answer generation). All code, results, and prompts are available on our GitHub repository: https://github.com/bioinformatics-ua/ArchEHR-QA-2026.
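Abstract (reference translation): This paper describes the joint participation of the BIT.UA and AAUBS groups in the ArchEHR-QA 2026 shared task. Because no training data is available and the healthcare domain imposes strict data privacy constraints (e.g. GDPR), we investigate the capabilities of Large Language Models (LLMs) without weight updates. We evaluate state-of-the-art proprietary models and locally deployable open-source alternatives using various prompt engineering strategies, including task decomposition, Chain-of-Thought, and in-context learning. Furthermore, we explore majority voting and LLM-as-a-judge ensembling techniques to maximize predictive robustness. The results show that proprietary models exhibit strong resilience to prompt variations, while domain-adapted open-source models (such as MedGemma 3 27B) achieve highly competitive performance when paired with the right prompt. The proposed approach placed 1st in Subtask 4 (evidence citation alignment) and 3rd in Subtask 3 (patient-friendly answer generation). All code, results, and prompts are available in the GitHub repository.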
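
The abstract mentions majority voting as one ensembling technique but does not specify how votes are combined. The following is a minimal sketch only, assuming each model or prompt run returns a set of cited evidence sentence IDs and that an ID is kept when strictly more than half of the runs cite it. The function name `majority_vote`, the integer IDs, and the threshold rule are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter


def majority_vote(runs):
    """Combine evidence citations from several runs (hypothetical helper).

    runs: list of iterables of sentence IDs, one per model/prompt run.
    Returns the sorted IDs cited by strictly more than half of the runs.
    """
    if not runs:
        return []
    # Count each ID at most once per run.
    counts = Counter(sid for run in runs for sid in set(run))
    threshold = len(runs) / 2  # assumed rule: strict majority
    return sorted(sid for sid, n in counts.items() if n > threshold)


if __name__ == "__main__":
    # Three hypothetical runs citing sentence IDs from a clinical note.
    example_runs = [{1, 3, 5}, {1, 3}, {3, 4}]
    print(majority_vote(example_runs))
```

With an even number of runs, ties fall below the strict-majority threshold and are dropped; a different tie-breaking rule would change which citations survive.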


Related papers