Fugu-MT Paper Translation (Abstract): Are Smaller Open-Weight LLMs Closing the Gap to Proprietary Models for Biomedical Question Answering?

Paper overview: Are Smaller Open-Weight LLMs Closing the Gap to Proprietary Models for Biomedical Question Answering?

arxiv url: http://arxiv.org/abs/2509.18843v1
Date: Tue, 23 Sep 2025 09:27:57 GMT
Status: Translation complete
System update date: 2025-09-24 20:41:27.79986
Title: Are Smaller Open-Weight LLMs Closing the Gap to Proprietary Models for Biomedical Question Answering?
Title (reference translation): Are Smaller Open-Weight LLMs Closing the Gap to Proprietary Models for Biomedical Question Answering?
Authors: Damian Stachura, Joanna Konieczna, Artur Nowak
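Abstract summary: Open-weight versions of large language models (LLMs) are advancing rapidly, and state-of-the-art models such as DeepSeek-V3 now perform comparably to proprietary LLMs. This progress raises the question of whether small open-weight LLMs can effectively replace larger closed-source models. This work compares several open-weight models against top-performing systems such as GPT-4o, GPT-4.1, Claude 3.5 Sonnet, and Claude 3.7 Sonnet.
Reference score (independently calculated attention score): 0.5692553719616764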
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Open-weight versions of large language models (LLMs) are rapidly advancing, with state-of-the-art models like DeepSeek-V3 now performing comparably to proprietary LLMs. This progression raises the question of whether small open-weight LLMs are capable of effectively replacing larger closed-source models. We are particularly interested in the context of biomedical question-answering, a domain we explored by participating in Task 13B Phase B of the BioASQ challenge. In this work, we compare several open-weight models against top-performing systems such as GPT-4o, GPT-4.1, Claude 3.5 Sonnet, and Claude 3.7 Sonnet. To enhance question answering capabilities, we use various techniques including retrieving the most relevant snippets based on embedding distance, in-context learning, and structured outputs. For certain submissions, we utilize ensemble approaches to leverage the diverse outputs generated by different models for exact-answer questions. Our results demonstrate that open-weight LLMs are comparable to proprietary ones. In some instances, open-weight LLMs even surpassed their closed counterparts, particularly when ensembling strategies were applied. All code is publicly available at https://github.com/evidenceprime/BioASQ-13b.
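Abstract (reference translation): Open-weight versions of large language models (LLMs) are advancing rapidly, and state-of-the-art models such as DeepSeek-V3 now perform comparably to proprietary LLMs. This progress raises the question of whether small open-weight LLMs can effectively replace larger closed-source models. We are particularly interested in biomedical question answering, a domain we explored by participating in Task 13B Phase B of the BioASQ challenge. In this work, we compare several open-weight models against top-performing systems such as GPT-4o, GPT-4.1, Claude 3.5 Sonnet, and Claude 3.7 Sonnet. To improve question-answering capability, we use several techniques: retrieving the most relevant snippets based on embedding distance, in-context learning, and structured outputs. For certain submissions, we use ensemble approaches that leverage the diverse outputs generated by different models on exact-answer questions. The results show that open-weight LLMs are comparable to proprietary ones. In some cases, open-weight LLMs even surpassed their closed counterparts, particularly when ensembling strategies were applied. All code is publicly available at https://github.com/evidenceprime/BioASQ-13b.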

List of related papers