Fugu-MT paper translation (abstract): MedCL-Bench: Benchmarking stability-efficiency trade-offs and scaling in biomedical continual learning

Paper overview: MedCL-Bench: Benchmarking stability-efficiency trade-offs and scaling in biomedical continual learning

arxiv url: http://arxiv.org/abs/2603.16738v1
Date: Tue, 17 Mar 2026 16:18:41 GMT
Status: translation complete
Last updated in system: 2026-03-18 17:42:07.408752
Title: MedCL-Bench: Benchmarking stability-efficiency trade-offs and scaling in biomedical continual learning
Title (reference translation): MedCL-Bench: A benchmark of stability-efficiency trade-offs and scaling in biomedical continual learning
Authors: Min Zeng, Shuang Zhou, Zaifu Zhan, Rui Zhang
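Abstract summary: MedCL-Bench streams ten biomedical NLP datasets spanning five task families. The authors evaluate eleven continual learning strategies across eight task orders and report retention, transfer, and GPU-hour cost. Forgetting is task-dependent, with multi-label topic classification most vulnerable and constrained-output tasks more robust.
Reference score (site's own attention metric): 12.134701089850282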
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Medical language models must be updated as evidence and terminology evolve, yet sequential updating can trigger catastrophic forgetting. Although biomedical NLP has many static benchmarks, no unified, task-diverse benchmark exists for evaluating continual learning under standardized protocols, robustness to task order and compute-aware reporting. We introduce MedCL-Bench, which streams ten biomedical NLP datasets spanning five task families and evaluates eleven continual learning strategies across eight task orders, reporting retention, transfer, and GPU-hour cost. Across backbones and task orders, direct sequential fine-tuning on incoming tasks induces catastrophic forgetting, causing update-induced performance regressions on prior tasks. Continual learning methods occupy distinct retention-compute frontiers: parameter-isolation provides the best retention per GPU-hour, replay offers strong protection at higher cost, and regularization yields limited benefit. Forgetting is task-dependent, with multi-label topic classification most vulnerable and constrained-output tasks more robust. MedCL-Bench provides a reproducible framework for auditing model updates before deployment.
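Abstract (reference translation): Medical language models must be updated as evidence and terminology evolve, but sequential updating can cause catastrophic forgetting. Biomedical NLP has many static benchmarks, but there is no unified, task-diverse benchmark for evaluating continual learning under standardized protocols, robustness to task order, and compute-aware reporting. MedCL-Bench streams ten biomedical NLP datasets spanning five task families, evaluates eleven continual learning strategies across eight task orders, and reports retention, transfer, and GPU-hour cost. Across backbones and task orders, direct sequential fine-tuning on incoming tasks causes catastrophic forgetting, producing update-induced performance regressions on prior tasks. Parameter isolation provides the best retention per GPU-hour, replay offers strong protection at higher cost, and regularization yields limited benefit. Forgetting is task-dependent, with multi-label topic classification most vulnerable and constrained-output tasks more robust. MedCL-Bench provides a reproducible framework for auditing model updates before deployment.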
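
The abstract reports retention and transfer but does not define the metrics on this page. As a rough illustration only, the sketch below computes two quantities commonly used in the continual learning literature, backward transfer and average forgetting, from a score matrix. The matrix layout `R[i][j]` (score on task `j` after training through task `i`) and the function names are assumptions made for this example and are not taken from MedCL-Bench.

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def backward_transfer(R: Matrix) -> float:
    """Mean change on earlier tasks between just-learned and final scores.

    Assumed layout: R[i][j] is the score on task j after training
    through task i (0-indexed). Negative values indicate forgetting.
    """
    T = len(R)
    if T < 2:
        raise ValueError("need at least two tasks")
    return sum(R[T - 1][j] - R[j][j] for j in range(T - 1)) / (T - 1)


def average_forgetting(R: Matrix) -> float:
    """Mean drop from the best earlier score on each task to its final score."""
    T = len(R)
    if T < 2:
        raise ValueError("need at least two tasks")
    total = 0.0
    for j in range(T - 1):
        # Best score on task j at any step from learning it up to the
        # step before the final one.
        best = max(R[i][j] for i in range(j, T - 1))
        total += best - R[T - 1][j]
    return total / (T - 1)


# Made-up scores for a three-task stream (not results from the paper).
scores = [
    [0.90, 0.00, 0.00],
    [0.70, 0.80, 0.00],
    [0.60, 0.50, 0.85],
]
bwt = backward_transfer(scores)
forgetting = average_forgetting(scores)
```

With these made-up numbers, both earlier tasks lose about 0.3 by the end of the stream, so backward transfer is negative and average forgetting is positive. Dividing a retention score by measured GPU-hours would give a cost-aware view in the spirit of the abstract, though the exact definition used by the authors is not given here.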

Related papers