Fugu-MT Paper Translation (Abstract): BatteryPass-12K: The First Dataset for the Novel Digital Battery Passport Conformance Task

Paper overview: BatteryPass-12K: The First Dataset for the Novel Digital Battery Passport Conformance Task

arxiv url: http://arxiv.org/abs/2604.26986v1
Date: Tue, 28 Apr 2026 20:51:08 GMT
Status: Translation complete
Last updated in system: 2026-05-01 16:31:53.698899
Title: BatteryPass-12K: The First Dataset for the Novel Digital Battery Passport Conformance Task
Title (reference translation): BatteryPass-12K: The First Dataset for the New Digital Battery Passport Conformance Task
Authors: Tosin Adewumi, Martin Karlsson, Lama Alkhaled, Marcus Liwicki
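Abstract summary: This paper introduces BatteryPass-12K, the first public benchmark for the digital battery passport (DBP) conformance classification task. The motivation is that the EU's battery regulation on DBPs comes into effect soon and no public dataset exists. A total of 22 language models (LMs) were evaluated in zero-shot inference, spanning small LMs (SLMs), mixture of experts (MoEs), and dense LLMs.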
Reference score (independently calculated attention measure): 5.4156846785975725
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We introduce a novel task of digital battery passport (DBP) conformance classification and introduce the first public benchmark for the task: BatteryPass-12K, created synthetically from real pilot samples. This is as the EU's battery regulation on DBPs comes into effect soon and there exists no public dataset. We evaluated 22 language models (LMs) in zero-shot inference, spanning small LMs (SLMs), mixture of experts (MoEs), and dense LLMs. We also conducted analysis, additional evaluations of few-shot inference and prompt-injection attacks to find that (1) Thinking models have the best performance (with GPT-5.4 scoring 0.98 (0.03) and 0.71 (0.22) on average as F1 (and confidence interval at 95%) on the validation and test sets, respectively), (2) few-shot examples improve performance significantly, (3) generally capable frontier models find the task challenging, (4) merely scaling model parameters does not necessarily lead to improved performance, as SLMs outperformed some LLMs, and (5) prompt-injection attacks degrade performance. We note that BatteryPass-12K, though limited to real pilot samples, may be useful for other known or emerging tasks in the battery domain, e.g. lifecycle reasoning. We publicly release the dataset under a permissive licence (CC-BY-4.0).
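Abstract (reference translation): This paper introduces the new task of digital battery passport (DBP) conformance classification, together with the first public benchmark for it, BatteryPass-12K, created synthetically from real pilot samples. The motivation is that the EU's battery regulation on DBPs comes into effect soon and no public dataset exists. A total of 22 language models (LMs) were evaluated in zero-shot inference, spanning small LMs (SLMs), mixture of experts (MoEs), and dense LLMs. Further analysis and additional evaluations of few-shot inference and prompt-injection attacks show that (1) thinking models perform best, with GPT-5.4 scoring an average F1 (with 95% confidence interval) of 0.98 (0.03) on the validation set and 0.71 (0.22) on the test set, (2) few-shot examples improve performance significantly, (3) generally capable frontier models find the task challenging, (4) merely scaling model parameters does not necessarily improve performance, as SLMs outperformed some LLMs, and (5) prompt-injection attacks degrade performance. BatteryPass-12K, though limited to real pilot samples, may be useful for other known or emerging tasks in the battery domain, such as lifecycle reasoning. The dataset is publicly released under a permissive licence (CC-BY-4.0).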

Related papers