Fugu-MT paper translation (summary): BAGEL: Benchmarking Animal Knowledge Expertise in Language Models

Paper overview: BAGEL: Benchmarking Animal Knowledge Expertise in Language Models

arxiv url: http://arxiv.org/abs/2604.16241v1
Date: Fri, 17 Apr 2026 17:00:37 GMT
Status: Translation complete
Last updated in system: 2026-04-20 22:00:20.017084
Title: BAGEL: Benchmarking Animal Knowledge Expertise in Language Models
Title (reference translation): BAGEL: Benchmarking Animal Knowledge in Language Models
Authors: Jiacheng Shen, Masato Hagiwara, Milad Alizadeh, Ellen Gilsenan-McMahon, Marius Miron, David Robinson, Emmanuel Chemla, Sara Keen, Gagan Narula, Mathieu Laurière, Matthieu Geist, Olivier Pietquin,
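Abstract summary: BAGEL is a benchmark for evaluating animal knowledge expertise in language models. By focusing on closed-book evaluation, BAGEL measures models' animal-related knowledge without external retrieval at inference time.
Reference score (attention score computed by this site): 35.401726501331275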
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Large language models have shown strong performance on broad-domain knowledge and reasoning benchmarks, but it remains unclear how well language models handle specialized animal-related knowledge under a unified closed-book evaluation protocol. We introduce BAGEL, a benchmark for evaluating animal knowledge expertise in language models. BAGEL is constructed from diverse scientific and reference sources, including bioRxiv, Global Biotic Interactions, Xeno-canto, and Wikipedia, using a combination of curated examples and automatically generated closed-book question-answer pairs. The benchmark covers multiple aspects of animal knowledge, including taxonomy, morphology, habitat, behavior, vocalization, geographic distribution, and species interactions. By focusing on closed-book evaluation, BAGEL measures animal-related knowledge of models without external retrieval at inference time. BAGEL further supports fine-grained analysis across source domains, taxonomic groups, and knowledge categories, enabling a more precise characterization of model strengths and systematic failure modes. Our benchmark provides a new testbed for studying domain-specific knowledge generalization in language models and for improving their reliability in biodiversity-related applications.
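Abstract (reference translation): Large language models have shown high performance on broad-domain knowledge and reasoning benchmarks, but it remains unclear how well they handle specialized animal-related knowledge under a unified closed-book evaluation protocol. We introduce BAGEL, a benchmark for evaluating animal knowledge expertise in language models. BAGEL is built from diverse scientific and reference sources, including bioRxiv, Global Biotic Interactions, Xeno-canto, and Wikipedia. The benchmark covers many aspects of animal knowledge, including taxonomy, morphology, habitat, behavior, vocalization, geographic distribution, and species interactions. By focusing on closed-book evaluation, BAGEL measures models' animal-related knowledge without external retrieval at inference time. BAGEL further supports fine-grained analysis across source domains, taxonomic groups, and knowledge categories, enabling a more precise assessment of model strengths and systematic failure modes. The benchmark provides a new testbed for studying domain-specific knowledge generalization in language models and for improving their reliability in biodiversity-related applications.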

Related papers

RoBiologyDataChoiceQA: A Romanian Dataset for improving Biology understanding of Large Language Models [0.15293427903448023]
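Large language models (LLMs) have shown great potential across a wide range of natural language processing (NLP) tasks. This work introduces a new Romanian-language dataset of multiple-choice biology questions.
Paper reference translation (metadata) (2025-09-30T05:41:50Z)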
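Animer une base de connaissance: des ontologies aux modèles d'I.A. générative [0.0]
This paper proposes a reading of the hybridization of symbolic AI and neural (or sub-symbolic) AI grounded in an application domain. It describes the LaCAS ecosystem, an open archive for linguistics and cultural studies. The approach is illustrated with the knowledge domain "languages of the world" (540 languages) and the knowledge object "Quechua (language)".
Paper reference translation (metadata) (2025-09-01T09:40:55Z)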
Taxonomic Reasoning for Rare Arthropods: Combining Dense Image Captioning and RAG for Interpretable Classification [12.923336716880506]
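We integrate image captioning and retrieval-augmented generation (RAG) with large language models (LLMs) to enhance biodiversity monitoring. Our findings highlight the potential of modern vision-language AI pipelines to support biodiversity conservation initiatives.
Paper reference translation (metadata) (2025-03-13T21:18:10Z)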
Advancing bioinformatics with large language models: components, applications and perspectives [12.728981464533918]
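Large language models (LLMs) are a class of artificial intelligence models based on deep learning. We review the essential components of LLMs in bioinformatics. Key aspects include tokenization methods for different data types, the architecture of transformer models, and the core attention mechanism.
Paper reference translation (metadata) (2024-01-08T17:26:59Z)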
Diversifying Knowledge Enhancement of Biomedical Language Models using Adapter Modules and Knowledge Graphs [54.223394825528665]
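We develop an approach that injects structured biomedical knowledge into pretrained language models using lightweight adapter modules. We use two large KGs, the biomedical knowledge system UMLS and the novel biochemical ontology OntoChem, together with two prominent biomedical PLMs, PubMedBERT and BioLinkBERT. We show that the approach improves performance in several cases while keeping computational requirements low.
Paper reference translation (metadata) (2023-12-21T14:26:57Z)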
The KITMUS Test: Evaluating Knowledge Integration from Multiple Sources in Natural Language Understanding Systems [87.3207729953778]
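We evaluate state-of-the-art coreference resolution models on the dataset. Some models struggle to reason on the fly over knowledge observed at both pretraining time and inference time. Even the best-performing models appear to have difficulty reliably integrating knowledge that is presented only at inference time.
Paper reference translation (metadata) (2022-12-15T23:26:54Z)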
Cetacean Translation Initiative: a roadmap to deciphering the communication of sperm whales [97.41394631426678]
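Recent work has shown the promise of machine learning tools for analyzing acoustic communication in non-human species. We outline the key elements required to collect and process large-scale bioacoustic data from sperm whales. The technological capabilities developed are likely to yield cross-applications and advances across the broader community studying non-human communication and animal behavior.
Paper reference translation (metadata) (2021-04-17T18:39:22Z)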
Linguistic Typology Features from Text: Inferring the Sparse Features of World Atlas of Language Structures [73.06435180872293]
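We build a recurrent neural network predictor based on byte embeddings and convolutional layers. We show that a variety of linguistic typology features can be predicted reliably.
Paper reference translation (metadata) (2020-04-30T21:00:53Z)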

The related papers list is generated automatically from the titles and abstracts of papers on this site.

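This page shows information about the selected paper.
The operators of this site do not guarantee the quality of this site (including all information and translations) and accept no responsibility for any consequences arising from its use (including all information and translations).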