Fugu-MT Paper Translation (Abstract): LitBench: A Graph-Centric Large Language Model Benchmarking Tool For Literature Tasks

Paper overview: LitBench: A Graph-Centric Large Language Model Benchmarking Tool For Literature Tasks

arxiv url: http://arxiv.org/abs/2603.00051v1
Date: Tue, 10 Feb 2026 04:12:29 GMT
Status: Translation complete
Last updated in system: 2026-03-09 01:20:08.001156
Title: LitBench: A Graph-Centric Large Language Model Benchmarking Tool For Literature Tasks
Title (reference translation): LitBench: A graph-centric large language model benchmarking tool
Authors: Andreas Varvarigos, Ali Maatouk, Jiasheng Zhang, Ngoc Bui, Jialin Chen, Leandros Tassiulas, Rex Ying
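Abstract summary: This paper introduces LitBench, a benchmarking tool that enables the development and evaluation of domain-specific language models. At the core of LitBench is a data curation process that generates domain-specific literature sub-graphs. In addition to dataset curation, LitBench defines a comprehensive suite of literature tasks, ranging from node- and edge-level analyses to advanced applications.
Reference score (independently computed attention score): 31.14225125626119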
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: While large language models (LLMs) have become the de facto framework for literature-related tasks, they still struggle to function as domain-specific literature agents due to their inability to connect pieces of knowledge and reason across domain-specific contexts, terminologies, and nomenclatures. This challenge underscores the need for a tool that facilitates such domain-specific adaptation and enables rigorous benchmarking across literature tasks. To that end, we introduce LitBench, a benchmarking tool designed to enable the development and evaluation of domain-specific LLMs tailored to literature-related tasks. At its core, LitBench uses a data curation process that generates domain-specific literature sub-graphs and constructs training and evaluation datasets based on the textual attributes of the resulting nodes and edges. The tool is designed for flexibility, supporting the curation of literature graphs across any domain chosen by the user, whether high-level fields or specialized interdisciplinary areas. In addition to dataset curation, LitBench defines a comprehensive suite of literature tasks, ranging from node and edge level analyses to advanced applications such as related work generation. These tasks enable LLMs to internalize domain-specific knowledge and relationships embedded in the curated graph during training, while also supporting rigorous evaluation of model performance. Our results show that small domain-specific LLMs trained and evaluated on LitBench datasets achieve competitive performance compared to state-of-the-art models like GPT-4o and DeepSeek-R1. To enhance accessibility and ease of use, we open-source the tool along with an AI agent tool that streamlines data curation, model training, and evaluation.
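The curation idea in the abstract (extract a domain-specific sub-graph, then build datasets from the text attributes of its nodes and edges) can be illustrated with a small sketch. This is not LitBench's implementation: the `Paper` class, the keyword filter used to select the domain, and the functions `curate_subgraph` and `edge_level_examples` are all hypothetical names and simplifications introduced here for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    """A node in a toy citation graph (hypothetical structure, not LitBench's schema)."""
    paper_id: str
    title: str
    abstract: str


def curate_subgraph(papers, citations, keywords):
    """Keep papers whose title or abstract mentions a keyword, plus citations among them.

    Keyword matching is an assumption made for this sketch; the abstract does not
    say how LitBench selects domain papers.
    """
    lowered = [k.lower() for k in keywords]
    nodes = {
        p.paper_id: p
        for p in papers
        if any(k in f"{p.title} {p.abstract}".lower() for k in lowered)
    }
    edges = [(src, dst) for src, dst in citations if src in nodes and dst in nodes]
    return nodes, edges


def edge_level_examples(nodes, edges):
    """Build labelled (source, target) pairs for a citation-prediction style task."""
    edge_set = set(edges)
    ids = sorted(nodes)
    return [
        {
            "source_title": nodes[src].title,
            "target_title": nodes[dst].title,
            "cites": (src, dst) in edge_set,
        }
        for src in ids
        for dst in ids
        if src != dst
    ]


# Toy data, invented for this example.
papers = [
    Paper("p1", "Graph neural networks for wireless scheduling", "We study link scheduling."),
    Paper("p2", "Wireless channel estimation with transformers", "We estimate channels."),
    Paper("p3", "Protein folding with diffusion models", "We predict structures."),
]
citations = [("p1", "p2"), ("p1", "p3")]

nodes, edges = curate_subgraph(papers, citations, ["wireless"])
examples = edge_level_examples(nodes, edges)
```

Here the "wireless" sub-graph keeps `p1` and `p2` and the single citation between them; each ordered pair of retained papers becomes one labelled example. A real pipeline would presumably use richer domain selection and more task types (the abstract mentions node-level tasks and related work generation), which this sketch does not attempt.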


Related papers