Fugu-MT Paper Translation (Abstract): SciZoom: A Large-scale Benchmark for Hierarchical Scientific Summarization across the LLM Era

Paper overview: SciZoom: A Large-scale Benchmark for Hierarchical Scientific Summarization across the LLM Era

arxiv url: http://arxiv.org/abs/2603.16131v1
Date: Tue, 17 Mar 2026 05:34:46 GMT
Status: Translation complete
Last updated in system: 2026-03-18 17:42:07.110332
Title: SciZoom: A Large-scale Benchmark for Hierarchical Scientific Summarization across the LLM Era
Title (reference translation): SciZoom: A large-scale benchmark for hierarchical scientific summarization in the LLM era
Authors: Han Jang, Junhyeok Lee, Kyu Sung Choi
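Abstract summary: SciZoom is a benchmark comprising 44,946 papers from four top-tier ML venues, spanning 2020 to 2025. Our linguistic analysis reveals striking shifts in phrase patterns (up to 10x for formulaic expressions) and rhetorical style (23% decline in hedging). SciZoom serves as both a challenging benchmark and a unique resource for mining the evolution of scientific discourse in the generative AI era.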
Reference score (independently computed attention score): 2.2090506971647144
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The explosive growth of AI research has created unprecedented information overload, increasing the demand for scientific summarization at multiple levels of granularity beyond traditional abstracts. While LLMs are increasingly adopted for summarization, existing benchmarks remain limited in scale, target only a single granularity, and predate the LLM era. Moreover, since the release of ChatGPT in November 2022, researchers have rapidly adopted LLMs for drafting manuscripts themselves, fundamentally transforming scientific writing, yet no resource exists to analyze how this writing has evolved. To bridge these gaps, we introduce SciZoom, a benchmark comprising 44,946 papers from four top-tier ML venues (NeurIPS, ICLR, ICML, EMNLP) spanning 2020 to 2025, explicitly stratified into Pre-LLM and Post-LLM eras. SciZoom provides three hierarchical summarization targets (Abstract, Contributions, and TL;DR) achieving compression ratios up to 600:1, enabling both multi-granularity summarization research and temporal mining of scientific writing patterns. Our linguistic analysis reveals striking shifts in phrase patterns (up to 10x for formulaic expressions) and rhetorical style (23% decline in hedging), suggesting that LLM-assisted writing produces more confident yet homogenized prose. SciZoom serves as both a challenging benchmark and a unique resource for mining the evolution of scientific discourse in the generative AI era. Our code and dataset are publicly available on GitHub (https://github.com/janghana/SciZoom) and Hugging Face (https://huggingface.co/datasets/hanjang/SciZoom), respectively.
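Abstract (reference translation): The explosive growth of AI research has caused unprecedented information overload, raising the demand for scientific summarization at multiple levels of granularity beyond the traditional abstract. Although LLMs are increasingly adopted for summarization, existing benchmarks are limited in scale, target only a single granularity, and predate the LLM era. Moreover, since ChatGPT was released in November 2022, researchers have rapidly adopted LLMs to draft manuscripts themselves, fundamentally changing scientific writing, yet no resource exists for analyzing how this writing has evolved. To bridge these gaps, we introduce SciZoom, a benchmark comprising 44,946 papers from four top-tier ML venues (NeurIPS, ICLR, ICML, EMNLP) spanning 2020 to 2025. SciZoom provides three hierarchical summarization targets (Abstract, Contributions, and TL;DR), reaches compression ratios of up to 600:1, and enables both multi-granularity summarization research and temporal mining of scientific writing patterns. Our linguistic analysis reveals striking shifts in phrase patterns (up to 10x for formulaic expressions) and rhetorical style (a 23% decline in hedging), suggesting that LLM-assisted writing produces more confident yet more homogeneous prose. SciZoom serves as both a challenging benchmark and a unique resource for mining the evolution of scientific discourse in the generative AI era. Our code and dataset are publicly available on GitHub (https://github.com/janghana/SciZoom) and Hugging Face (https://huggingface.co/datasets/hanjang/SciZoom).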


Related papers