Fugu-MT Paper Translation (Abstract): Rethinking Meeting Effectiveness: A Benchmark and Framework for Temporal Fine-grained Automatic Meeting Effectiveness Evaluation

Paper overview: Rethinking Meeting Effectiveness: A Benchmark and Framework for Temporal Fine-grained Automatic Meeting Effectiveness Evaluation

arxiv url: http://arxiv.org/abs/2604.17260v1
Date: Sun, 19 Apr 2026 04:59:53 GMT
Status: Translation complete
Last updated in system: 2026-04-21 21:52:52.42381
Title: Rethinking Meeting Effectiveness: A Benchmark and Framework for Temporal Fine-grained Automatic Meeting Effectiveness Evaluation
Title (reference translation): Rethinking Meeting Effectiveness: A Benchmark and Framework for Temporally Fine-Grained Meeting Effectiveness Evaluation
Authors: Yihang Li, Chenhui Chu
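Abstract summary: Evaluating meeting effectiveness is essential for improving organizational productivity. Current approaches rely on post-hoc surveys that yield a single coarse-grained score for an entire meeting. This paper proposes a new paradigm for evaluating meeting effectiveness, centered on novel criteria and a temporally fine-grained approach.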
Reference score (independently computed attention score): 23.813275217960093
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Evaluating meeting effectiveness is crucial for improving organizational productivity. Current approaches rely on post-hoc surveys that yield a single coarse-grained score for an entire meeting. The reliance on manual assessment is inherently limited in scalability, cost, and reproducibility. Moreover, a single score fails to capture the dynamic nature of collaborative discussions. We propose a new paradigm for evaluating meeting effectiveness centered on novel criteria and a temporal fine-grained approach. We define effectiveness as the rate of objective achievement over time and assess it for individual topical segments within a meeting. To support this task, we introduce the AMI Meeting Effectiveness (AMI-ME) dataset, a new meta-evaluation dataset containing 2,459 human-annotated segments from 130 AMI Corpus meetings. We also develop an automatic effectiveness evaluation framework that uses a Large Language Model (LLM) as a judge to score each segment's effectiveness relative to the overall meeting objectives. Through substantial experiments, we establish a comprehensive benchmark for this new task and evaluate the framework's generalizability across distinct meeting types, ranging from business scenarios to unstructured discussions. Furthermore, we benchmark end-to-end performance starting from raw speech to measure the capabilities of a complete system. Our results validate the framework's effectiveness and provide strong baselines to facilitate future research in meeting analysis and multi-party dialogue. Our dataset and code will be publicly available. The AMI-ME dataset and the Automatic Evaluation Framework are available at: this URL.
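Abstract (reference translation): Evaluating meeting effectiveness is important for improving organizational productivity. Current approaches rely on post-hoc surveys that yield a single coarse-grained score for an entire meeting. Reliance on manual assessment is inherently limited in scalability, cost, and reproducibility. Moreover, a single score fails to capture the dynamic nature of collaborative discussions. This paper proposes a new paradigm for evaluating meeting effectiveness, centered on novel criteria and a temporally fine-grained approach. We define effectiveness as the rate of objective achievement over time and assess it for individual topical segments within a meeting. To support this task, we introduce the AMI Meeting Effectiveness (AMI-ME) dataset, a new meta-evaluation dataset containing 2,459 human-annotated segments from 130 AMI Corpus meetings. We also develop an automatic evaluation framework that uses a Large Language Model (LLM) as a judge to score each segment's effectiveness relative to the overall meeting objectives. Through substantial experiments, we establish a comprehensive benchmark for this new task and evaluate the framework's generalizability across distinct meeting types, ranging from business scenarios to unstructured discussions. Furthermore, we benchmark end-to-end performance starting from raw speech to measure the capabilities of a complete system. Our results validate the framework's effectiveness and provide strong baselines to facilitate future research in meeting analysis and multi-party dialogue. The dataset and code will be made public. The AMI-ME dataset and the Automatic Evaluation Framework are available at: this URL.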


Related papers