Fugu-MT Paper Translation (Summary): COINBench: Moving Beyond Individual Perspectives to Collective Intent Understanding

Paper summary: COINBench: Moving Beyond Individual Perspectives to Collective Intent Understanding

arxiv url: http://arxiv.org/abs/2603.21329v1
Date: Sun, 22 Mar 2026 17:12:14 GMT
Status: Translation complete
Last updated (in system): 2026-03-24 19:11:39.3592
Title: COINBench: Moving Beyond Individual Perspectives to Collective Intent Understanding
Title (reference translation): COINBench: Beyond Individual Perspectives, Understanding Collective Intent
Authors: Xiaozhe Li, Tianyi Lyu, Siyi Yang, Yizhao Yang, Yuxi Gong, Jinxuan Huang, Ligao Zhang, Zhuoyi Huang, Qingwen Liu,
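Abstract summary: COIN-BENCH is a live-updating benchmark that evaluates large language models (LLMs) on collective intent understanding. Unlike conventional benchmarks that focus on transactional outcomes, COIN-BENCH operationalizes intent as a hierarchical cognitive structure. The framework incorporates COIN-TREE for hierarchical cognitive structuring and retrieval-augmented verification (COIN-RAG) to ensure expert-level precision in analyzing raw human discussions.
Reference score (independently calculated attention score): 4.5799194788369455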
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Understanding human intent is a high-level cognitive challenge for Large Language Models (LLMs), requiring sophisticated reasoning over noisy, conflicting, and non-linear discourse. While LLMs excel at following individual instructions, their ability to distill Collective Intent - the process of extracting consensus, resolving contradictions, and inferring latent trends from multi-source public discussions - remains largely unexplored. To bridge this gap, we introduce COIN-BENCH, a dynamic, real-world, live-updating benchmark specifically designed to evaluate LLMs on collective intent understanding within the consumer domain. Unlike traditional benchmarks that focus on transactional outcomes, COIN-BENCH operationalizes intent as a hierarchical cognitive structure, ranging from explicit scenarios to deep causal reasoning. We implement a robust evaluation pipeline that combines a rule-based method with an LLM-as-the-Judge approach. This framework incorporates COIN-TREE for hierarchical cognitive structuring and retrieval-augmented verification (COIN-RAG) to ensure expert-level precision in analyzing raw, collective human discussions. An extensive evaluation of 20 state-of-the-art LLMs across four dimensions - depth, breadth, informativeness, and correctness - reveals that while current models can handle surface-level aggregation, they still struggle with the analytical depth required for complex intent synthesis. COIN-BENCH establishes a new standard for advancing LLMs from passive instruction followers to expert-level analytical agents capable of deciphering the collective voice of the real world. See our project page on COIN-BENCH.
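Abstract (reference translation): Understanding human intent is a high-level cognitive challenge for large language models (LLMs), requiring sophisticated reasoning over noisy, contradictory, and non-linear discourse. Although LLMs excel at following individual instructions, their ability to distill collective intent, that is, to extract consensus, resolve contradictions, and infer latent trends from multi-source public discussions, remains largely unexplored. To bridge this gap, we introduce COIN-BENCH, a dynamic, real-world, live-updating benchmark specifically designed to evaluate LLMs on collective intent understanding within the consumer domain. Unlike conventional benchmarks that focus on transactional outcomes, COIN-BENCH operationalizes intent as a hierarchical cognitive structure, ranging from explicit scenarios to deep causal reasoning. We implemented a robust evaluation pipeline that combines a rule-based method with an LLM-as-the-Judge approach. The framework incorporates COIN-TREE for hierarchical cognitive structuring and retrieval-augmented verification (COIN-RAG) to ensure expert-level precision in analyzing raw human discussions. An extensive evaluation of 20 state-of-the-art LLMs across four dimensions (depth, breadth, informativeness, and correctness) showed that current models can handle surface-level aggregation but struggle with the analytical depth required for complex intent synthesis. COIN-BENCH establishes a new standard for advancing LLMs from passive instruction followers to expert-level analytical agents capable of deciphering the collective voice of the real world. See the COIN-BENCH project page.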
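The abstract describes an evaluation pipeline that combines a rule-based method with an LLM-as-the-Judge approach and scores answers on depth, breadth, informativeness, and correctness. The Python sketch below shows one possible way such a hybrid scorer could be wired together; it is not the authors' implementation. The intent tree (`IntentNode`), the keyword-matching rule, the split of dimensions between rules and judge, the unweighted mean, and `stub_judge` (a stand-in for a real LLM call) are all hypothetical assumptions.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, Dict, Iterator, List

# Dimension names come from the abstract; everything else here is hypothetical.
DIMENSIONS = ("depth", "breadth", "informativeness", "correctness")


@dataclass
class IntentNode:
    """Hypothetical node of a hierarchical intent tree.

    level 0 = explicit scenario; larger levels = deeper causal reasoning.
    """
    label: str
    level: int
    children: List["IntentNode"] = field(default_factory=list)


def iter_nodes(node: IntentNode) -> Iterator[IntentNode]:
    """Depth-first traversal over the tree."""
    yield node
    for child in node.children:
        yield from iter_nodes(child)


def rule_based_scores(answer: str, reference: IntentNode) -> Dict[str, float]:
    """Hypothetical rule: a reference node counts as covered if its label appears in the answer."""
    nodes = list(iter_nodes(reference))
    text = answer.lower()
    hits = [n for n in nodes if n.label.lower() in text]
    max_level = max(n.level for n in nodes)
    deepest_hit = max((n.level for n in hits), default=-1)
    return {
        # Fraction of reference nodes the answer mentions.
        "breadth": len(hits) / len(nodes),
        # How far down the hierarchy the answer reaches (0.0 if nothing matched).
        "depth": (deepest_hit + 1) / (max_level + 1),
    }


# A judge takes (answer, reference) and returns scores in [0, 1].
JudgeFn = Callable[[str, IntentNode], Dict[str, float]]


def evaluate(answer: str, reference: IntentNode, judge: JudgeFn) -> Dict[str, float]:
    """Merge rule-based and judge scores, then add an unweighted overall mean."""
    scores = rule_based_scores(answer, reference)
    judged = judge(answer, reference)
    scores["informativeness"] = judged["informativeness"]
    scores["correctness"] = judged["correctness"]
    scores["overall"] = mean(scores[d] for d in DIMENSIONS)
    return scores


def stub_judge(answer: str, reference: IntentNode) -> Dict[str, float]:
    """Hypothetical stand-in for an LLM-as-the-Judge call; returns fixed scores."""
    return {"informativeness": 0.5, "correctness": 1.0}


# Usage: a toy consumer-domain intent tree with three levels.
tree = IntentNode("battery life", 0, [
    IntentNode("fast charging", 1, [IntentNode("commuting habits", 2)]),
    IntentNode("price", 1),
])
result = evaluate("Users care about battery life and fast charging.", tree, stub_judge)
print(result)
```

Here the answer covers two of four nodes (breadth 0.5) and reaches level 1 of a three-level tree (depth 2/3). Splitting the dimensions this way keeps cheap, deterministic checks in code and reserves the judge for qualities that string matching cannot capture; a real pipeline could weight or assign the dimensions differently.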


Related papers