Fugu-MT Paper Translation (Abstract): BTZSC: A Benchmark for Zero-Shot Text Classification Across Cross-Encoders, Embedding Models, Rerankers and LLMs

Paper overview: BTZSC: A Benchmark for Zero-Shot Text Classification Across Cross-Encoders, Embedding Models, Rerankers and LLMs

arxiv url: http://arxiv.org/abs/2603.11991v1
Date: Thu, 12 Mar 2026 14:43:20 GMT
Status: Translation complete
Last updated in system: 2026-03-13 14:46:26.147606
Title: BTZSC: A Benchmark for Zero-Shot Text Classification Across Cross-Encoders, Embedding Models, Rerankers and LLMs
Authors: Ilias Aarab
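Abstract summary: Zero-shot text classification (ZSC) promises to eliminate costly task-specific annotation. Recent advances in text-embedding models, rerankers, and instruction-tuned large language models (LLMs) have challenged the dominance of NLI-based architectures. We introduce BTZSC, a comprehensive benchmark of 22 public datasets spanning sentiment, topic, intent, and emotion classification.
Reference score (attention metric computed independently by the site): 0.0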
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Zero-shot text classification (ZSC) offers the promise of eliminating costly task-specific annotation by matching texts directly to human-readable label descriptions. While early approaches have predominantly relied on cross-encoder models fine-tuned for natural language inference (NLI), recent advances in text-embedding models, rerankers, and instruction-tuned large language models (LLMs) have challenged the dominance of NLI-based architectures. Yet, systematically comparing these diverse approaches remains difficult. Existing evaluations, such as MTEB, often incorporate labeled examples through supervised probes or fine-tuning, leaving genuine zero-shot capabilities underexplored. To address this, we introduce BTZSC, a comprehensive benchmark of 22 public datasets spanning sentiment, topic, intent, and emotion classification, capturing diverse domains, class cardinalities, and document lengths. Leveraging BTZSC, we conduct a systematic comparison across four major model families (NLI cross-encoders, embedding models, rerankers, and instruction-tuned LLMs), encompassing 38 public and custom checkpoints. Our results show that: (i) modern rerankers, exemplified by Qwen3-Reranker-8B, set a new state-of-the-art with macro F1 = 0.72; (ii) strong embedding models such as GTE-large-en-v1.5 substantially close the accuracy gap while offering the best trade-off between accuracy and latency; (iii) instruction-tuned LLMs at 4-12B parameters achieve competitive performance (macro F1 up to 0.67), excelling particularly on topic classification but trailing specialized rerankers; (iv) NLI cross-encoders plateau even as backbone size increases; and (v) scaling primarily benefits rerankers and LLMs over embedding models. BTZSC and accompanying evaluation code are publicly released to support fair and reproducible progress in zero-shot text understanding.
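The abstract describes zero-shot classification as matching a text to human-readable label descriptions and reports results as macro F1. The sketch below illustrates both ideas in miniature. It is not the paper's evaluation code: the bag-of-words `embed` function is a toy stand-in for a real embedding model, the label descriptions and example texts are invented for illustration, and computing macro F1 over the union of gold and predicted labels is an assumption about how the metric is aggregated.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding' (hypothetical stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)  # Counter returns 0 if missing
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def zero_shot_classify(text, label_descriptions):
    """Return the label whose description is most similar to the text."""
    doc = embed(text)
    return max(
        label_descriptions,
        key=lambda label: cosine(doc, embed(label_descriptions[label])),
    )


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Illustrative label descriptions and texts (not taken from BTZSC).
LABELS = {
    "positive": "this review is positive good great",
    "negative": "this review is negative bad terrible",
}
texts = ["a great and good movie", "a bad and terrible plot", "good acting"]
gold = ["positive", "negative", "positive"]

pred = [zero_shot_classify(t, LABELS) for t in texts]
score = macro_f1(gold, pred)
print(pred, score)
```

No labeled training examples are used at any point: the only supervision is the wording of the label descriptions, which is what distinguishes this setting from the supervised probes mentioned in the abstract. Swapping `embed` for a real embedding model would leave the rest of the sketch unchanged.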

