Fugu-MT Paper Translation (Abstract): Iterative Definition Refinement for Zero-Shot Classification via LLM-Based Semantic Prototype Optimization

Paper overview: Iterative Definition Refinement for Zero-Shot Classification via LLM-Based Semantic Prototype Optimization

arxiv url: http://arxiv.org/abs/2604.27335v1
Date: Thu, 30 Apr 2026 02:25:33 GMT
Status: translation complete
Last updated in system: 2026-05-01 16:31:53.883575
Title: Iterative Definition Refinement for Zero-Shot Classification via LLM-Based Semantic Prototype Optimization
Authors: Naeem Rehmat, Muhammad Saad Saeed, Ijaz Ul Haq, Khalid Malik
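Abstract summary: Web filtering systems rely on accurate web content classification to block cyber threats, prevent data exfiltration, and ensure compliance. Embedding-based zero-shot approaches map content and category descriptions into a shared semantic space. Poorly specified definitions create semantic overlap in the embedding space, leading to systematic misclassification. The paper proposes a training-free, adaptive iterative definition refinement framework that improves zero-shot web content classification.
Reference score (attention score computed independently by this site): 1.7288526441135115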
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Web filtering systems rely on accurate web content classification to block cyber threats, prevent data exfiltration, and ensure compliance. However, classification is increasingly difficult due to the dynamic and rapidly evolving nature of the modern web. Embedding-based zero-shot approaches map content and category descriptions into a shared semantic space, enabling label assignment without labeled training data, but remain highly sensitive to definition quality. Poorly specified or ambiguous definitions create semantic overlap in the embedding space, leading to systematic misclassification. In this paper, we propose a training-free, adaptive iterative definition refinement framework that improves zero-shot web content classification by progressively optimizing category definitions rather than updating model parameters. Using LLMs as feedback-driven definition optimizers, we investigate three refinement strategies, namely example-guided, confusion-aware, and history-aware, each of which refines class descriptions using structured signals from misclassified instances. Furthermore, we introduce a human-labeled benchmark of 10 URL categories with 1,000 samples per class and evaluate across 13 state-of-the-art embedding foundation models. Results demonstrate that iterative definition refinement consistently improves classification performance across diverse architectures, establishing definition quality as a critical and underexplored factor in embedding-based systems. The dataset is available at https://github.com/naeemrehmat/B2MWT-10C.
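The loop described in the abstract (classify by similarity to category definitions, collect misclassified instances, rewrite the definitions, repeat) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a toy bag-of-words vector in place of an embedding foundation model, and a hypothetical `refine` function in place of the LLM optimizer that simply appends unseen words from misclassified examples, loosely mirroring the example-guided strategy. All function names and sample data below are invented for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(text, definitions):
    """Assign the category whose definition is most similar to the text."""
    doc = embed(text)
    return max(definitions, key=lambda c: cosine(doc, embed(definitions[c])))


def refine(definition, missed_texts):
    """Hypothetical stand-in for the LLM optimizer (assumption): append
    words from misclassified examples that the definition lacks."""
    known = set(embed(definition))
    new_words = []
    for text in missed_texts:
        for word in re.findall(r"[a-z]+", text.lower()):
            if word not in known:
                known.add(word)
                new_words.append(word)
    return " ".join([definition] + new_words)


def refine_iteratively(definitions, labeled, rounds=3):
    """Update definitions only; no model parameters are changed."""
    defs = dict(definitions)
    for _ in range(rounds):
        # Group misclassified texts by their true label.
        errors = {}
        for text, label in labeled:
            if classify(text, defs) != label:
                errors.setdefault(label, []).append(text)
        if not errors:
            break
        for label, texts in errors.items():
            defs[label] = refine(defs[label], texts)
    return defs


def accuracy(defs, labeled):
    """Fraction of labeled texts assigned to their true category."""
    return sum(classify(t, defs) == y for t, y in labeled) / len(labeled)


# Both starting definitions share the word "site", which creates overlap.
definitions = {"news": "news site", "shopping": "online store site"}
labeled = [
    ("breaking headlines and daily news reports", "news"),
    ("buy shoes with free shipping at our store", "shopping"),
    ("world news headlines updated hourly", "news"),
    ("discount shoes site checkout", "shopping"),
]
refined = refine_iteratively(definitions, labeled)
print(accuracy(definitions, labeled), accuracy(refined, labeled))
```

On this toy data the last sample is initially pulled toward "news" because the shorter "news site" definition overlaps with it on the shared word "site"; after one refinement round the "shopping" definition absorbs the missing terms and accuracy rises from 0.75 to 1.0. A real system would swap `embed` for an embedding model and `refine` for an LLM prompt carrying the misclassified examples, the confusion pairs, or the refinement history.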

Related papers