Fugu-MT Paper Translation (Abstract): Learning-to-Rank Meets Language: Boosting Language-Driven Ordering Alignment for Ordinal Classification

Paper overview: Learning-to-Rank Meets Language: Boosting Language-Driven Ordering Alignment for Ordinal Classification

arxiv url: http://arxiv.org/abs/2306.13856v3
Date: Mon, 23 Oct 2023 12:20:56 GMT
Status: Translation complete
System update date: 2023-10-24 11:43:34.322371
Title: Learning-to-Rank Meets Language: Boosting Language-Driven Ordering Alignment for Ordinal Classification
Title (reference translation): Learning-to-Rank Meets Language: Strengthening Language-Driven Ordering for Ordinal Classification
Authors: Rui Wang, Peipei Li, Huaibo Huang, Chunshui Cao, Ran He, Zhaofeng He
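Abstract summary: We propose a new language-driven ordering alignment method for ordinal classification. Recent developments in pre-trained vision-language models motivate leveraging the rich ordinal priors in human language. Experiments on three ordinal classification tasks, including facial age estimation, historical color image (HCI) classification, and aesthetic assessment, demonstrate its promising performance.
Reference score (independently computed attention score): 60.28913031192201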
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present a novel language-driven ordering alignment method for ordinal classification. The labels in ordinal classification contain additional ordering relations, making them prone to overfitting when relying solely on training data. Recent developments in pre-trained vision-language models inspire us to leverage the rich ordinal priors in human language by converting the original task into a vision-language alignment task. Consequently, we propose L2RCLIP, which fully utilizes the language priors from two perspectives. First, we introduce a complementary prompt tuning technique called RankFormer, designed to enhance the ordering relation of original rank prompts. It employs token-level attention with residual-style prompt blending in the word embedding space. Second, to further incorporate language priors, we revisit the approximate bound optimization of vanilla cross-entropy loss and restructure it within the cross-modal embedding space. Consequently, we propose a cross-modal ordinal pairwise loss to refine the CLIP feature space, where texts and images maintain both semantic alignment and ordering alignment. Extensive experiments on three ordinal classification tasks, including facial age estimation, historical color image (HCI) classification, and aesthetic assessment, demonstrate its promising performance. The code is available at https://github.com/raywang335/L2RCLIP.
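Abstract (reference translation): We propose a new language-driven ordering alignment method for ordinal classification. Labels in ordinal classification carry additional ordering relations, which makes models prone to overfitting when they rely on training data alone. Recent progress in pre-trained vision-language models motivates exploiting the rich ordinal priors in human language by converting the original task into a vision-language alignment task. We therefore propose L2RCLIP. First, we introduce a complementary prompt tuning technique called RankFormer to strengthen the ordering relation of the original rank prompts. It uses token-level attention and residual-style prompt blending in the word embedding space. Second, to further incorporate language priors, we revisit the approximate bound optimization of the vanilla cross-entropy loss and restructure it in the cross-modal embedding space. We then propose a cross-modal ordinal pairwise loss that refines the CLIP feature space so that texts and images maintain both semantic alignment and ordering alignment. Extensive experiments on three ordinal classification tasks, including facial age estimation, historical color image (HCI) classification, and aesthetic assessment, demonstrate its promising performance. The code is available at https://github.com/raywang335/L2RCLIP.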
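The idea of recasting ordinal classification as vision-language alignment can be illustrated with a toy sketch. This is not the L2RCLIP implementation: the embeddings are hand-picked 2-D stand-ins for CLIP features, the function names (`rank_probabilities`, `ordering_violation`) and the temperature value are assumptions made for illustration, and the hinge-style penalty is a generic pairwise ordering check rather than the cross-modal ordinal pairwise loss described in the abstract.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_probabilities(image_emb, rank_text_embs, temperature=0.07):
    """Softmax over image-to-rank-prompt similarities (CLIP-style scoring).

    The temperature is an assumed value, not one taken from the paper.
    """
    logits = [cosine(image_emb, t) / temperature for t in rank_text_embs]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ordering_violation(image_emb, rank_text_embs, true_rank, margin=0.0):
    """Illustrative hinge penalty, NOT the paper's loss.

    For every pair of ranks (j, k) where j is closer to the true rank than k,
    penalise the case where the image is more similar to prompt k than to j.
    """
    sims = [cosine(image_emb, t) for t in rank_text_embs]
    penalty = 0.0
    for j in range(len(sims)):
        for k in range(len(sims)):
            if abs(j - true_rank) < abs(k - true_rank):
                penalty += max(0.0, margin + sims[k] - sims[j])
    return penalty


# Hypothetical embeddings for three ordered rank prompts (low, middle, high).
rank_text_embs = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
# Hypothetical image embedding that lies close to the "low" prompt.
image_emb = [0.9, 0.1]

probs = rank_probabilities(image_emb, rank_text_embs)
predicted_rank = max(range(len(probs)), key=probs.__getitem__)
```

In this toy setup the image is assigned the rank whose prompt embedding it aligns with best, and `ordering_violation` is zero when similarities fall off monotonically with distance from the true rank. That is the kind of ordering alignment the abstract refers to, in addition to plain semantic alignment.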

Related papers