Fugu-MT Paper Translation (Abstract): Faceptor: A Generalist Model for Face Perception

Paper summary: Faceptor: A Generalist Model for Face Perception

arxiv url: http://arxiv.org/abs/2403.09500v1
Date: Thu, 14 Mar 2024 15:42:31 GMT
Status: Translation complete
Last updated in system: 2024-03-15 19:57:52.427202
Title: Faceptor: A Generalist Model for Face Perception
Title (reference translation): Faceptor: A Generalist Model for Face Perception
Authors: Lixiong Qin, Mei Wang, Xuannan Liu, Yuhang Zhang, Wei Deng, Xiaoshuai Song, Weiran Xu, Weihong Deng
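Abstract summary: Faceptor is proposed to adopt a well-designed single-encoder dual-decoder architecture. Introducing Layer-Attention into Faceptor enables the model to adaptively select features from optimal layers to perform the desired tasks. Our training framework can also be applied to auxiliary supervised learning, significantly improving performance in data-sparse tasks such as age estimation and expression recognition.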
Reference score (independently computed attention score): 52.8066001012464
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With the comprehensive research conducted on various face analysis tasks, there is a growing interest among researchers to develop a unified approach to face perception. Existing methods mainly discuss unified representation and training, which lack task extensibility and application efficiency. To tackle this issue, we focus on the unified model structure, exploring a face generalist model. As an intuitive design, Naive Faceptor enables tasks with the same output shape and granularity to share the structural design of the standardized output head, achieving improved task extensibility. Furthermore, Faceptor is proposed to adopt a well-designed single-encoder dual-decoder architecture, allowing task-specific queries to represent new-coming semantics. This design enhances the unification of model structure while improving application efficiency in terms of storage overhead. Additionally, we introduce Layer-Attention into Faceptor, enabling the model to adaptively select features from optimal layers to perform the desired tasks. Through joint training on 13 face perception datasets, Faceptor achieves exceptional performance in facial landmark localization, face parsing, age estimation, expression recognition, binary attribute classification, and face recognition, achieving or surpassing specialized methods in most tasks. Our training framework can also be applied to auxiliary supervised learning, significantly improving performance in data-sparse tasks such as age estimation and expression recognition. The code and models will be made publicly available at https://github.com/lxq1000/Faceptor.
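Abstract (reference translation): With comprehensive research conducted on various face analysis tasks, there is growing interest among researchers in developing a unified approach to face perception. Existing methods mainly discuss unified representation and training, which lack task extensibility and application efficiency. To address this issue, we focus on unified model structure and explore a face generalist model. As an intuitive design, Naive Faceptor lets tasks with the same output shape and granularity share the structural design of a standardized output head, achieving improved task extensibility. Furthermore, Faceptor adopts a well-designed single-encoder dual-decoder architecture in which task-specific queries can represent new semantics. This design strengthens the unification of the model structure while improving application efficiency in terms of storage overhead. In addition, we introduce Layer-Attention into Faceptor so that the model can adaptively select features from optimal layers to perform the desired tasks. Through joint training on 13 face perception datasets, Faceptor achieves exceptional performance in facial landmark localization, face parsing, age estimation, expression recognition, binary attribute classification, and face recognition, matching or surpassing specialized methods on most tasks. Our training framework can also be applied to auxiliary supervised learning, significantly improving performance on data-sparse tasks such as age estimation and expression recognition. The code and models will be made publicly available at https://github.com/lxq1000/Faceptor.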
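
Illustrative note: the abstract says Layer-Attention lets the model adaptively select features from optimal layers, but it does not spell out the mechanism. The sketch below shows one common way such a selection could work, assuming a softmax-weighted sum over per-layer features with one logit per layer for each task. The function names, the toy features, and the logits are all hypothetical and are not taken from the Faceptor code.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def layer_attention(layer_features, layer_logits):
    """Fuse per-layer feature vectors using softmax weights over layers.

    Assumption: each task owns one logit per encoder layer; the fused
    feature is the weighted sum of the layer features.
    """
    if len(layer_features) != len(layer_logits):
        raise ValueError("one logit per layer is required")
    weights = softmax(layer_logits)
    fused = [0.0] * len(layer_features[0])
    for w, feat in zip(weights, layer_features):
        for i, x in enumerate(feat):
            fused[i] += w * x
    return fused, weights


# Three hypothetical encoder layers, each producing a 2-dimensional feature.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Hypothetical per-task logits; in a trained model these would be learned.
landmark_logits = [2.0, 0.0, 0.0]  # leans on the shallow layer
age_logits = [0.0, 0.0, 2.0]       # leans on the deep layer

landmark_feat, landmark_w = layer_attention(features, landmark_logits)
age_feat, age_w = layer_attention(features, age_logits)
```

With these toy logits, the two tasks read different mixtures of the same encoder outputs, which is the general idea the abstract describes; the actual Faceptor design may differ.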

Related papers