Fugu-MT Paper Translation (Abstract): Test-Time Adaptation via Cache Personalization for Facial Expression Recognition in Videos

Paper overview: Test-Time Adaptation via Cache Personalization for Facial Expression Recognition in Videos

arxiv url: http://arxiv.org/abs/2603.21309v1
Date: Sun, 22 Mar 2026 16:31:25 GMT
Status: Translation complete
Last updated in system: 2026-03-24 19:11:39.348391
Title: Test-Time Adaptation via Cache Personalization for Facial Expression Recognition in Videos
Title (reference translation): Test-Time Adaptation through Cache Personalization for Facial Expression Recognition in Videos
Authors: Masoumeh Sharafi, Muhammad Osama Zeeshan, Soufiane Belharbi, Alessandro Lameiras Koerich, Marco Pedersoli, Eric Granger
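Abstract summary: This paper proposes TTA through Cache Personalization (TTA-CaP), a cache-based TTA method. Experiments indicate that TTA-CaP can outperform state-of-the-art TTA methods under subject-specific and environmental shifts.
Reference score (independently computed attention level): 59.83490704563065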
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Facial expression recognition (FER) in videos requires model personalization to capture the considerable variations across subjects. Vision-language models (VLMs) offer strong transfer to downstream tasks through image-text alignment, but their performance can still degrade under inter-subject distribution shifts. Personalizing models using test-time adaptation (TTA) methods can mitigate this challenge. However, most state-of-the-art TTA methods rely on unsupervised parameter optimization, introducing computational overhead that is impractical in many real-world applications. This paper introduces TTA through Cache Personalization (TTA-CaP), a cache-based TTA method that enables cost-effective (gradient-free) personalization of VLMs for video FER. Prior cache-based TTA methods rely solely on dynamic memories that store test samples, which can accumulate errors and drift due to noisy pseudo-labels. TTA-CaP leverages three coordinated caches: a personalized source cache that stores source-domain prototypes, a positive target cache that accumulates reliable subject-specific samples, and a negative target cache that stores low-confidence cases as negative samples to reduce the impact of noisy pseudo-labels. Cache updates and replacement are controlled by a tri-gate mechanism based on temporal stability, confidence, and consistency with the personalized cache. Finally, TTA-CaP refines predictions through fusion of embeddings, yielding refined representations that support temporally stable video-level predictions. Our experiments on three challenging video FER datasets, BioVid, StressID, and BAH, indicate that TTA-CaP can outperform state-of-the-art TTA methods under subject-specific and environmental shifts, while maintaining low computational and memory overhead for real-world deployment.
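Abstract (reference translation): Facial expression recognition (FER) in videos requires model personalization to capture the considerable variation across subjects. Vision-language models (VLMs) transfer strongly to downstream tasks through image-text alignment, but their performance can degrade under inter-subject distribution shifts. Personalizing models with test-time adaptation (TTA) methods can mitigate this problem. However, most state-of-the-art TTA methods rely on unsupervised parameter optimization, which introduces computational overhead that is impractical in many real-world applications. This paper introduces TTA through Cache Personalization (TTA-CaP), a cache-based TTA method that enables cost-effective (gradient-free) personalization of VLMs for video FER. Prior cache-based TTA methods rely solely on dynamic memories that store test samples, so they can accumulate errors and drift because of noisy pseudo-labels. TTA-CaP uses three coordinated caches to reduce the impact of noisy pseudo-labels: a personalized source cache that stores source-domain prototypes, a positive target cache that accumulates reliable subject-specific samples, and a negative target cache that stores low-confidence cases as negative samples. Cache updates and replacement are controlled by a tri-gate mechanism based on temporal stability, confidence, and consistency with the personalized cache. Finally, TTA-CaP refines predictions by fusing embeddings, producing refined representations that support temporally stable video-level predictions. Experiments on three video FER datasets, BioVid, StressID, and BAH, indicate that TTA-CaP can outperform state-of-the-art TTA methods under subject-specific and environmental shifts while keeping computational and memory overhead low for real-world deployment.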
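
Illustrative sketch (not the authors' implementation): the abstract describes three caches and a tri-gate update rule but gives no formulas, so the Python below is only one plausible reading of that description. Everything specific in it is an assumption made for illustration: the class name `TriCacheSketch`, the thresholds `conf_hi` and `conf_lo`, the cache `capacity`, the logit `scale`, the `neg_weight` penalty, fusing embeddings by summing per-class prototypes, and averaging frame probabilities for the video-level prediction. No model parameters are updated, which matches the gradient-free claim.

```python
import numpy as np


def l2norm(x):
    """Scale vectors to unit length along the last axis."""
    x = np.asarray(x, dtype=float)
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-12)


class TriCacheSketch:
    """Hypothetical gradient-free three-cache adapter (assumed design)."""

    def __init__(self, text_protos, source_protos, conf_hi=0.6, conf_lo=0.55,
                 capacity=4, scale=10.0, neg_weight=0.5):
        self.text = l2norm(text_protos)      # (C, D) class text embeddings
        self.source = l2norm(source_protos)  # (C, D) personalized source cache
        self.pos = {c: [] for c in range(len(self.text))}  # class -> [(conf, emb)]
        self.neg = []                        # [(label, emb)] low-confidence cases
        self.conf_hi, self.conf_lo = conf_hi, conf_lo
        self.capacity, self.scale, self.neg_weight = capacity, scale, neg_weight
        self.prev_label = None               # pseudo-label of the previous frame

    def _probs(self, z):
        # Assumed fusion: sum text, source, and positive-cache embeddings per class.
        fused = self.text + self.source
        for c, items in self.pos.items():
            if items:
                fused[c] = fused[c] + np.mean([e for _, e in items], axis=0)
        logits = self.scale * (z @ l2norm(fused).T)
        # Assumed penalty: lower a class logit when a stored negative resembles z.
        for label, emb in self.neg:
            logits[label] -= self.neg_weight * max(0.0, float(z @ emb))
        exp = np.exp(logits - logits.max())
        return exp / exp.sum()

    def step(self, frame_emb):
        """Predict one frame, then update the caches without any gradient step."""
        z = l2norm(frame_emb)
        probs = self._probs(z)
        label, conf = int(probs.argmax()), float(probs.max())
        stable = label == self.prev_label                        # gate 1: temporal stability
        confident = conf >= self.conf_hi                         # gate 2: confidence
        consistent = int((z @ self.source.T).argmax()) == label  # gate 3: source agreement
        if stable and confident and consistent:
            items = self.pos[label]
            items.append((conf, z))
            items.sort(key=lambda t: t[0], reverse=True)
            del items[self.capacity:]        # keep only the most confident entries
        elif conf < self.conf_lo:
            self.neg.append((label, z))
            del self.neg[:-self.capacity]    # keep only the most recent negatives
        self.prev_label = label
        return probs


def predict_video(model, frame_embs):
    """Average per-frame probabilities into one video-level prediction."""
    probs = np.mean([model.step(z) for z in frame_embs], axis=0)
    return int(probs.argmax())
```

In this sketch a frame enters the positive cache only when all three gates pass, so the first frame of a video is never cached (there is no previous pseudo-label to agree with). A frame whose confidence falls below `conf_lo` is stored as a negative for its predicted class instead. How the paper actually defines the gates, the replacement policy, and the fusion step may differ.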


Related papers