Fugu-MT Paper Translation (Abstract): A Closed-Form Solution for Debiasing Vision-Language Models with Utility Guarantees Across Modalities and Tasks

Paper overview: A Closed-Form Solution for Debiasing Vision-Language Models with Utility Guarantees Across Modalities and Tasks

arxiv url: http://arxiv.org/abs/2603.12998v1
Date: Fri, 13 Mar 2026 13:55:34 GMT
Status: Translation complete
Last updated in system: 2026-03-16 17:38:12.105397
Title: A Closed-Form Solution for Debiasing Vision-Language Models with Utility Guarantees Across Modalities and Tasks
Title (reference translation): A closed-form solution for debiasing vision-language models with utility guarantees across modalities and tasks
Authors: Tangzheng Lian, Guanyu Hu, Yijing Ren, Dimitrios Kollias, Oya Celiktutan
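Abstract summary: Vision-Language Models (VLMs) achieve remarkable performance across a variety of downstream tasks. Recent studies have shown that they can inherit social biases from the training data and propagate them into downstream applications. We propose a debiasing method that yields a closed-form solution in the cross-modal space.
Reference score (attention score computed by this site): 17.71097531008228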
License: http://creativecommons.org/licenses/by/4.0/
Abstract: While Vision-Language Models (VLMs) have achieved remarkable performance across diverse downstream tasks, recent studies have shown that they can inherit social biases from the training data and further propagate them into downstream applications. To address this issue, various debiasing approaches have been proposed, yet most of them aim to improve fairness without having a theoretical guarantee that the utility of the model is preserved. In this paper, we introduce a debiasing method that yields a closed-form solution in the cross-modal space, achieving Pareto-optimal fairness with bounded utility losses. Our method is training-free, requires no annotated data, and can jointly debias both visual and textual modalities across downstream tasks. Extensive experiments show that our method outperforms existing methods in debiasing VLMs across diverse fairness metrics and datasets for both group and intersectional fairness in downstream tasks such as zero-shot image classification, text-to-image retrieval, and text-to-image generation while preserving task performance.
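Abstract (reference translation): Vision-Language Models (VLMs) have achieved remarkable performance on a wide range of downstream tasks, but recent studies have shown that they can inherit social biases from the training data and propagate them into downstream applications. Various debiasing approaches have been proposed to address this problem, yet most of them aim to improve fairness without any theoretical guarantee that the model's utility is preserved. This paper introduces a debiasing method that derives a closed-form solution in the cross-modal space and achieves Pareto-optimal fairness with bounded utility losses. The proposed method is training-free, requires no annotated data, and can jointly debias both the visual and textual modalities across downstream tasks. Experiments on downstream tasks such as zero-shot image classification, text-to-image retrieval, and text-to-image generation show that the method outperforms existing methods at debiasing VLMs across a variety of fairness metrics and datasets, for both group and intersectional fairness, while preserving task performance.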

Related papers

Modest-Align: Data-Efficient Alignment for Vision-Language Models [67.48633659305592]
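Cross-modal alignment models often suffer from overconfidence and degraded performance when operating in resource-constrained settings. We propose Modest-Align, a lightweight alignment framework designed for robustness and efficiency. The method provides a practical and scalable solution for cross-modal alignment in real-world, low-resource scenarios.
Reference translation (metadata) (2025-10-24T16:11:10Z)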
FairImagen: Post-Processing for Bias Mitigation in Text-to-Image Models [10.857020427374506]
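FairImagen is a post-hoc debiasing framework that operates on prompt embeddings to mitigate societal biases. Our framework outperforms existing post-hoc methods and offers a simple, scalable, and model-agnostic solution for fair text-to-image generation.
Reference translation (metadata) (2025-10-24T11:47:15Z)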
Fill the Gap: Quantifying and Reducing the Modality Gap in Image-Text Representation Learning [1.1087735229999818]
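Vision-language models (VLMs) can embed texts and images in a shared representation space. These models have been shown to be subject to the modality gap phenomenon, meaning that the embeddings of one modality are clearly separated from those of the other in the embedding space.
Reference translation (metadata) (2025-05-06T17:24:41Z)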
Debiasing Vison-Language Models with Text-Only Training [15.069736314663352]
We propose TOD, a text-only debiasing framework that leverages a text-as-image training paradigm to mitigate visual biases.
Reference translation (metadata) (2024-10-12T04:34:46Z)
A Unified Debiasing Approach for Vision-Language Models across Modalities and Tasks [12.313257689227013]
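This paper introduces Selective Feature Imputation for Debiasing (SFID), a novel method that integrates feature pruning and low-confidence imputation. SFID is versatile, maintains the semantic integrity of outputs, and is cost-effective because it eliminates the need for retraining. Experimental results demonstrate the effectiveness of SFID across various VLM tasks, including zero-shot classification, text-to-image retrieval, image captioning, and text-to-image generation.
Reference translation (metadata) (2024-10-10T03:57:48Z)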
Distilling Vision-Language Foundation Models: A Data-Free Approach via Prompt Diversification [49.41632476658246]
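We discuss extending Data-Free Knowledge Distillation (DFKD) to Vision-Language Foundation Models without access to billion-level image-text datasets. The objective is to customize a student model for distribution-agnostic downstream tasks given a set of category concepts. We propose three novel Prompt Diversification methods to encourage image synthesis with diverse styles.
Reference translation (metadata) (2024-07-21T13:26:30Z)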
Debiasing Multimodal Large Language Models via Penalization of Language Priors [38.97645845493758]
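Multimodal Large Language Models (MLLMs) have become indispensable tools in computer vision and natural language processing. Their generated content is often driven by the intrinsic priors of the underlying Large Language Models (LLMs) rather than by the input image. This paper proposes two simple, training-free strategies to correct these biases and redirect the model's focus toward visual information.
Reference translation (metadata) (2024-03-08T12:35:07Z)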
Debiasing Vision-Language Models via Biased Prompts [79.04467131711775]
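This paper proposes a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding. It shows that debiasing only the text embedding with a calibrated projection matrix is sufficient to yield robust classifiers and fair generative models.
Reference translation (metadata) (2023-01-31T20:09:33Z)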
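
The idea of "projecting out a biased direction" mentioned in the last entry above can be illustrated with a plain orthogonal projection. The following is only a minimal sketch under stated assumptions: the three-dimensional vectors are toy values rather than real model embeddings, the helper names (`dot`, `bias_direction_from_prompts`, `project_out`) are hypothetical, and estimating the bias direction as the difference of two prompt embeddings is an illustrative choice. It is not the calibrated projection matrix of that paper, nor the closed-form cross-modal solution of the main paper.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def bias_direction_from_prompts(emb_a, emb_b):
    """Assumed estimator: bias direction = difference of two prompt embeddings."""
    return [x - y for x, y in zip(emb_a, emb_b)]


def project_out(embedding, bias_direction):
    """Remove the component of `embedding` that lies along `bias_direction`."""
    norm_sq = dot(bias_direction, bias_direction)
    if norm_sq == 0.0:
        raise ValueError("bias_direction must be non-zero")
    scale = dot(embedding, bias_direction) / norm_sq
    return [e - scale * b for e, b in zip(embedding, bias_direction)]


# Toy embeddings (hypothetical values) for two prompts that differ only
# in a protected attribute.
emb_group_a = [1.0, 0.0, 1.0]
emb_group_b = [-1.0, 0.0, 1.0]
bias = bias_direction_from_prompts(emb_group_a, emb_group_b)

# Toy query embedding; after projection it has no component along `bias`.
query = [0.5, 2.0, 1.0]
debiased = project_out(query, bias)
print(debiased)
print(dot(debiased, bias))
```

After the projection, the query embedding is orthogonal to the estimated bias direction, so similarity scores computed against it no longer depend on that direction. Components orthogonal to the bias direction are left unchanged, which is one informal way to see why such projections can keep most of the task-relevant signal.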

The related papers list is generated automatically from the titles and abstracts of papers on this site.

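This page shows information on the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any results arising from its use.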