Fugu-MT paper translation (abstract): CAISE: Conversational Agent for Image Search and Editing

Paper overview: CAISE: Conversational Agent for Image Search and Editing

arxiv url: http://arxiv.org/abs/2202.11847v1
Date: Thu, 24 Feb 2022 00:55:52 GMT
Status: translation complete
Last updated in system: 2022-02-25 16:28:58.159608
Title: CAISE: Conversational Agent for Image Search and Editing
Authors: Hyounghun Kim, Doo Soon Kim, Seunghyun Yoon, Franck Dernoncourt, Trung Bui, Mohit Bansal
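Abstract summary: We propose a dataset for an automated Conversational Agent for Image Search and Editing (CAISE). To our knowledge, this is the first dataset that provides conversational image search and editing annotations. The functions that the assistant-annotators perform with the tool are recorded as executable commands.
Reference score (attention score computed independently by this site): 109.57721903485663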
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Demand for image editing has been increasing as users' desire for expression is also increasing. However, for most users, image editing tools are not easy to use since the tools require certain expertise in photo effects and have complex interfaces. Hence, users might need someone to help edit their images, but having a personal dedicated human assistant for every user is impossible to scale. For that reason, an automated assistant system for image editing is desirable. Additionally, users want more image sources for diverse image editing works, and integrating an image search functionality into the editing tool is a potential remedy for this demand. Thus, we propose a dataset of an automated Conversational Agent for Image Search and Editing (CAISE). To our knowledge, this is the first dataset that provides conversational image search and editing annotations, where the agent holds a grounded conversation with users and helps them to search and edit images according to their requests. To build such a system, we first collect image search and editing conversations between pairs of annotators. The assistant-annotators are equipped with a customized image search and editing tool to address the requests from the user-annotators. The functions that the assistant-annotators conduct with the tool are recorded as executable commands, allowing the trained system to be useful for real-world application execution. We also introduce a generator-extractor baseline model for this task, which can adaptively select the source of the next token (i.e., from the vocabulary or from textual/visual contexts) for the executable command. This serves as a strong starting point while still leaving a large human-machine performance gap for useful future work. Our code and dataset are publicly available at: https://github.com/hyounghk/CAISE
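The abstract describes a generator-extractor baseline that adaptively selects the source of the next command token: the vocabulary, the textual context, or the visual context. The sketch below illustrates that general idea as a gated mixture of three distributions, in the style of a pointer-generator. It is not the authors' implementation (see the linked repository for that). The function names, the three-way gate, the raw scores, and the example tokens are all hypothetical, and every source is assumed to be non-empty.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities (assumes a non-empty list)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def next_token_distribution(gate_scores, sources):
    """Mix per-source token distributions with a soft source gate.

    gate_scores: one raw score per source, e.g. [vocab, text, visual].
    sources: list of (tokens, raw_scores) pairs, one per source.
    Returns a dict mapping each candidate token to its mixed probability.
    This is an illustrative stand-in, not the model from the paper.
    """
    gate = softmax(gate_scores)
    mixed = {}
    for weight, (tokens, scores) in zip(gate, sources):
        for token, prob in zip(tokens, softmax(scores)):
            # A token reachable from several sources accumulates mass.
            mixed[token] = mixed.get(token, 0.0) + weight * prob
    return mixed


# Hypothetical candidates for the next token of an executable command.
vocabulary = (["search", "adjust", "brightness"], [0.2, 1.5, 0.3])
text_context = (["dog", "beach"], [2.0, 0.1])      # words from the dialogue
visual_context = (["image_3"], [0.0])              # a displayed image

# Here the gate strongly favors the visual context.
dist = next_token_distribution([0.0, 0.0, 5.0],
                               [vocabulary, text_context, visual_context])
best = max(dist, key=dist.get)
print(best, round(sum(dist.values()), 6))
```

In a trained model the gate and per-source scores would come from learned network outputs rather than hand-written numbers; the mixing step is the only part this sketch is meant to show.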

Related papers

Enhancing Intent Understanding for Ambiguous prompt: A Human-Machine Co-Adaption Strategy [28.647935556492957]
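We propose a human-machine co-adaptation strategy that uses the mutual information between the user's prompts and the image being revised. The improved model is found to reduce the need for multiple rounds of adjustment.
Reference translation (metadata) (2025-01-25T10:32:00Z)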
A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models [117.77807994397784]
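Image editing aims to edit a given synthetic or real image to meet a user's specific requirements. Recent notable progress in this field builds on the development of text-to-image (T2I) diffusion models. T2I-based image editing methods substantially improve editing performance and offer a user-friendly interface for modifying content guided by multimodal inputs.
Reference translation (metadata) (2024-06-20T17:58:52Z)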
Empowering Visual Creativity: A Vision-Language Assistant to Image Editing Recommendations [109.65267337037842]
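We introduce the task of Image Editing Recommendation (IER). IER aims to automatically generate diverse creative editing instructions from an input image and a simple prompt representing the user's under-specified editing purpose. We introduce the Creativity-Vision Language Assistant (Creativity-VLA).
Reference translation (metadata) (2024-05-31T18:22:29Z)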
Divide and Conquer: Language Models can Plan and Self-Correct for Compositional Text-to-Image Generation [72.6168579583414]
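CompAgent is a training-free approach to compositional text-to-image generation with a large language model (LLM) agent at its core. The proposed method achieves an improvement of more than 10% on T2I-CompBench, a comprehensive benchmark for open-world compositional T2I generation.
Reference translation (metadata) (2024-01-28T16:18:39Z)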
The Contemporary Art of Image Search: Iterative User Intent Expansion via Vision-Language Model [4.531548217880843]
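We propose an innovative user intent expansion framework for image search. The framework uses a vision model to parse and compose multimodal user inputs. The proposed framework substantially improves the user's image search experience.
Reference translation (metadata) (2023-12-04T06:14:25Z)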
Edit As You Wish: Video Caption Editing with Multi-grained User Control [61.76233268900959]
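We propose a new Video Caption Editing (VCE) task that automatically revises an existing video description guided by multi-grained user requests. Inspired by human rewriting habits, the user command is designed as a key triplet of text operation, position, and attribute, covering diverse user needs from coarse-grained to fine-grained.
Reference translation (metadata) (2023-05-15T07:12:19Z)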
ImageEye: Batch Image Processing Using Program Synthesis [7.111443975103331]
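This paper proposes a new synthesis technique for batch image processing. The method can apply fine-grained edits to individual objects within an image. The proposed method is implemented in a tool called ImageEye and evaluated on 50 image editing tasks.
Reference translation (metadata) (2023-04-06T17:38:34Z)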
CHATEDIT: Towards Multi-turn Interactive Facial Image Editing via Dialogue [17.503012018823902]
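This paper proposes the ChatEdit benchmark dataset for evaluating image editing and conversational abilities. ChatEdit is built from the CelebA-HQ dataset and incorporates annotated multi-turn dialogues corresponding to users' editing requests on the images. The paper also proposes a new baseline framework that integrates a dialogue module for user request tracking and response generation.
Reference translation (metadata) (2023-03-20T13:45:58Z)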
NICER: Aesthetic Image Enhancement with Humans in the Loop [0.7756211500979312]
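This work proposes a neural-network-based approach to no-reference image enhancement in a fully automatic, semi-automatic, or fully manual process. It shows that NICER improves image aesthetics without user interaction, and that allowing user interaction yields diverse improvement results.
Reference translation (metadata) (2020-12-03T09:14:10Z)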
Text as Neural Operator: Image Manipulation by Text Instruction [68.53181621741632]
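This paper studies a setting in which an image containing multiple objects is edited using complex text instructions, allowing objects to be added, removed, or modified. The task input is multimodal, comprising (1) a reference image and (2) a natural-language instruction describing the desired modification. The proposed model is shown to perform favorably against strong baselines on three recent public datasets.
Reference translation (metadata) (2020-08-11T07:07:10Z)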

The related papers list is generated automatically from the titles and abstracts of papers on this site.
