Fugu-MT Paper Translation (Summary): Assessing Image Quality Issues for Real-World Problems

Paper overview: Assessing Image Quality Issues for Real-World Problems

arxiv url: http://arxiv.org/abs/2003.12511v2
Date: Mon, 30 Mar 2020 16:47:09 GMT
Status: Translation complete
Last updated in system: 2022-12-19 05:22:06.007639
Title: Assessing Image Quality Issues for Real-World Problems
Authors: Tai-Yin Chiu, Yinan Zhao, Danna Gurari
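Abstract summary: For each of 39,181 images taken by people who are blind, we determine whether it is of sufficient quality to recognize its content. These labels serve as a critical foundation for our subsequent contributions.
Reference score (site-computed attention score): 27.63363921838157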
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We introduce a new large-scale dataset that links the assessment of image quality issues to two practical vision tasks: image captioning and visual question answering. First, for 39,181 images taken by people who are blind, we identify whether each is of sufficient quality to recognize the content, as well as which quality flaws, from six options, are observed. These labels serve as a critical foundation for us to make the following contributions: (1) a new problem and algorithms for deciding whether an image is of insufficient quality to recognize the content and so not captionable, (2) a new problem and algorithms for deciding which of six quality flaws an image contains, (3) a new problem and algorithms for deciding whether a visual question is unanswerable due to unrecognizable content versus the content of interest being missing from the field of view, and (4) a novel application of more efficiently creating a large-scale image captioning dataset by automatically deciding whether an image is of insufficient quality and so should not be captioned. We publicly share our datasets and code to facilitate future extensions of this work: https://vizwiz.org.
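Contributions (1), (2), and (4) can be pictured as decisions made by thresholding per-image model scores. The following is a minimal sketch, assuming a hypothetical model that outputs one probability that the content is unrecognizable plus one probability per flaw. The flaw names, the `QualityPrediction` type, the helper functions, and the 0.5 threshold are illustrative assumptions, not the authors' released code.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Assumed flaw labels for illustration only; the abstract says there are six options
# but does not name them here.
FLAWS = ("blur", "bright", "dark", "obstruction", "framing", "rotation")


@dataclass
class QualityPrediction:
    image_id: str
    p_unrecognizable: float  # hypothetical model score in [0, 1]
    flaw_probs: dict[str, float] = field(default_factory=dict)  # hypothetical per-flaw scores


def is_captionable(pred: QualityPrediction, threshold: float = 0.5) -> bool:
    """Contribution (1) as a binary decision: is the content recognizable?"""
    return pred.p_unrecognizable < threshold


def detected_flaws(pred: QualityPrediction, threshold: float = 0.5) -> list[str]:
    """Contribution (2) as a multi-label decision: one image may show several flaws."""
    return [f for f in FLAWS if pred.flaw_probs.get(f, 0.0) >= threshold]


def select_for_captioning(preds: list[QualityPrediction], threshold: float = 0.5) -> list[str]:
    """Contribution (4): skip captioning effort on images judged unrecognizable."""
    return [p.image_id for p in preds if is_captionable(p, threshold)]


preds = [
    QualityPrediction("img_a", 0.92, {"blur": 0.81, "dark": 0.64}),
    QualityPrediction("img_b", 0.08, {"framing": 0.57, "rotation": 0.12}),
]
print(select_for_captioning(preds))  # ['img_b']
print(detected_flaws(preds[0]))      # ['blur', 'dark']
print(detected_flaws(preds[1]))      # ['framing']
```

Treating flaws as independent multi-label outputs, rather than a single class, reflects that a photo can be both blurry and dark at once; the single unrecognizability score is what gates whether an image is sent for captioning.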

Related papers

Picking the Cream of the Crop: Visual-Centric Data Selection with Collaborative Agents [62.616106562146776]
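We propose a Visual-Centric Selection approach via Agents Collaboration (ViSA). The approach consists of (1) an image information quantification method, based on collaboration among visual agents, that selects images with rich visual information, and (2) a visual-centric instruction quality assessment method that selects high-quality instruction data related to high-quality images.
Paper reference translation (metadata) (2025-02-27T09:37:30Z)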
VQA$^2$: Visual Question Answering for Video Quality Assessment [76.81110038738699]
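Video Quality Assessment (VQA) is a classic field in low-level visual perception. Recent studies in the image domain have shown that Visual Question Answering (VQA) can markedly enhance low-level visual quality evaluation. The VQA² Instruction Dataset is the first visual question answering instruction dataset focused on video quality assessment. The VQA² series models interleave visual and motion tokens to enhance the perception of spatial-temporal quality details in videos.
Paper reference translation (metadata) (2024-11-06T09:39:52Z)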
Q-Ground: Image Quality Grounding with Large Multi-modality Models [61.72022069880346]
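Q-Ground is the first framework aimed at tackling visual quality grounding. Q-Ground combines large multi-modality models with detailed visual quality analysis. Central to the contribution is the introduction of the QGround-100K dataset.
Paper reference translation (metadata) (2024-07-24T06:42:46Z)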
Negative Results of Image Processing for Identifying Duplicate Questions on Stack Overflow [2.2667044928324747]
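We investigate image-based techniques for identifying duplicate questions on Stack Overflow. First, we integrate text extracted from images into the question text; second, we use image captioning to evaluate images based on their visual content. Our work lays the foundation for easy replication and hypothesis testing, allowing future research to build on our approach.
Paper reference translation (metadata) (2024-07-08T00:14:21Z)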
Helping Visually Impaired People Take Better Quality Pictures [52.03016269364854]
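We develop tools to help visually impaired users minimize the occurrence of common technical distortions. We also create a prototype feedback system that helps users mitigate quality issues.
Paper reference translation (metadata) (2023-05-14T04:37:53Z)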
Deep Image Matting: A Comprehensive Survey [85.77905619102802]
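This paper reviews recent advances in image matting in the deep learning era. It focuses on two fundamental sub-tasks: auxiliary input-based image matting and automatic image matting. It also discusses related applications of image matting and highlights challenges and opportunities for future research.
Paper reference translation (metadata) (2023-04-10T15:48:55Z)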
Feedback is Needed for Retakes: An Explainable Poor Image Notification Framework for the Visually Impaired [6.0158981171030685]
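Our framework first determines the quality of an image and generates captions only for images judged to be of high quality. If image quality is low, users are notified of the detected flaws so that they can retake the photo, and this cycle repeats until the input image is judged to be of high quality.
Paper reference translation (metadata) (2022-11-17T09:22:28Z)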
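The assess, notify, and retake cycle described in the Feedback is Needed for Retakes summary can be sketched as a simple loop. This is a minimal sketch, assuming hypothetical `assess`, `caption`, and `notify` callables supplied by the caller; `retake_until_good` and the stub verdicts are illustrative names, not that framework's actual implementation or models.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical assessor: maps an image to (is_high_quality, detected_flaws).
Assessor = Callable[[str], Tuple[bool, List[str]]]


def retake_until_good(
    frames: Iterable[str],
    assess: Assessor,
    caption: Callable[[str], str],
    notify: Callable[[List[str]], None],
) -> Optional[str]:
    """Caption only the first capture judged high quality.

    Each low-quality capture triggers a notification listing its flaws,
    prompting the user to retake; the loop stops at the first good image.
    """
    for frame in frames:
        is_good, flaws = assess(frame)
        if is_good:
            return caption(frame)
        notify(flaws)  # report the flaws so the user knows what to fix
    return None  # captures ran out before a high-quality image arrived


# Usage with stub components standing in for real quality and captioning models.
verdicts = {
    "shot1.jpg": (False, ["blur"]),
    "shot2.jpg": (False, ["dark", "framing"]),
    "shot3.jpg": (True, []),
}
notifications: List[List[str]] = []
result = retake_until_good(
    ["shot1.jpg", "shot2.jpg", "shot3.jpg"],
    assess=lambda f: verdicts[f],
    caption=lambda f: f"caption for {f}",
    notify=notifications.append,
)
print(result)         # caption for shot3.jpg
print(notifications)  # [['blur'], ['dark', 'framing']]
```

Passing the frames as an iterable lets a real application feed live camera captures lazily, so the captioner is never called on an image the assessor rejected.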
Word-Level Fine-Grained Story Visualization [58.16484259508973]
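Story visualization aims to generate a sequence of images, each narrating one sentence of a multi-sentence story, with global consistency across dynamic scenes and characters. Current works still struggle with image quality and consistency, and rely on additional semantic information or auxiliary captioning networks. We first introduce a new sentence representation that incorporates word information from all story sentences to mitigate the inconsistency problem. We then propose a new discriminator with fusion features to improve image quality and story consistency.
Paper reference translation (metadata) (2022-08-03T21:01:47Z)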
MuMuQA: Multimedia Multi-Hop News Question Answering via Cross-Media Knowledge Extraction and Grounding [131.8797942031366]
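We present a new QA evaluation benchmark with 1,384 questions over news articles that require cross-media grounding of objects in images onto text. Specifically, the task involves multi-hop questions that require reasoning over image-caption pairs to identify the grounded visual object being referred to, and then predicting a span from the news body text to answer the question. We also introduce a multimedia data augmentation framework, based on cross-media knowledge extraction and synthetic question-answer generation, that automatically augments data to provide weak supervision for this task.
Paper reference translation (metadata) (2021-12-20T18:23:30Z)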
Captioning Images Taken by People Who Are Blind [25.263950448575923]
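VizWiz-Captions consists of over 39,000 images originating from people who are blind, each paired with five captions. We analyze this dataset to (1) characterize typical captions, (2) characterize the diversity of content found in the images, and (3) compare its content to eight popular vision datasets.
Paper reference translation (metadata) (2020-02-20T04:36:39Z)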

The related papers list is generated automatically from the titles and abstracts of papers on this site.

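Information about the specified paper.
The operator of this site makes no guarantee about the quality of this site (including all information and translations) and accepts no liability for any consequences arising from its use.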