Fugu-MT 論文翻訳(概要): An Early Evaluation of GPT-4V(ision)

論文の概要: An Early Evaluation of GPT-4V(ision)

arxiv url: http://arxiv.org/abs/2310.16534v1
Date: Wed, 25 Oct 2023 10:33:17 GMT
ステータス: 翻訳完了
システム内更新日: 2023-10-26 15:24:35.300519
Title: An Early Evaluation of GPT-4V(ision)
Title（参考訳）: GPT-4V(ision)の早期評価
Authors: Yang Wu, Shilong Wang, Hao Yang, Tian Zheng, Hongbo Zhang, Yanyan Zhao, Bing Qin
Abstract要約: 我々は,GPT-4Vの視覚的理解,言語理解,視覚パズルの解法,深度,熱,映像,音声などの他のモダリティの理解など,様々な能力を評価する。 GPT-4Vの性能を評価するため、656の試験インスタンスを手動で構築し、GPT-4Vの結果を慎重に評価する。
参考スコア（独自算出の注目度）: 40.866323649060696
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this paper, we evaluate different abilities of GPT-4V including visual understanding, language understanding, visual puzzle solving, and understanding of other modalities such as depth, thermal, video, and audio. To estimate GPT-4V's performance, we manually construct 656 test instances and carefully evaluate the results of GPT-4V. The highlights of our findings are as follows: (1) GPT-4V exhibits impressive performance on English visual-centric benchmarks but fails to recognize simple Chinese texts in the images; (2) GPT-4V shows inconsistent refusal behavior when answering questions related to sensitive traits such as gender, race, and age; (3) GPT-4V obtains worse results than GPT-4 (API) on language understanding tasks including general language understanding benchmarks and visual commonsense knowledge evaluation benchmarks; (4) Few-shot prompting can improve GPT-4V's performance on both visual understanding and language understanding; (5) GPT-4V struggles to find the nuances between two similar images and solve the easy math picture puzzles; (6) GPT-4V shows non-trivial performance on the tasks of similar modalities to image, such as video and thermal. Our experimental results reveal the ability and limitations of GPT-4V and we hope our paper can provide some insights into the application and research of GPT-4V.
Abstract（参考訳）: 本稿では,gpt-4vの視覚理解,言語理解,視覚パズル解法,奥行き,熱,映像,音声といった他のモダリティの理解など,様々な能力を評価する。 GPT-4Vの性能を評価するため、656の試験インスタンスを手動で構築し、GPT-4Vの結果を慎重に評価する。 The highlights of our findings are as follows: (1) GPT-4V exhibits impressive performance on English visual-centric benchmarks but fails to recognize simple Chinese texts in the images; (2) GPT-4V shows inconsistent refusal behavior when answering questions related to sensitive traits such as gender, race, and age; (3) GPT-4V obtains worse results than GPT-4 (API) on language understanding tasks including general language understanding benchmarks and visual commonsense knowledge evaluation benchmarks; (4) Few-shot prompting can improve GPT-4V's performance on both visual understanding and language understanding; (5) GPT-4V struggles to find the nuances between two similar images and solve the easy math picture puzzles; (6) GPT-4V shows non-trivial performance on the tasks of similar modalities to image, such as video and thermal. 実験結果から, GPT-4Vの能力と限界が明らかとなり, GPT-4Vの応用と研究についていくつかの知見が得られることを期待する。

関連論文リスト

GPT-4o System Card [211.87336862081963]
GPT-4oは自動回帰オムニモデルであり、テキスト、オーディオ、画像、ビデオの組み合わせを入力として受け入れる。テキスト、ビジョン、オーディオでエンドツーエンドにトレーニングされており、すべての入力と出力は同じニューラルネットワークで処理される。 GPT-4は、英語とコードのテキスト上でのTurboのパフォーマンスと一致し、非英語のテキストでは大幅に改善された。
論文参考訳（メタデータ） (2024-10-25T17:43:01Z)
Notes on Applicability of GPT-4 to Document Understanding [0.0]
文書理解分野に関するすべての公開GPT-4ファミリーモデルを評価する。 GPT-4 Vision Turboは,外部のOCRエンジンで認識されたテキストと入力上の文書イメージの両方を提供する場合,テキストのみのモデルでは良好な結果が得られない。
論文参考訳（メタデータ） (2024-05-28T17:59:53Z)
Gemini Pro Defeated by GPT-4V: Evidence from Education [1.0226894006814744]
GPT-4Vは、スコアリング精度と四重み付きカッパの点でゲミニプロを著しく上回っている。 GPT-4Vは複雑な教育課題に対処する能力に優れていた。
論文参考訳（メタデータ） (2023-12-27T02:56:41Z)
GPT-4V with Emotion: A Zero-shot Benchmark for Generalized Emotion Recognition [38.2581985358104]
GPT-4 with Vision (GPT-4V) は、様々なタスクにおいて顕著な視覚能力を示すが、その感情認識性能は十分に評価されていない。 6つのタスクをカバーする21のベンチマークデータセットに対して,GPT-4Vの定量的評価結果を示す。
論文参考訳（メタデータ） (2023-12-07T13:27:37Z)
GPT4Vis: What Can GPT-4 Do for Zero-shot Visual Recognition? [82.40761196684524]
本稿では,ゼロショット視覚認識タスクにおけるGPT-4の言語的・視覚的能力の評価に焦点を当てる。我々は、画像、ビデオ、点群にわたるGPT-4の性能を評価するための広範な実験を行った。言語記述が充実したGPT-4はゼロショット認識を著しく改善した。
論文参考訳（メタデータ） (2023-11-27T11:29:10Z)
Comparing Humans, GPT-4, and GPT-4V On Abstraction and Reasoning Tasks [53.936643052339]
GPT-4のテキストのみおよびマルチモーダル版による推論能力の評価を行った。実験結果から,GPT-4のどちらのバージョンも人間に近いレベルで頑健な抽象化能力を開発していないという結論が得られた。
論文参考訳（メタデータ） (2023-11-14T04:33:49Z)
Holistic Analysis of Hallucination in GPT-4V(ision): Bias and Interference Challenges [54.42256219010956]
このベンチマークは、視覚言語モデルにおける2つの一般的な幻覚、すなわちバイアスと干渉を評価するために設計されている。偏見はモデルがある種の反応を幻覚させる傾向を示すもので、おそらくはトレーニングデータの不均衡によるものである。干渉とは、テキストプロンプトのフレーズ化や入力画像の表示方法によって、GPT-4V(ision)の判定が破壊されるシナリオである。
論文参考訳（メタデータ） (2023-11-06T17:26:59Z)
GPT-4V(ision) as a Generalist Evaluator for Vision-Language Tasks [70.98062518872999]
我々は,GPT-4Vの性能評価,基本画像からテキストへの合成,高レベル画像から画像への変換,複数画像からテキストへのアライメントといったタスクに対処する能力を検証する。特に、GPT-4Vは、様々なタスクや評価方法にまたがって人間と有望な合意を示し、マルチモーダルLCMを評価対象として持つ可能性を示している。
論文参考訳（メタデータ） (2023-11-02T16:11:09Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。