Fugu-MT Paper Translation (Abstract): Pushing the Limits of ChatGPT on NLP Tasks

Paper summary: Pushing the Limits of ChatGPT on NLP Tasks

arxiv url: http://arxiv.org/abs/2306.09719v2
Date: Mon, 9 Oct 2023 15:48:23 GMT
Status: translation complete
Last updated in system: 2023-10-13 08:50:55.150594
Title: Pushing the Limits of ChatGPT on NLP Tasks
Authors: Xiaofei Sun, Linfeng Dong, Xiaoya Li, Zhen Wan, Shuhe Wang, Tianwei Zhang, Jiwei Li, Fei Cheng, Lingjuan Lyu, Fei Wu, Guoyin Wang
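Abstract summary: Despite the success of ChatGPT, its performance on most NLP tasks is well below supervised baselines. This work investigates the causes and proposes a collection of general modules that address them, in an attempt to push the limits of ChatGPT on NLP tasks.
Reference score (attention score computed by this site): 79.17291002710517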
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Despite the success of ChatGPT, its performance on most NLP tasks is still well below the supervised baselines. In this work, we look into the causes and find that the subpar performance stems from the following factors: (1) the token limit of the prompt does not allow full use of the supervised datasets; (2) a mismatch between the generative nature of ChatGPT and NLP tasks; (3) intrinsic pitfalls of LLMs, e.g., hallucination and excessive focus on certain keywords. We propose a collection of general modules to address these issues, in an attempt to push the limits of ChatGPT on NLP tasks. The proposed modules include (1) a one-input-multiple-prompts strategy that employs multiple prompts for one input to accommodate more demonstrations; (2) using fine-tuned models for better demonstration retrieval; (3) transforming tasks into formats better suited to the generative nature of the model; (4) employing reasoning strategies tailored to task-specific complexity; (5) a self-verification strategy to address the hallucination issue of LLMs; (6) a paraphrase strategy to improve the robustness of model predictions. We conduct experiments on 21 datasets covering 10 representative NLP tasks: question answering, commonsense reasoning, natural language inference, sentiment analysis, named entity recognition, entity-relation extraction, event extraction, dependency parsing, semantic role labeling, and part-of-speech tagging. Using the proposed collection of techniques, we significantly boost the performance of ChatGPT on the selected NLP tasks, achieving performance comparable to or better than supervised baselines, and in some cases existing SOTA results.
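The one-input-multiple-prompts strategy named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the helper names (`chunk_demonstrations`, `build_prompt`, `predict_with_multiple_prompts`), the `Input:`/`Label:` prompt template, the fixed number of demonstrations per prompt (standing in for a real token budget), and the majority-vote aggregation are all assumptions made for illustration. The `llm` argument is a hypothetical callable that maps a prompt string to the model's text reply.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Demo = Tuple[str, str]  # (input text, gold label)


def chunk_demonstrations(demos: Sequence[Demo], per_prompt: int) -> List[List[Demo]]:
    """Split the demonstration pool into consecutive groups of at most `per_prompt` items."""
    if per_prompt < 1:
        raise ValueError("per_prompt must be at least 1")
    return [list(demos[i:i + per_prompt]) for i in range(0, len(demos), per_prompt)]


def build_prompt(demos: Sequence[Demo], query: str) -> str:
    """Format the demonstrations followed by the unanswered query (assumed template)."""
    blocks = [f"Input: {text}\nLabel: {label}" for text, label in demos]
    blocks.append(f"Input: {query}\nLabel:")
    return "\n\n".join(blocks)


def predict_with_multiple_prompts(
    query: str,
    demos: Sequence[Demo],
    per_prompt: int,
    llm: Callable[[str], str],
) -> str:
    """Issue one prompt per demonstration group and return the most frequent answer."""
    # With an empty pool, fall back to a single prompt that contains only the query.
    groups = chunk_demonstrations(demos, per_prompt) or [[]]
    answers = [llm(build_prompt(group, query)).strip() for group in groups]
    # Majority vote is an assumed aggregation rule; ties go to the answer seen first.
    return Counter(answers).most_common(1)[0][0]
```

In practice `llm` would wrap a call to a chat model, and `per_prompt` would be chosen so that each prompt stays within the model's token limit while the prompts together cover far more demonstrations than a single prompt could hold.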

Related papers

Zero-Shot Keyphrase Generation: Investigating Specialized Instructions and Multi-Sample Aggregation on Large Language Models [52.829293635314194]
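Keyphrase generation is a long-standing NLP task that automatically generates keyphrases for a given document. This paper focuses on the zero-shot capabilities of open-source instruction-tuned LLMs (Phi-3, Llama-3) and the closed-source GPT-4o on this task.
Reference translation of the paper (metadata) (2025-03-01T19:38:57Z)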
Prompting Strategies for Enabling Large Language Models to Infer Causation from Correlation [68.58373854950294]
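We focus on causal reasoning and address the task of establishing causal relationships from correlation information. For this problem, we introduce a prompting strategy that breaks the original task into fixed subquestions. We evaluate our approach on Corr2Cause, an existing causal benchmark.
Reference translation of the paper (metadata) (2024-12-18T15:32:27Z)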
RankPrompt: Step-by-Step Comparisons Make Language Models Better Reasoners [38.30539869264287]
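Large language models (LLMs) achieve impressive performance on a variety of reasoning tasks. However, even state-of-the-art LLMs such as ChatGPT are prone to logical errors during reasoning. We introduce RankPrompt, a new prompting method that lets LLMs self-rank their responses without requiring additional resources.
Reference translation of the paper (metadata) (2024-03-19T02:34:18Z)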
The Shifted and The Overlooked: A Task-oriented Investigation of User-GPT Interactions [114.67699010359637]
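We analyze a large-scale collection of real user queries to GPT. Tasks such as "design" and "planning" are common in user interactions, yet they differ substantially from those covered by traditional NLP benchmarks.
Reference translation of the paper (metadata) (2023-10-19T02:12:17Z)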
A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets [19.521390684403293]
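This paper presents a thorough evaluation of ChatGPT's performance on a diverse set of academic datasets. Specifically, we evaluate ChatGPT across 140 tasks and analyze the 255K responses it generates on these datasets.
Reference translation of the paper (metadata) (2023-05-29T12:37:21Z)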
ChatGraph: Interpretable Text Classification by Converting ChatGPT Knowledge to Graphs [54.48467003509595]
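ChatGPT has shown strong performance on various natural language processing (NLP) tasks. We propose a new framework that harnesses the power of ChatGPT for specific tasks such as text classification. Compared with conventional text classification methods, our approach offers a more transparent decision-making process.
Reference translation of the paper (metadata) (2023-05-03T19:57:43Z)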
Exploring the Feasibility of ChatGPT for Event Extraction [31.175880361951172]
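Event extraction is a fundamental task in natural language processing that identifies and extracts information about events mentioned in text. ChatGPT offers an opportunity to solve language tasks with simple prompts, without task-specific datasets or fine-tuning. We also find that ChatGPT reaches only 51.04% of the performance of task-specific models in long-tail and complex scenarios.
Reference translation of the paper (metadata) (2023-03-07T12:03:58Z)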
A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity [79.12003701981092]
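We carry out an extensive technical evaluation of ChatGPT using 23 datasets covering 8 common NLP application tasks. Based on these datasets and a newly designed multimodal dataset, we evaluate the multitask, multilingual, and multimodal aspects of ChatGPT. ChatGPT is 63.41% accurate on average across 10 reasoning categories spanning logical reasoning, non-textual reasoning, and commonsense reasoning.
Reference translation of the paper (metadata) (2023-02-08T12:35:34Z)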
Is ChatGPT a General-Purpose Natural Language Processing Task Solver? [113.22611481694825]
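Large language models (LLMs) have demonstrated the ability to perform a variety of natural language processing (NLP) tasks zero-shot. The recent debut of ChatGPT has drawn considerable attention from the NLP community. It is not yet known whether ChatGPT can serve as a generalist model that performs many NLP tasks zero-shot.
Reference translation of the paper (metadata) (2023-02-08T09:44:51Z)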
AdaPrompt: Adaptive Model Training for Prompt-based NLP [77.12071707955889]
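We propose AdaPrompt, which adaptively retrieves external data for continual pretraining of PLMs. Experimental results on five NLP benchmarks show that AdaPrompt can improve over standard PLMs in few-shot settings. In zero-shot settings, it outperforms standard prompt-based methods with a 26.35% relative error reduction.
Reference translation of the paper (metadata) (2022-02-10T04:04:57Z)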

The related-papers list is generated automatically from the titles and abstracts of the papers on this site.
