Fugu-MT 論文翻訳(概要): Mathematical Capabilities of ChatGPT

論文の概要: Mathematical Capabilities of ChatGPT

arxiv url: http://arxiv.org/abs/2301.13867v1
Date: Tue, 31 Jan 2023 18:59:03 GMT
ステータス: 翻訳完了
システム内更新日: 2023-02-01 15:17:33.885120
Title: Mathematical Capabilities of ChatGPT
Title（参考訳）: ChatGPTの数学的機能
Authors: Simon Frieder, Luca Pinchetti, Ryan-Rhys Griffiths, Tommaso Salvatori, Thomas Lukasiewicz, Philipp Christian Petersen, Alexis Chevalier, Julius Berner
Abstract要約: 我々は、ChatGPTの数学的能力について、公開データセットや手作りデータセットで検証し、その性能をMinervaのような数学的コーパスで訓練された他のモデルと比較することで検討する。また,ChatGPTの数学能力は,平均的な数学の大学院生の数学能力よりも有意に劣っていると結論づけた。
参考スコア（独自算出の注目度）: 35.71603158908465
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We investigate the mathematical capabilities of ChatGPT by testing it on publicly available datasets, as well as hand-crafted ones, and measuring its performance against other models trained on a mathematical corpus, such as Minerva. We also test whether ChatGPT can be a useful assistant to professional mathematicians by emulating various use cases that come up in the daily professional activities of mathematicians (question answering, theorem searching). In contrast to formal mathematics, where large databases of formal proofs are available (e.g., the Lean Mathematical Library), current datasets of natural-language mathematics, used to benchmark language models, only cover elementary mathematics. We address this issue by introducing a new dataset: GHOSTS. It is the first natural-language dataset made and curated by working researchers in mathematics that (1) aims to cover graduate-level mathematics and (2) provides a holistic overview of the mathematical capabilities of language models. We benchmark ChatGPT on GHOSTS and evaluate performance against fine-grained criteria. We make this new dataset publicly available to assist a community-driven comparison of ChatGPT with (future) large language models in terms of advanced mathematical comprehension. We conclude that contrary to many positive reports in the media (a potential case of selection bias), ChatGPT's mathematical abilities are significantly below those of an average mathematics graduate student. Our results show that ChatGPT often understands the question but fails to provide correct solutions. Hence, if your goal is to use it to pass a university exam, you would be better off copying from your average peer!
Abstract（参考訳）: 我々は、ChatGPTの数学的能力について、公開データセットと手作りデータセットで検証し、その性能をMinervaのような数学的コーパスで訓練された他のモデルと比較して測定する。また,ChatGPTが数学者の日常的な職業活動に現れる様々なユースケース(質問応答,定理探索)をエミュレートすることにより,プロの数学者にとって有用なアシスタントになるかどうかを検証した。形式数学とは対照的に、公式証明の大規模なデータベース(例えば、Lean Mathematical Library)は、自然言語数学の現在のデータセットであり、言語モデルのベンチマークに使われる。我々は新しいデータセット、GHOSTSを導入することでこの問題に対処する。このデータセットは,(1)大学院レベルの数学を対象とし,(2)言語モデルの数学的能力に関する総合的な概要を提供する数学研究者による最初の自然言語データセットである。 GHOSTSでChatGPTをベンチマークし、粒度の細かい基準に対して性能を評価する。より高度な数学的理解の観点から,ChatGPTと(将来の)大規模言語モデルの比較を支援するために,この新しいデータセットを一般公開する。メディアにおける多くの肯定的な報告(選択バイアスの可能性)とは対照的に、ChatGPTの数学的能力は平均的な数学の大学院生のそれよりもかなり低い。以上の結果から,ChatGPTは解答に失敗することが多い。ですから,大学試験に合格するためにそれを使うという目標ならば,平均的な仲間からコピーした方がよいでしょう!

関連論文リスト

Data for Mathematical Copilots: Better Ways of Presenting Proofs for Machine Learning [85.635988711588]
我々は,大規模言語モデルの能力向上には,数学的データセットの設計におけるパラダイムシフトが必要であると論じる。 1949年にG. P'olyaが導入した「動機付き証明」の概念は、より良い証明学習信号を提供するデータセットの青写真として機能する。数学データセットに特化して設計されたアンケートでは、クリエーターにデータセットを含めるよう促します。
論文参考訳（メタデータ） (2024-12-19T18:55:17Z)
MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark [82.64129627675123]
MathBenchは、大規模言語モデルの数学的能力を厳格に評価する新しいベンチマークである。 MathBenchは幅広い数学の分野にまたがっており、理論的な理解と実践的な問題解決のスキルの両方を詳細に評価している。
論文参考訳（メタデータ） (2024-05-20T17:52:29Z)
MathScale: Scaling Instruction Tuning for Mathematical Reasoning [70.89605383298331]
大規模言語モデル(LLM)は問題解決において顕著な能力を示した。しかし、数学的な問題を解く能力は依然として不十分である。高品質な数学的推論データを作成するためのシンプルでスケーラブルな方法であるMathScaleを提案する。
論文参考訳（メタデータ） (2024-03-05T11:42:59Z)
MATHSENSEI: A Tool-Augmented Large Language Model for Mathematical Reasoning [2.9104279358536647]
数学的推論のためのツール強化された大規模言語モデルであるMathSenseiを提案する。ツールの補完的な利点として、知識検索(Bing Web Search)、プログラムジェネレータ+エグゼキュータ(Python)、記号方程式ソルバ(Wolfram-Alpha API)について検討する。
論文参考訳（メタデータ） (2024-02-27T05:50:35Z)
InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning [98.53491178426492]
InternLM2から事前学習を継続するILMs InternLM-Mathをオープンソースとして公開する。我々は、連鎖推論、報酬モデリング、形式推論、データ拡張、コードインタプリタを、統一されたSeq2seqフォーマットで統一する。我々の事前学習モデルは、微調整なしでMiniF2Fテストセットで30.3を達成する。
論文参考訳（メタデータ） (2024-02-09T11:22:08Z)
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning [52.97768001837269]
本稿では,オープンソース言語モデルを微調整する手法を提案する。本稿では,問題のある新しい,高品質なデータセットを生成する手法とそのコードベースソリューションを提案する。このアプローチは、問題の解決にコードベースのソリューションを生成することができるモデルのファミリーであるMathCoderモデルを生成する。
論文参考訳（メタデータ） (2023-10-05T17:52:09Z)
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct [128.89645483139236]
本稿では,Llama-2の数学的推論能力を向上するWizardMathを提案する。 GSM8kではChatGPT-3.5, Claude Instant-1, PaLM-2, Minervaを上回り, 同時にMATHでは Text-davinci, PaLM-1, GPT-3 を上回ります。
論文参考訳（メタデータ） (2023-08-18T14:23:21Z)
Math Agents: Computational Infrastructure, Mathematical Embedding, and Genomics [0.0]
人間-AIチャット以外にも、大規模言語モデル(LLM)はプログラミング、アルゴリズム発見、定理証明に現れている。本研究は「ムーアの数学法則」の新たなエントリとして数学エージェントと数学的埋め込みを紹介する。プロジェクトは、情報システム生物学の老朽化問題に対処するために、数学エージェントと数学的埋め込みを使用することを目的としている。
論文参考訳（メタデータ） (2023-07-04T20:16:32Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。