CryptoGPT: a 7B model rivaling GPT-4 in the task of analyzing and classifying real-time financial news
- URL: http://arxiv.org/abs/2406.14039v1
- Date: Thu, 20 Jun 2024 06:59:46 GMT
- Title: CryptoGPT: a 7B model rivaling GPT-4 in the task of analyzing and classifying real-time financial news
- Authors: Ying Zhang, Matthieu Petit Guillaume, Aurélien Krauth, Manel Labidi
- Abstract summary: We present a method aimed at refining a dedicated LLM of reasonable quality with limited resources in an industrial setting via CryptoGPT.
This model allows not only for the classification of financial information but also for providing comprehensive analysis.
- Score: 3.8447306272420816
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: CryptoGPT: a 7B model competing with GPT-4 in a specific task -- The Impact of Automatic Annotation and Strategic Fine-Tuning via QLoRA. In this article, we present a method aimed at refining a dedicated LLM of reasonable quality with limited resources in an industrial setting via CryptoGPT. It is an LLM designed for financial news analysis for the cryptocurrency market in real-time. This project was launched in an industrial context. This model allows not only for the classification of financial information but also for providing comprehensive analysis. We refined different LLMs of the same size such as Mistral-7B and LLama-7B using semi-automatic annotation and compared them with various LLMs such as GPT-3.5 and GPT-4. Our goal is to find a balance among several needs: 1. Protecting data (by avoiding their transfer to external servers), 2. Limiting annotation cost and time, 3. Controlling the model's size (to manage deployment costs), and 4. Maintaining better analysis quality.
Related papers
- CryptoBench: A Dynamic Benchmark for Expert-Level Evaluation of LLM Agents in Cryptocurrency [60.83660377169452]
This paper introduces CryptoBench, the first expert-curated, dynamic benchmark designed to rigorously evaluate the real-world capabilities of Large Language Model (LLM) agents.
Unlike general-purpose agent benchmarks for search and prediction, professional crypto analysis presents specific challenges.
arXiv Detail & Related papers (2025-11-29T09:52:34Z)
- Crossing Domains without Labels: Distant Supervision for Term Extraction [41.886337761732456]
Current state-of-the-art methods require expensive human annotation and struggle with domain transfer.
We introduce a benchmark spanning seven diverse domains, enabling performance evaluation at both the document- and corpus-levels.
Our approach exceeds previous approaches on 5/7 domains with an average improvement of 10 percentage points.
arXiv Detail & Related papers (2025-10-08T10:02:40Z)
- Cut Costs, Not Accuracy: LLM-Powered Data Processing with Guarantees [10.940593916080276]
Large Language Models (LLMs) are being increasingly used as a building block in data systems to process large text datasets.
To avoid high costs, more affordable but lower quality LLMs can be used to process records.
We present BARGAIN, a method that judiciously uses affordable LLMs in data processing to significantly reduce cost.
arXiv Detail & Related papers (2025-09-02T23:41:50Z)
- Your AI, Not Your View: The Bias of LLMs in Investment Analysis [55.328782443604986]
Large Language Models (LLMs) face frequent knowledge conflicts due to discrepancies between pre-trained parametric knowledge and real-time market data.
This paper offers the first quantitative analysis of confirmation bias in LLM-based investment analysis.
We observe a consistent preference for large-cap stocks and contrarian strategies across most models.
arXiv Detail & Related papers (2025-07-28T16:09:38Z)
- STEER-ME: Assessing the Microeconomic Reasoning of Large Language Models [8.60556939977361]
We develop a benchmark for evaluating large language models (LLMs) on microeconomic reasoning.
We focus on the logic of supply and demand, each grounded in up to $10$ domains, $5$ perspectives, and $3$ types.
We demonstrate the usefulness of our benchmark via a case study on $27$ LLMs, ranging from small open-source models to the current state of the art.
arXiv Detail & Related papers (2025-02-18T18:42:09Z)
- Multi-stage Large Language Model Pipelines Can Outperform GPT-4o in Relevance Assessment [6.947361774195549]
We propose a modular classification pipeline that divides the relevance assessment task into multiple stages.
One of our approaches showed an 18.4% Krippendorff's $\alpha$ accuracy increase over OpenAI's GPT-4o mini.
arXiv Detail & Related papers (2025-01-24T07:33:39Z)
- The Dual-use Dilemma in LLMs: Do Empowering Ethical Capacities Make a Degraded Utility? [54.18519360412294]
Large Language Models (LLMs) must balance between rejecting harmful requests for safety and accommodating legitimate ones for utility.
This paper presents a Direct Preference Optimization (DPO) based alignment framework that achieves better overall performance.
We analyze experimental results obtained from testing DeepSeek-R1 on our benchmark and reveal the critical ethical concerns raised by this highly acclaimed model.
arXiv Detail & Related papers (2025-01-20T06:35:01Z)
- LLM2: Let Large Language Models Harness System 2 Reasoning [65.89293674479907]
Large language models (LLMs) have exhibited impressive capabilities across a myriad of tasks, yet they occasionally yield undesirable outputs.
We introduce LLM2, a novel framework that combines an LLM with a process-based verifier.
The LLM is responsible for generating plausible candidates, while the verifier provides timely process-based feedback to distinguish desirable and undesirable outputs.
arXiv Detail & Related papers (2024-12-29T06:32:36Z)
- Automatic Evaluation for Text-to-image Generation: Task-decomposed Framework, Distilled Training, and Meta-evaluation Benchmark [62.58869921806019]
We propose a task decomposition evaluation framework based on GPT-4o to automatically construct a new training dataset.
We design innovative training strategies to effectively distill GPT-4o's evaluation capabilities into a 7B open-source MLLM, MiniCPM-V-2.6.
Experimental results demonstrate that our distilled open-source MLLM significantly outperforms the current state-of-the-art GPT-4o-base baseline.
arXiv Detail & Related papers (2024-11-23T08:06:06Z)
- BreakGPT: Leveraging Large Language Models for Predicting Asset Price Surges [55.2480439325792]
This paper introduces BreakGPT, a novel large language model (LLM) architecture adapted specifically for time series forecasting and the prediction of sharp upward movements in asset prices.
We showcase BreakGPT as a promising solution for financial forecasting with minimal training and as a strong competitor for capturing both local and global temporal dependencies.
arXiv Detail & Related papers (2024-11-09T05:40:32Z)
- KodeXv0.1: A Family of State-of-the-Art Financial Large Language Models [41.94295877935867]
KodeXv0.1 is a family of large language models that outclass GPT-4 in financial question answering.
We process a large number of publicly available financial documents such as earnings calls and business reports.
arXiv Detail & Related papers (2024-09-13T16:43:08Z)
- See What LLMs Cannot Answer: A Self-Challenge Framework for Uncovering LLM Weaknesses [51.975495361024606]
We propose a Self-Challenge evaluation framework with human-in-the-loop.
Starting from seed instances that GPT-4 fails to answer, we prompt GPT-4 to summarize error patterns that can be used to generate new instances.
We then build a benchmark, SC-G4, consisting of 1,835 instances generated by GPT-4 using these patterns, with human-annotated gold responses.
arXiv Detail & Related papers (2024-08-16T19:01:52Z)
- Harmonic LLMs are Trustworthy [3.8119386967826294]
We introduce an intuitive method to test the robustness of any black-box LLM in real-time via its local deviation from harmoniticity, denoted as $\gamma$.
We measure $\gamma$ in 10 popular LLMs across thousands of queries in three objective domains: WebQA, ProgrammingQA, and TruthfulQA.
arXiv Detail & Related papers (2024-04-30T17:00:32Z)
- FinLlama: Financial Sentiment Classification for Algorithmic Trading Applications [2.2661367844871854]
Large Language Models (LLMs) can be used in this context, but they are not finance-specific and tend to require significant computational resources.
We introduce a novel approach based on the Llama 2 7B foundational model, in order to benefit from its generative nature and comprehensive language manipulation.
This is achieved by fine-tuning the Llama2 7B model on a small portion of supervised financial sentiment analysis data.
arXiv Detail & Related papers (2024-03-18T22:11:00Z)
- How Far Are We on the Decision-Making of LLMs? Evaluating LLMs' Gaming Ability in Multi-Agent Environments [83.78240828340681]
We introduce GAMA($\gamma$)-Bench, a new framework for evaluating Large Language Models' Gaming Ability in Multi-Agent environments.
$\gamma$-Bench includes eight classical game theory scenarios and a dynamic scoring scheme specially designed to assess LLMs' performance.
Results indicate GPT-3.5 demonstrates strong robustness but limited generalizability, which can be enhanced using methods like Chain-of-Thought.
arXiv Detail & Related papers (2024-03-18T14:04:47Z)
- TAT-LLM: A Specialized Language Model for Discrete Reasoning over Tabular and Textual Data [73.29220562541204]
We consider harnessing the amazing power of language models (LLMs) to solve our task.
We develop a TAT-LLM language model by fine-tuning LLaMA 2 with the training data generated automatically from existing expert-annotated datasets.
arXiv Detail & Related papers (2024-01-24T04:28:50Z)
- SCALE: Synergized Collaboration of Asymmetric Language Translation Engines [105.8983433641208]
We introduce a collaborative framework that connects compact Specialized Translation Models (STMs) and general-purpose Large Language Models (LLMs) as one unified translation engine.
By introducing translation from STM into the triplet in-context demonstrations, SCALE unlocks refinement and pivoting ability of LLM.
Our experiments show that SCALE significantly outperforms both few-shot LLMs (GPT-4) and specialized models (NLLB) in challenging low-resource settings.
arXiv Detail & Related papers (2023-09-29T08:46:38Z)
- GPT-Fathom: Benchmarking Large Language Models to Decipher the Evolutionary Path towards GPT-4 and Beyond [29.778018058541676]
GPT-Fathom is an open-source and reproducible evaluation suite for large language models (LLMs) built on top of OpenAI Evals.
We evaluate 10+ leading LLMs as well as OpenAI's legacy models on 20+ curated benchmarks across 7 capability categories, all under aligned settings.
arXiv Detail & Related papers (2023-09-28T16:43:35Z)
- Can ChatGPT Forecast Stock Price Movements? Return Predictability and Large Language Models [48.87381259980254]
We document the capability of large language models (LLMs) like ChatGPT to predict stock market reactions from news headlines without direct financial training.
Using post-knowledge-cutoff headlines, GPT-4 captures initial market responses, achieving approximately 90% portfolio-day hit rates for the non-tradable initial reaction.
arXiv Detail & Related papers (2023-04-15T19:22:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.