KemenkeuGPT: Leveraging a Large Language Model on Indonesia's Government Financial Data and Regulations to Enhance Decision Making
- URL: http://arxiv.org/abs/2407.21459v1
- Date: Wed, 31 Jul 2024 09:16:33 GMT
- Title: KemenkeuGPT: Leveraging a Large Language Model on Indonesia's Government Financial Data and Regulations to Enhance Decision Making
- Authors: Gilang Fajar Febrian, Grazziela Figueredo,
- Abstract summary: This study investigates the potential of Large Language Models to address Indonesia's financial data and regulations.
This study undertakes an iterative process to develop KemenkeuGPT using the LangChain with Retrieval-Augmented Generation (RAG), prompt engineering and fine-tuning.
The model's accuracy improved from 35% to 61%, with correctness increasing from 48% to 64%.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data is crucial for evidence-based policymaking and enhancing public services, including those at the Ministry of Finance of the Republic of Indonesia. However, the complexity and dynamic nature of governmental financial data and regulations can hinder decision-making. This study investigates the potential of Large Language Models (LLMs) to address these challenges, focusing on Indonesia's financial data and regulations. While LLMs are effective in the financial sector, their use in the public sector in Indonesia is unexplored. This study undertakes an iterative process to develop KemenkeuGPT using the LangChain with Retrieval-Augmented Generation (RAG), prompt engineering and fine-tuning. The dataset from 2003 to 2023 was collected from the Ministry of Finance, Statistics Indonesia and the International Monetary Fund (IMF). Surveys and interviews with Ministry officials informed, enhanced and fine-tuned the model. We evaluated the model using human feedback, LLM-based evaluation and benchmarking. The model's accuracy improved from 35% to 61%, with correctness increasing from 48% to 64%. The Retrieval-Augmented Generation Assessment (RAGAS) framework showed that KemenkeuGPT achieved 44% correctness with 73% faithfulness, 40% precision and 60% recall, outperforming several other base models. An interview with an expert from the Ministry of Finance indicated that KemenkeuGPT has the potential to become an essential tool for decision-making. These results are expected to improve with continuous human feedback.
Related papers
- Evaluating Large Language Models on Financial Report Summarization: An Empirical Study [9.28042182186057]
We conduct a comparative study on three state-of-the-art Large Language Models (LLMs)
Our primary motivation is to explore how these models can be harnessed within finance, a field demanding precision, contextual relevance, and robustness against erroneous or misleading information.
We introduce an innovative evaluation framework that integrates both quantitative metrics (e.g., precision, recall) and qualitative analyses (e.g., contextual fit, consistency) to provide a holistic view of each model's output quality.
arXiv Detail & Related papers (2024-11-11T10:36:04Z) - DSBench: How Far Are Data Science Agents to Becoming Data Science Experts? [58.330879414174476]
We introduce DSBench, a benchmark designed to evaluate data science agents with realistic tasks.
This benchmark includes 466 data analysis tasks and 74 data modeling tasks, sourced from Eloquence and Kaggle competitions.
Our evaluation of state-of-the-art LLMs, LVLMs, and agents shows that they struggle with most tasks, with the best agent solving only 34.12% of data analysis tasks and achieving a 34.74% Relative Performance Gap (RPG)
arXiv Detail & Related papers (2024-09-12T02:08:00Z) - Efficacy of Large Language Models in Systematic Reviews [0.0]
This study investigates the effectiveness of Large Language Models (LLMs) in interpreting existing literature.
We compiled and hand-coded a database of 88 relevant papers published from March 2020 to May 2024.
We evaluated two current state-of-the-art LLMs, Meta AI's Llama 3 8B and OpenAI's GPT-4o, on the accuracy of their interpretations.
arXiv Detail & Related papers (2024-08-03T00:01:13Z) - Fine-Tuning Gemma-7B for Enhanced Sentiment Analysis of Financial News Headlines [4.198715347024138]
We use Natural Language Processing (NLP) and Large Language Models (LLM) to analyze sentiment from the perspective of retail investors.
We fine-tune several models, including distilbert-base-uncased, Llama, and gemma-7b, to evaluate their effectiveness in sentiment classification.
Our experiments demonstrate that the fine-tuned gemma-7b model outperforms others, achieving the highest precision, recall, and F1 score.
arXiv Detail & Related papers (2024-06-19T15:20:19Z) - DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data [65.5290035371111]
We introduce an approach to generate extensive Lean 4 proof data derived from high-school and undergraduate-level mathematical competition problems.
We fine-tune the DeepSeekMath 7B model on this synthetic dataset, which comprises 8 million formal statements with proofs.
Our model successfully proved 5 out of 148 problems in the Lean 4 Formalized International Mathematical Olympiad (FIMO) benchmark, while GPT-4 failed to prove any.
arXiv Detail & Related papers (2024-05-23T09:03:42Z) - ESGReveal: An LLM-based approach for extracting structured data from ESG
reports [5.467389155759699]
ESGReveal is an innovative method proposed for efficiently extracting and analyzing Environmental, Social, and Governance (ESG) data from corporate reports.
This approach utilizes Large Language Models (LLM) enhanced with Retrieval Augmented Generation (RAG) techniques.
Its efficacy was appraised using ESG reports from 166 companies across various sectors listed on the Hong Kong Stock Exchange in 2022.
arXiv Detail & Related papers (2023-12-25T06:44:32Z) - CSPRD: A Financial Policy Retrieval Dataset for Chinese Stock Market [61.59326951366202]
We propose a new task, policy retrieval, by introducing the Chinese Stock Policy Retrieval dataset (CSPRD)
CSPRD provides 700+ passages labeled by experienced experts with relevant articles from 10k+ entries in our collected Chinese policy corpus.
Our best performing baseline achieves 56.1% MRR@10, 28.5% NDCG@10, 37.5% Recall@10 and 80.6% Precision@10 on dev set.
arXiv Detail & Related papers (2023-09-08T15:40:54Z) - Biomedical image analysis competitions: The state of current
participation practice [143.52578599912326]
We designed a survey to shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis.
The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics.
Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures.
arXiv Detail & Related papers (2022-12-16T16:44:46Z) - Prompting GPT-3 To Be Reliable [117.23966502293796]
This work decomposes reliability into four facets: generalizability, fairness, calibration, and factuality.
We find that GPT-3 outperforms smaller-scale supervised models by large margins on all these facets.
arXiv Detail & Related papers (2022-10-17T14:52:39Z) - Trends in eBusiness and eGovernment [0.0]
The first chapter is a critical review and a case study in eBusiness, with special attention to the digital currencies resource.
The second chapter attempts to incorporate the UTAUT model with perceived risk theory to explore its impact on the intention to use m-government services.
The third chapter aims to assess the level of gender inclusivity in the municipal e-procurement processes in the City of Johannesburg.
arXiv Detail & Related papers (2021-04-02T17:53:17Z) - Explanations of Machine Learning predictions: a mandatory step for its
application to Operational Processes [61.20223338508952]
Credit Risk Modelling plays a paramount role.
Recent machine and deep learning techniques have been applied to the task.
We suggest to use LIME technique to tackle the explainability problem in this field.
arXiv Detail & Related papers (2020-12-30T10:27:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.