Fine-tuning Smaller Language Models for Question Answering over Financial Documents
- URL: http://arxiv.org/abs/2408.12337v1
- Date: Thu, 22 Aug 2024 12:23:29 GMT
- Title: Fine-tuning Smaller Language Models for Question Answering over Financial Documents
- Authors: Karmvir Singh Phogat, Sai Akhil Puranam, Sridhar Dasaratha, Chetan Harsha, Shashishekar Ramakrishna,
- Abstract summary: We focus on the challenge of answering questions that require multi-hop numerical reasoning over financial texts.
We assess the performance of several smaller models that have been fine-tuned to generate programs.
Our empirical analysis indicates that fine-tuning refines the student models ability to express and apply the required financial concepts.
- Score: 0.1747623282473278
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recent research has shown that smaller language models can acquire substantial reasoning abilities when fine-tuned with reasoning exemplars crafted by a significantly larger teacher model. We explore this paradigm for the financial domain, focusing on the challenge of answering questions that require multi-hop numerical reasoning over financial texts. We assess the performance of several smaller models that have been fine-tuned to generate programs that encode the required financial reasoning and calculations. Our findings demonstrate that these fine-tuned smaller models approach the performance of the teacher model. To provide a granular analysis of model performance, we propose an approach to investigate the specific student model capabilities that are enhanced by fine-tuning. Our empirical analysis indicates that fine-tuning refines the student models ability to express and apply the required financial concepts along with adapting the entity extraction for the specific data format. In addition, we hypothesize and demonstrate that comparable financial reasoning capability can be induced using relatively smaller datasets.
Related papers
- Learning-to-Defer for Extractive Question Answering [3.6787328174619254]
We introduce an adapted two-stage Learning-to-Defer mechanism that enhances decision-making by enabling selective deference to human experts or larger models without retraining language models in the context of question-answering.
Our results demonstrate that deferring a minimal number of queries allows the smaller model to achieve performance comparable to their larger counterparts while preserving computing efficiency.
arXiv Detail & Related papers (2024-10-21T08:21:00Z) - Large Language Model Adaptation for Financial Sentiment Analysis [2.0499240875882]
Generalist language models tend to fall short in tasks specifically tailored for finance.
Two foundation models with less than 1.5B parameters have been adapted using a wide range of strategies.
We show that small LLMs have comparable performance to larger scale models, while being more efficient in terms of parameters and data.
arXiv Detail & Related papers (2024-01-26T11:04:01Z) - One-Shot Open Affordance Learning with Foundation Models [54.15857111929812]
We introduce One-shot Open Affordance Learning (OOAL), where a model is trained with just one example per base object category.
We propose a vision-language framework with simple and effective designs that boost the alignment between visual features and affordance text embeddings.
Experiments on two affordance segmentation benchmarks show that the proposed method outperforms state-of-the-art models with less than 1% of the full training data.
arXiv Detail & Related papers (2023-11-29T16:23:06Z) - RAVEN: In-Context Learning with Retrieval-Augmented Encoder-Decoder Language Models [57.12888828853409]
RAVEN is a model that combines retrieval-augmented masked language modeling and prefix language modeling.
Fusion-in-Context Learning enables the model to leverage more in-context examples without requiring additional training.
Our work underscores the potential of retrieval-augmented encoder-decoder language models for in-context learning.
arXiv Detail & Related papers (2023-08-15T17:59:18Z) - Are ChatGPT and GPT-4 General-Purpose Solvers for Financial Text
Analytics? A Study on Several Typical Tasks [36.84636748560657]
Large language models such as ChatGPT and GPT-4 have shown exceptional capabilities of generalist models.
How effective are such models in the financial domain?
arXiv Detail & Related papers (2023-05-10T03:13:54Z) - Minimal Value-Equivalent Partial Models for Scalable and Robust Planning
in Lifelong Reinforcement Learning [56.50123642237106]
Common practice in model-based reinforcement learning is to learn models that model every aspect of the agent's environment.
We argue that such models are not particularly well-suited for performing scalable and robust planning in lifelong reinforcement learning scenarios.
We propose new kinds of models that only model the relevant aspects of the environment, which we call "minimal value-minimal partial models"
arXiv Detail & Related papers (2023-01-24T16:40:01Z) - Graph-Regularized Tensor Regression: A Domain-Aware Framework for
Interpretable Multi-Way Financial Modelling [23.030263841031633]
We develop a novel Graph-Regularized Regression (GRTR) framework, whereby knowledge about cross-asset relations is incorporated into the model in the form of a graph Laplacian matrix.
By virtue of tensor algebra, the proposed framework is shown to be fully interpretable, both coefficient-wise and dimension-wise.
The GRTR model is validated in a multi-way financial forecasting setting and is shown to achieve improved performance at reduced computational costs.
arXiv Detail & Related papers (2022-10-26T13:39:08Z) - Feeding What You Need by Understanding What You Learned [54.400455868448695]
Machine Reading (MRC) reveals the ability to understand a given text passage and answer questions based on it.
Existing research works in MRC rely heavily on large-size models and corpus to improve the performance evaluated by metrics such as Exact Match.
We argue that a deep understanding of model capabilities and data properties can help us feed a model with appropriate training data.
arXiv Detail & Related papers (2022-03-05T14:15:59Z) - Model-agnostic multi-objective approach for the evolutionary discovery
of mathematical models [55.41644538483948]
In modern data science, it is more interesting to understand the properties of the model, which parts could be replaced to obtain better results.
We use multi-objective evolutionary optimization for composite data-driven model learning to obtain the algorithm's desired properties.
arXiv Detail & Related papers (2021-07-07T11:17:09Z) - Plausible Counterfactuals: Auditing Deep Learning Classifiers with
Realistic Adversarial Examples [84.8370546614042]
Black-box nature of Deep Learning models has posed unanswered questions about what they learn from data.
Generative Adversarial Network (GAN) and multi-objectives are used to furnish a plausible attack to the audited model.
Its utility is showcased within a human face classification task, unveiling the enormous potential of the proposed framework.
arXiv Detail & Related papers (2020-03-25T11:08:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.