Enhancing Financial Domain Adaptation of Language Models via Model Augmentation
- URL: http://arxiv.org/abs/2411.09249v1
- Date: Thu, 14 Nov 2024 07:28:09 GMT
- Title: Enhancing Financial Domain Adaptation of Language Models via Model Augmentation
- Authors: Kota Tanabe, Masanori Hirano, Kazuki Matoya, Kentaro Imajo, Hiroki Sakaji, Itsuki Noda,
- Abstract summary: This study demonstrates the effectiveness of Composition to Augment Language Models (CALM) in adapting to the financial domain.
We developed a CALM to enhance the financial performance of an LLM with strong response capabilities.
- Score: 2.9960693856871545
- License:
- Abstract: The domain adaptation of language models, including large language models (LLMs), has become increasingly important as the use of such models continues to expand. This study demonstrates the effectiveness of Composition to Augment Language Models (CALM) in adapting to the financial domain. CALM is a model to extend the capabilities of existing models by introducing cross-attention between two LLMs with different functions. In our experiments, we developed a CALM to enhance the financial performance of an LLM with strong response capabilities by leveraging a financial-specialized LLM. Notably, the CALM was trained using a financial dataset different from the one used to train the financial-specialized LLM, confirming CALM's ability to adapt to various datasets. The models were evaluated through quantitative Japanese financial benchmarks and qualitative response comparisons, demonstrating that CALM enables superior responses with higher scores than the original models and baselines. Additionally, comparative experiments on connection points revealed that connecting the middle layers of the models is most effective in facilitating adaptation to the financial domain. These findings confirm that CALM is a practical approach for adapting LLMs to the financial domain.
Related papers
- FLAG-Trader: Fusion LLM-Agent with Gradient-based Reinforcement Learning for Financial Trading [28.57263158928989]
Large language models (LLMs) fine-tuned on multimodal financial data have demonstrated impressive reasoning capabilities.
We propose textscFLAG-Trader, a unified architecture integrating linguistic processing (via LLMs) with gradient-driven reinforcement learning (RL) policy optimization.
arXiv Detail & Related papers (2025-02-17T04:45:53Z) - LLM-Lasso: A Robust Framework for Domain-Informed Feature Selection and Regularization [59.75242204923353]
We introduce LLM-Lasso, a framework that leverages large language models (LLMs) to guide feature selection in Lasso regression.
LLMs generate penalty factors for each feature, which are converted into weights for the Lasso penalty using a simple, tunable model.
Features identified as more relevant by the LLM receive lower penalties, increasing their likelihood of being retained in the final model.
arXiv Detail & Related papers (2025-02-15T02:55:22Z) - Adaptive Pruning for Large Language Models with Structural Importance Awareness [66.2690963378878]
Large language models (LLMs) have significantly improved language understanding and generation capabilities.
LLMs are difficult to deploy on resource-constrained edge devices due to their high computational and storage resource demands.
We propose structurally-aware adaptive pruning (SAAP) to significantly reduce the computational and memory costs while maintaining model performance.
arXiv Detail & Related papers (2024-12-19T18:08:04Z) - A Comparative Analysis of Instruction Fine-Tuning LLMs for Financial Text Classification [0.8192907805418583]
Large Language Models (LLMs) have demonstrated impressive capabilities across diverse Natural Language Processing (NLP) tasks.
This study investigates the efficacy of instruction fine-tuning to enhance their performance in financial text classification tasks.
arXiv Detail & Related papers (2024-11-04T18:06:36Z) - LLM-KT: A Versatile Framework for Knowledge Transfer from Large Language Models to Collaborative Filtering [0.07793154724386657]
We present a flexible framework designed to enhance collaborative filtering (CF) models by seamlessly integrating LLM-generated features.
Our framework injects these features into an intermediate layer of any CF model, allowing the model to reconstruct and leverage the embeddings internally.
Our framework is built for easy integration and modification, providing researchers and developers with a powerful tool for extending CF model capabilities.
arXiv Detail & Related papers (2024-11-01T13:09:30Z) - Characterizing Truthfulness in Large Language Model Generations with
Local Intrinsic Dimension [63.330262740414646]
We study how to characterize and predict the truthfulness of texts generated from large language models (LLMs)
We suggest investigating internal activations and quantifying LLM's truthfulness using the local intrinsic dimension (LID) of model activations.
arXiv Detail & Related papers (2024-02-28T04:56:21Z) - Large Language Model Adaptation for Financial Sentiment Analysis [2.0499240875882]
Generalist language models tend to fall short in tasks specifically tailored for finance.
Two foundation models with less than 1.5B parameters have been adapted using a wide range of strategies.
We show that small LLMs have comparable performance to larger scale models, while being more efficient in terms of parameters and data.
arXiv Detail & Related papers (2024-01-26T11:04:01Z) - Knowledge Fusion of Large Language Models [73.28202188100646]
This paper introduces the notion of knowledge fusion for large language models (LLMs)
We externalize their collective knowledge and unique strengths, thereby elevating the capabilities of the target model beyond those of any individual source LLM.
Our findings confirm that the fusion of LLMs can improve the performance of the target model across a range of capabilities such as reasoning, commonsense, and code generation.
arXiv Detail & Related papers (2024-01-19T05:02:46Z) - LLM Augmented LLMs: Expanding Capabilities through Composition [56.40953749310957]
CALM -- Composition to Augment Language Models -- introduces cross-attention between models to compose their representations and enable new capabilities.
We illustrate that augmenting PaLM2-S with a smaller model trained on low-resource languages results in an absolute improvement of up to 13% on tasks like translation into English.
When PaLM2-S is augmented with a code-specific model, we see a relative improvement of 40% over the base model for code generation and explanation tasks.
arXiv Detail & Related papers (2024-01-04T18:53:01Z) - Enhancing Financial Sentiment Analysis via Retrieval Augmented Large
Language Models [11.154814189699735]
Large Language Models (LLMs) pre-trained on extensive corpora have demonstrated superior performance across various NLP tasks.
We introduce a retrieval-augmented LLMs framework for financial sentiment analysis.
Our approach achieves 15% to 48% performance gain in accuracy and F1 score.
arXiv Detail & Related papers (2023-10-06T05:40:23Z) - CATfOOD: Counterfactual Augmented Training for Improving Out-of-Domain
Performance and Calibration [59.48235003469116]
We show that data augmentation consistently enhances OOD performance.
We also show that CF augmented models which are easier to calibrate also exhibit much lower entropy when assigning importance.
arXiv Detail & Related papers (2023-09-14T16:16:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.