Dólares or Dollars? Unraveling the Bilingual Prowess of Financial LLMs Between Spanish and English
- URL: http://arxiv.org/abs/2402.07405v1
- Date: Mon, 12 Feb 2024 04:50:31 GMT
- Title: Dólares or Dollars? Unraveling the Bilingual Prowess of Financial LLMs Between Spanish and English
- Authors: Xiao Zhang, Ruoyu Xiang, Chenhan Yuan, Duanyu Feng, Weiguang Han,
Alejandro Lopez-Lira, Xiao-Yang Liu, Sophia Ananiadou, Min Peng, Jimin Huang,
Qianqian Xie
- Abstract summary: Toisón de Oro is the first framework that establishes instruction datasets, finetuned LLMs, and an evaluation benchmark for financial LLMs in Spanish jointly with English.
We construct a rigorously curated bilingual instruction dataset including over 144K Spanish and English samples from 15 datasets covering 7 tasks.
We evaluate our model and existing LLMs using FLARE-ES, the first comprehensive bilingual evaluation benchmark with 21 datasets covering 9 tasks.
- Score: 67.48541936784501
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite Spanish's pivotal role in the global finance industry, a pronounced
gap exists in Spanish financial natural language processing (NLP) and
application studies compared to English, especially in the era of large
language models (LLMs). To bridge this gap, we unveil Toisón de Oro, the
first bilingual framework that establishes instruction datasets, finetuned
LLMs, and an evaluation benchmark for financial LLMs in Spanish jointly with
English. We construct a rigorously curated bilingual instruction dataset
including over 144K Spanish and English samples from 15 datasets covering 7
tasks. Harnessing this, we introduce FinMA-ES, an LLM designed for bilingual
financial applications. We evaluate our model and existing LLMs using FLARE-ES,
the first comprehensive bilingual evaluation benchmark with 21 datasets
covering 9 tasks. The FLARE-ES benchmark results reveal a significant
multilingual performance gap and bias in existing LLMs. FinMA-ES models surpass
SOTA LLMs such as GPT-4 in Spanish financial tasks, due to strategic
instruction tuning and leveraging data from diverse linguistic resources,
highlighting the positive impact of cross-linguistic transfer. All our
datasets, models, and benchmarks have been released.
Related papers
- Getting More from Less: Large Language Models are Good Spontaneous Multilingual Learners [67.85635044939836]
Large Language Models (LLMs) have shown impressive language capabilities.
In this work, we investigate the spontaneous multilingual alignment improvement of LLMs.
We find that LLMs instruction-tuned on the question translation data (i.e. without annotated answers) are able to encourage the alignment between English and a wide range of languages.
arXiv Detail & Related papers (2024-05-22T16:46:19Z)
- No Language is an Island: Unifying Chinese and English in Financial Large Language Models, Instruction Data, and Benchmarks [73.11935193630823]
ICE-PIXIU uniquely integrates a spectrum of Chinese tasks, alongside translated and original English datasets.
It provides unrestricted access to diverse model variants, a compilation of diverse cross-lingual and multi-modal instruction data, and an evaluation benchmark with expert annotations.
arXiv Detail & Related papers (2024-03-10T16:22:20Z)
- Introducing Bode: A Fine-Tuned Large Language Model for Portuguese Prompt-Based Task [1.158680734110387]
This work proposes a fine-tuned LLaMA 2-based model for Portuguese prompts named Bode.
We evaluate the performance of this model in classification tasks using the zero-shot approach with in-context learning.
arXiv Detail & Related papers (2024-01-05T17:15:01Z)
- Zero-Shot Cross-Lingual Reranking with Large Language Models for Low-Resource Languages [51.301942056881146]
We investigate how large language models (LLMs) function as rerankers in cross-lingual information retrieval systems for African languages.
Our implementation covers English and four African languages (Hausa, Somali, Swahili, and Yoruba).
We examine cross-lingual reranking with queries in English and passages in the African languages.
arXiv Detail & Related papers (2023-12-26T18:38:54Z)
- MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks [12.665447518524187]
This study aims to perform a thorough evaluation of the non-English capabilities of SoTA LLMs by comparing them on the same set of multilingual datasets.
Our benchmark comprises 22 datasets covering 83 languages, including low-resource African languages.
We also perform a study on data contamination and find that several models are likely to be contaminated with multilingual evaluation benchmarks.
arXiv Detail & Related papers (2023-11-13T16:45:37Z)
- Extrapolating Large Language Models to Non-English by Aligning Languages [109.09051737966178]
Existing large language models show disparate capability across different languages.
In this paper, we empower pre-trained LLMs on non-English languages by building semantic alignment across languages.
arXiv Detail & Related papers (2023-08-09T13:32:06Z) - Multilingual Machine Translation with Large Language Models: Empirical Results and Analysis [103.89753784762445]
Large language models (LLMs) have demonstrated remarkable potential in handling multilingual machine translation (MMT).
This paper systematically investigates the advantages and challenges of LLMs for MMT.
We thoroughly evaluate eight popular LLMs, including ChatGPT and GPT-4.
arXiv Detail & Related papers (2023-04-10T15:51:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.