Dólares or Dollars? Unraveling the Bilingual Prowess of Financial LLMs Between Spanish and English
- URL: http://arxiv.org/abs/2402.07405v1
- Date: Mon, 12 Feb 2024 04:50:31 GMT
- Title: Dólares or Dollars? Unraveling the Bilingual Prowess of Financial LLMs Between Spanish and English
- Authors: Xiao Zhang, Ruoyu Xiang, Chenhan Yuan, Duanyu Feng, Weiguang Han,
Alejandro Lopez-Lira, Xiao-Yang Liu, Sophia Ananiadou, Min Peng, Jimin Huang,
Qianqian Xie
- Abstract summary: Toisón de Oro is the first framework that establishes instruction datasets, finetuned LLMs, and an evaluation benchmark for financial LLMs in Spanish jointly with English.
We construct a rigorously curated bilingual instruction dataset including over 144K Spanish and English samples from 15 datasets covering 7 tasks.
We evaluate our model and existing LLMs using FLARE-ES, the first comprehensive bilingual evaluation benchmark with 21 datasets covering 9 tasks.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite Spanish's pivotal role in the global finance industry, a pronounced
gap exists in Spanish financial natural language processing (NLP) and
application studies compared to English, especially in the era of large
language models (LLMs). To bridge this gap, we unveil Toisón de Oro, the
first bilingual framework that establishes instruction datasets, finetuned
LLMs, and an evaluation benchmark for financial LLMs in Spanish jointly with
English. We construct a rigorously curated bilingual instruction dataset
including over 144K Spanish and English samples from 15 datasets covering 7
tasks. Harnessing this, we introduce FinMA-ES, an LLM designed for bilingual
financial applications. We evaluate our model and existing LLMs using FLARE-ES,
the first comprehensive bilingual evaluation benchmark with 21 datasets
covering 9 tasks. The FLARE-ES benchmark results reveal a significant
multilingual performance gap and bias in existing LLMs. FinMA-ES models surpass
SOTA LLMs such as GPT-4 on Spanish financial tasks, owing to strategic
instruction tuning and the use of data from diverse linguistic resources,
which highlights the positive impact of cross-linguistic transfer. All our
datasets, models, and benchmarks have been released.
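The abstract outlines a pipeline in which heterogeneous Spanish and English examples are normalized into instruction-tuning records and mixed before fine-tuning. Below is a minimal Python sketch of what that pooling step could look like; the record fields, example texts, and file name are illustrative assumptions, not the actual Toisón de Oro schema.

```python
# A minimal sketch of pooling bilingual instruction samples for fine-tuning.
# Field names and task labels are illustrative assumptions, not the paper's
# actual data schema.
import json
import random

def make_sample(instruction: str, text: str, answer: str,
                lang: str, task: str) -> dict:
    """Wrap one example in a uniform instruction-tuning record."""
    return {
        "instruction": instruction,
        "input": text,
        "output": answer,
        "lang": lang,   # "es" or "en"
        "task": task,   # e.g. sentiment, NER, QA, ...
    }

# Illustrative examples in both languages; real data would come from the
# 15 source datasets described in the abstract.
samples = [
    make_sample(
        "Clasifica el sentimiento de esta noticia financiera.",
        "Las acciones del banco subieron un 5% tras el informe trimestral.",
        "positivo", "es", "sentiment",
    ),
    make_sample(
        "Classify the sentiment of this financial headline.",
        "Bank shares rose 5% after the quarterly report.",
        "positive", "en", "sentiment",
    ),
]

# Shuffle Spanish and English together so both languages appear throughout
# training, one plausible way to encourage the cross-linguistic transfer
# the abstract credits for FinMA-ES's gains.
random.shuffle(samples)
with open("bilingual_instructions.jsonl", "w", encoding="utf-8") as f:
    for s in samples:
        f.write(json.dumps(s, ensure_ascii=False) + "\n")
```

Interleaving the two languages in a single training stream, rather than fine-tuning on each separately, is consistent with the cross-lingual transfer effect the abstract reports, though the paper's exact mixing strategy is not specified here.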
Related papers
- Teuken-7B-Base & Teuken-7B-Instruct: Towards European LLMs [29.595342315049106]
We present two multilingual LLMs designed to embrace Europe's linguistic diversity by supporting all 24 official languages of the European Union.
We detail the models' development principles, i.e., data composition, tokenizer optimization, and training methodologies.
arXiv Detail & Related papers (2024-09-30T16:05:38Z)
- EuroLLM: Multilingual Language Models for Europe [76.89545643715368]
We introduce the EuroLLM project, aimed at developing a suite of open-weight multilingual LLMs.
We outline the progress made to date, detailing our data collection and filtering process.
We report our performance on multilingual general benchmarks and machine translation.
arXiv Detail & Related papers (2024-09-24T16:51:36Z)
- A Survey of Large Language Models for European Languages [4.328283741894074]
Large Language Models (LLMs) have gained significant attention due to their high performance on a wide range of natural language tasks.
We present an overview of LLM families, including LLaMA, PaLM, GPT, and MoE.
We provide a comprehensive summary of common monolingual and multilingual datasets used for pretraining large language models.
arXiv Detail & Related papers (2024-08-27T13:10:05Z)
- No Language is an Island: Unifying Chinese and English in Financial Large Language Models, Instruction Data, and Benchmarks [75.29561463156635]
ICE-PIXIU uniquely integrates a spectrum of Chinese tasks, alongside translated and original English datasets.
It provides unrestricted access to diverse model variants, a compilation of diverse cross-lingual and multi-modal instruction data, and an evaluation benchmark with expert annotations.
arXiv Detail & Related papers (2024-03-10T16:22:20Z)
- Zero-Shot Cross-Lingual Reranking with Large Language Models for Low-Resource Languages [51.301942056881146]
We investigate how large language models (LLMs) function as rerankers in cross-lingual information retrieval systems for African languages.
Our implementation covers English and four African languages (Hausa, Somali, Swahili, and Yoruba).
We examine cross-lingual reranking with queries in English and passages in the African languages.
arXiv Detail & Related papers (2023-12-26T18:38:54Z)
- MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks [12.665447518524187]
This study aims to perform a thorough evaluation of the non-English capabilities of SoTA LLMs by comparing them on the same set of multilingual datasets.
Our benchmark comprises 22 datasets covering 83 languages, including low-resource African languages.
We also perform a study on data contamination and find that several models are likely to be contaminated with multilingual evaluation benchmarks.
arXiv Detail & Related papers (2023-11-13T16:45:37Z)
- Extrapolating Large Language Models to Non-English by Aligning Languages [109.09051737966178]
Existing large language models show disparate capability across different languages.
In this paper, we empower pre-trained LLMs on non-English languages by building semantic alignment across languages.
arXiv Detail & Related papers (2023-08-09T13:32:06Z)
- Multilingual Machine Translation with Large Language Models: Empirical Results and Analysis [103.89753784762445]
Large language models (LLMs) have demonstrated remarkable potential in handling multilingual machine translation (MMT).
This paper systematically investigates the advantages and challenges of LLMs for MMT.
We thoroughly evaluate eight popular LLMs, including ChatGPT and GPT-4.
arXiv Detail & Related papers (2023-04-10T15:51:30Z)