Why We Build Local Large Language Models: An Observational Analysis from 35 Japanese and Multilingual LLMs
- URL: http://arxiv.org/abs/2412.14471v1
- Date: Thu, 19 Dec 2024 02:39:26 GMT
- Title: Why We Build Local Large Language Models: An Observational Analysis from 35 Japanese and Multilingual LLMs
- Authors: Koshiro Saito, Sakae Mizuki, Masanari Ohi, Taishi Nakamura, Taihei Shiotani, Koki Maeda, Youmi Ma, Kakeru Hattori, Kazuki Fujii, Takumi Okamoto, Shigeki Ishida, Hiroya Takamura, Rio Yokota, Naoaki Okazaki
- Abstract summary: We evaluated 35 Japanese, English, and multilingual LLMs on 19 evaluation benchmarks for Japanese and English.
We found that training on English text can improve the scores of academic subjects in Japanese.
It is unnecessary to specifically train on Japanese text to enhance abilities for solving Japanese code generation, arithmetic reasoning, commonsense, and reading comprehension tasks.
- Score: 22.622778594671345
- License:
- Abstract: Why do we build local large language models (LLMs)? What should a local LLM learn from the target language? Which abilities can be transferred from other languages? Do language-specific scaling laws exist? To explore these research questions, we evaluated 35 Japanese, English, and multilingual LLMs on 19 evaluation benchmarks for Japanese and English, taking Japanese as a local language. Adopting an observational approach, we analyzed correlations of benchmark scores, and conducted principal component analysis (PCA) on the scores to derive *ability factors* of local LLMs. We found that training on English text can improve the scores of academic subjects in Japanese (JMMLU). In addition, it is unnecessary to specifically train on Japanese text to enhance abilities for solving Japanese code generation, arithmetic reasoning, commonsense, and reading comprehension tasks. In contrast, training on Japanese text could improve question-answering tasks about Japanese knowledge and English-Japanese translation, which indicates that abilities for solving these two tasks can be regarded as *Japanese abilities* for LLMs. Furthermore, we confirmed that the Japanese abilities scale with the computational budget for Japanese text.
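The observational pipeline described in the abstract (benchmark-score correlations followed by PCA to obtain ability factors) can be outlined in a few lines. The sketch below is not the authors' code; it assumes only a placeholder score matrix of 35 models by 19 benchmarks, with random values standing in for the real evaluation results. Benchmarks that load heavily on the same principal component would be interpreted as drawing on the same underlying ability.

```python
# Minimal sketch of the observational analysis, not the authors' released code:
# build a (models x benchmarks) score matrix, inspect benchmark-to-benchmark
# correlations, and run PCA to extract candidate "ability factors".
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_models, n_benchmarks = 35, 19                      # 35 LLMs, 19 Japanese/English benchmarks
scores = rng.uniform(size=(n_models, n_benchmarks))  # placeholder scores in [0, 1]

# Pairwise correlations: which benchmarks rise and fall together across models
corr = np.corrcoef(scores, rowvar=False)             # shape (19, 19)

# PCA on standardized scores: leading components act as data-driven ability factors
z = StandardScaler().fit_transform(scores)
pca = PCA(n_components=5)
factors = pca.fit_transform(z)                       # per-model factor scores, shape (35, 5)
loadings = pca.components_                           # benchmark loadings per factor, shape (5, 19)
print(pca.explained_variance_ratio_)
```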
Related papers
- Beyond Profile: From Surface-Level Facts to Deep Persona Simulation in LLMs [50.0874045899661]
We introduce CharacterBot, a model designed to replicate both the linguistic patterns and distinctive thought processes of a character.
Using Lu Xun as a case study, we propose four training tasks derived from his 17 essay collections.
These include a pre-training task focused on mastering external linguistic structures and knowledge, as well as three fine-tuning tasks.
We evaluate CharacterBot on three tasks for linguistic accuracy and opinion comprehension, demonstrating that it significantly outperforms the baselines on our adapted metrics.
arXiv Detail & Related papers (2025-02-18T16:11:54Z)
- JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark for Culture-aware Evaluation [63.83457341009046]
JMMMU (Japanese MMMU) is the first large-scale Japanese benchmark designed to evaluate LMMs on expert-level tasks based on the Japanese cultural context.
Using the CA subset, we observe a performance drop in many LMMs when evaluated in Japanese, which is purely attributable to language variation.
By combining both subsets, we identify that some LMMs perform well on the CA subset but not on the CS subset, exposing a shallow understanding of the Japanese language that lacks depth in cultural understanding.
arXiv Detail & Related papers (2024-10-22T17:59:56Z)
- Evaluating Large Language Models with Tests of Spanish as a Foreign Language: Pass or Fail? [2.9630910534509924]
We evaluate the performance of state-of-the-art LLMs on a recently released benchmark whose questions resemble those of Spanish exams for foreign students.
Results show that LLMs perform well at understanding Spanish but are still far from achieving the level of a native speaker in terms of grammatical competence.
arXiv Detail & Related papers (2024-09-08T11:30:03Z)
- Beyond English-Centric LLMs: What Language Do Multilingual Language Models Think in? [40.53443067505763]
We investigate whether non-English-centric LLMs, despite their strong performance, 'think' in their respective dominant language.
We refer to such languages as internal *latent languages*.
arXiv Detail & Related papers (2024-08-20T13:05:41Z)
- Continual Pre-Training for Cross-Lingual LLM Adaptation: Enhancing Japanese Language Capabilities [20.40712512748528]
Cross-lingual continual pre-training of large language models (LLMs) initially trained on English corpus allows us to leverage the vast amount of English language resources and reduce the pre-training cost.
We constructed Swallow, an LLM with enhanced Japanese capability, by extending the vocabulary of Llama 2 to include Japanese characters and conducting continual pre-training on a large Japanese web corpus.
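As a rough illustration of the vocabulary-extension step, the sketch below uses the generic Hugging Face transformers mechanism for adding tokens and resizing embeddings. It is not the Swallow recipe, which builds a dedicated Japanese tokenizer and then continues pre-training on a large Japanese web corpus; the checkpoint name and the handful of added tokens are assumptions for illustration only.

```python
# Generic vocabulary-extension sketch (NOT the Swallow training recipe):
# add a few Japanese tokens to a Llama 2 tokenizer and resize the embeddings
# so continual pre-training on Japanese text could update the new rows.
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "meta-llama/Llama-2-7b-hf"            # assumed base checkpoint (gated on the Hub)
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

new_tokens = ["日本語", "東京", "学習"]        # illustrative Japanese tokens only
num_added = tokenizer.add_tokens(new_tokens)

# Grow the input/output embedding matrices so the new token ids have trainable rows
if num_added > 0:
    model.resize_token_embeddings(len(tokenizer))
```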
arXiv Detail & Related papers (2024-04-27T06:07:55Z)
- MLaKE: Multilingual Knowledge Editing Benchmark for Large Language Models [65.10456412127405]
MLaKE is a benchmark for the adaptability of knowledge editing methods across five languages.
MLaKE aggregates fact chains from Wikipedia across languages and generates questions in both free-form and multiple-choice formats.
We evaluate the multilingual knowledge editing generalization capabilities of existing methods on MLaKE.
arXiv Detail & Related papers (2024-04-07T15:23:28Z)
- Is Translation All You Need? A Study on Solving Multilingual Tasks with Large Language Models [79.46179534911019]
Large language models (LLMs) have demonstrated multilingual capabilities, yet they are mostly English-centric due to imbalanced training corpora.
This work extends the evaluation from NLP tasks to real user queries.
For culture-related tasks that need deep language understanding, prompting in the native language tends to be more promising.
arXiv Detail & Related papers (2024-03-15T12:47:39Z)
- Cross-Lingual Knowledge Editing in Large Language Models [73.12622532088564]
Knowledge editing has been shown to adapt large language models to new knowledge without retraining from scratch.
The effect of editing in a source language on a different target language is still unknown.
We first collect a large-scale cross-lingual synthetic dataset by translating ZsRE from English to Chinese.
arXiv Detail & Related papers (2023-09-16T11:07:52Z)
- Efficiently Aligned Cross-Lingual Transfer Learning for Conversational Tasks using Prompt-Tuning [98.60739735409243]
Cross-lingual transfer of language models trained on high-resource languages like English has been widely studied for many NLP tasks.
We introduce XSGD, a parallel and large-scale multilingual conversation dataset, for cross-lingual alignment pretraining.
To facilitate aligned cross-lingual representations, we develop an efficient prompt-tuning-based method for learning alignment prompts.
arXiv Detail & Related papers (2023-04-03T18:46:01Z)
- Linguistically-driven Multi-task Pre-training for Low-resource Neural Machine Translation [31.225252462128626]
We propose Japanese-specific sequence to sequence (JASS) for language pairs involving Japanese as the source or target language, and English-specific sequence to sequence (ENSS) for language pairs involving English.
JASS focuses on masking and reordering Japanese linguistic units known as bunsetsu, whereas ENSS is proposed based on phrase structure masking and reordering tasks.
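A toy sketch of what such masking and reordering objectives produce as training pairs is given below. The bunsetsu segmentation is hardcoded for illustration; the actual JASS pipeline segments text with a bunsetsu parser and trains a sequence-to-sequence model on the resulting corrupted/original pairs.

```python
# Toy illustration of JASS-style objectives: mask or reorder bunsetsu units and
# pair the corrupted input with the original sentence as the target.
# Segmentation is hardcoded here; real JASS uses a bunsetsu parser.
import random

bunsetsu = ["昨日", "新しい", "本を", "買った"]   # "yesterday / new / book-OBJ / bought"
target = "".join(bunsetsu)
rng = random.Random(0)

# Masking objective: hide one bunsetsu, reconstruct the original sentence
masked = list(bunsetsu)
masked[rng.randrange(len(masked))] = "<mask>"
mask_pair = ("".join(masked), target)

# Reordering objective: shuffle bunsetsu order, restore the original order
shuffled = list(bunsetsu)
rng.shuffle(shuffled)
reorder_pair = ("".join(shuffled), target)

print(mask_pair)
print(reorder_pair)
```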
arXiv Detail & Related papers (2022-01-20T09:10:08Z)