Xmodel-LM Technical Report
- URL: http://arxiv.org/abs/2406.02856v5
- Date: Tue, 19 Nov 2024 08:38:55 GMT
- Title: Xmodel-LM Technical Report
- Authors: Yichuan Wang, Yang Liu, Yu Yan, Qun Wang, Xucheng Huang, Ling Jiang
- Abstract summary: Xmodel-LM is a compact and efficient 1.1B language model pre-trained on around 2 trillion tokens.
It exhibits remarkable performance despite its smaller size.
- Score: 13.451816134545163
- License:
- Abstract: We introduce Xmodel-LM, a compact and efficient 1.1B language model pre-trained on around 2 trillion tokens. Trained on our self-built dataset (Xdata), which balances Chinese and English corpora based on downstream task optimization, Xmodel-LM exhibits remarkable performance despite its smaller size. It notably surpasses existing open-source language models of similar scale. Our model checkpoints and code are publicly accessible on GitHub at https://github.com/XiaoduoAILab/XmodelLM.
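A minimal sketch of loading the released checkpoints for generation with the Hugging Face transformers API is shown below, assuming the weights are also mirrored on the Hugging Face hub; the hub id and the trust_remote_code flag are assumptions, and the GitHub repository above is the authoritative source for loading code.

```python
# Hedged sketch: load Xmodel-LM for greedy text generation via Hugging Face transformers.
# "XiaoduoAILab/XmodelLM" is an assumed hub id mirroring the GitHub release; adjust to the
# path documented at https://github.com/XiaoduoAILab/XmodelLM.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "XiaoduoAILab/XmodelLM"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```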
Related papers
- Xmodel-1.5: An 1B-scale Multilingual LLM [4.298869484709548]
We introduce Xmodel-1.5, a multilingual large language model pretrained on 2 trillion tokens.
Xmodel-1.5 employs a custom unigram tokenizer with 65,280 tokens, optimizing both efficiency and accuracy (a generic training sketch follows this entry).
The model delivers competitive results across multiple languages, including Thai, Arabic, French, Chinese, and English.
arXiv Detail & Related papers (2024-11-15T10:01:52Z)
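The summary above gives only the tokenizer type and vocabulary size; as a hedged illustration, a 65,280-token unigram tokenizer can be trained with the SentencePiece library roughly as follows. The corpus path, character coverage, and all other settings are placeholders, not the Xmodel-1.5 configuration.

```python
# Sketch: train a unigram tokenizer with a 65,280-token vocabulary using SentencePiece.
# "corpus.txt" and character_coverage are illustrative placeholders, not the paper's setup.
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="corpus.txt",          # multilingual training text, one sentence per line
    model_prefix="unigram_65k",  # writes unigram_65k.model and unigram_65k.vocab
    vocab_size=65280,
    model_type="unigram",
    character_coverage=0.9995,   # retain rare CJK, Thai, and Arabic characters
)

sp = spm.SentencePieceProcessor(model_file="unigram_65k.model")
print(sp.encode("Bonjour, 你好, สวัสดี!", out_type=str))
```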
- ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets [106.7760874400261]
This paper presents ML-SUPERB 2.0, a new benchmark for evaluating pre-trained SSL and supervised speech models.
We find performance improvements over the setup of ML-SUPERB, but performance depends on the downstream model design.
Also, we find large performance differences between languages and datasets, suggesting the need for more targeted approaches.
arXiv Detail & Related papers (2024-06-12T21:01:26Z)
- Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model [7.082567506213992]
We introduce Xmodel-VLM, a cutting-edge multimodal vision language model.
It is designed for efficient deployment on consumer GPU servers.
arXiv Detail & Related papers (2024-05-15T09:47:59Z)
- Yi: Open Foundation Models by 01.AI [42.94680878285869]
The Yi model family is based on 6B and 34B pretrained language models, which we extend to chat models, 200K long-context models, depth-upscaled models, and vision-language models.
Our base models achieve strong performance on a wide range of benchmarks like MMLU, and our finetuned chat models deliver a strong human preference rate on major evaluation platforms like AlpacaEval and Arena.
arXiv Detail & Related papers (2024-03-07T16:52:49Z)
- FinGPT: Large Generative Models for a Small Language [48.46240937758779]
We create large language models (LLMs) for Finnish, a language spoken by less than 0.1% of the world population.
We train seven monolingual models from scratch (186M to 13B parameters) dubbed FinGPT.
We continue the pretraining of the multilingual BLOOM model on a mix of its original training data and Finnish, resulting in a 176 billion parameter model we call BLUUMI.
arXiv Detail & Related papers (2023-11-03T08:05:04Z)
- NLLB-CLIP -- train performant multilingual image retrieval model on a budget [65.268245109828]
We present NLLB-CLIP, a CLIP model with a text encoder from the NLLB model.
We used an automatically created dataset of 106,246 good-quality images with captions in 201 languages.
We show that NLLB-CLIP is comparable in quality to state-of-the-art models and significantly outperforms them on low-resource languages.
arXiv Detail & Related papers (2023-09-04T23:26:11Z)
- Speculative Decoding with Big Little Decoder [108.95187338417541]
Big Little Decoder (BiLD) is a framework that can improve inference efficiency and latency for a wide range of text generation applications.
On an NVIDIA T4 GPU, our framework achieves a speedup of up to 2.12x with minimal generation quality degradation.
Our framework is fully plug-and-play and can be applied without any modifications to the training process or model architecture (a simplified draft-and-verify sketch follows this entry).
arXiv Detail & Related papers (2023-02-15T18:55:29Z)
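The summary does not spell out BiLD's fallback-and-rollback policies; the sketch below illustrates only the generic draft-then-verify idea behind speculative decoding, with a small model proposing tokens and a large model checking them. The checkpoint pair (distilgpt2 drafting for gpt2), the draft length, and the greedy acceptance rule are illustrative assumptions, not the paper's method.

```python
# Hedged sketch of greedy speculative decoding: a small model drafts a few tokens,
# a larger model verifies them in one forward pass and corrects the first mismatch.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")                   # shared GPT-2 vocabulary
small = AutoModelForCausalLM.from_pretrained("distilgpt2").eval()   # drafter (assumed pairing)
big = AutoModelForCausalLM.from_pretrained("gpt2").eval()           # verifier

@torch.no_grad()
def speculative_greedy(prompt: str, max_new_tokens: int = 40, draft_len: int = 4) -> str:
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    target_len = ids.shape[1] + max_new_tokens
    while ids.shape[1] < target_len:
        # 1) Small model drafts up to `draft_len` tokens greedily.
        draft = small.generate(ids, max_new_tokens=draft_len, do_sample=False,
                               pad_token_id=tokenizer.eos_token_id)
        # 2) Big model scores the whole drafted sequence in a single forward pass.
        big_preds = big(draft).logits.argmax(dim=-1)  # big model's greedy choice per position
        # 3) Accept drafted tokens while they match the big model's own greedy choice.
        n_accepted = 0
        for i in range(ids.shape[1], draft.shape[1]):
            if draft[0, i] == big_preds[0, i - 1]:
                n_accepted += 1
            else:
                break
        # 4) Keep the accepted prefix and append one token chosen by the big model itself.
        ids = draft[:, : ids.shape[1] + n_accepted]
        ids = torch.cat([ids, big_preds[:, ids.shape[1] - 1 : ids.shape[1]]], dim=-1)
    return tokenizer.decode(ids[0], skip_special_tokens=True)

print(speculative_greedy("Speculative decoding lets a small model"))
```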
- Beyond English-Centric Bitexts for Better Multilingual Language Representation Learning [99.42850643947439]
We show that going beyond English-centric bitexts, coupled with a novel sampling strategy, substantially boosts performance across model sizes.
Our XY-LENT XL variant outperforms XLM-R XXL and exhibits competitive performance with mT5 XXL while being 5x and 6x smaller, respectively.
arXiv Detail & Related papers (2022-10-26T17:16:52Z)
- Larger-Scale Transformers for Multilingual Masked Language Modeling [16.592883204398518]
Two new models dubbed XLM-R XL and XLM-R XXL outperform XLM-R by 1.8% and 2.4% average accuracy on XNLI.
Our model also outperforms the RoBERTa-Large model on several English tasks of the GLUE benchmark by 0.3% on average while handling 99 more languages.
arXiv Detail & Related papers (2021-05-02T23:15:02Z)
- Transfer training from smaller language model [6.982133308738434]
We present a method to save training time and resource cost by growing a small, well-trained model into a larger one.
We test the target model on several datasets and find it remains comparable with the source model (a generic weight-transfer sketch follows this entry).
arXiv Detail & Related papers (2021-04-23T02:56:02Z)
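The summary does not describe how the small model is turned into the large one; as one hedged illustration of the general idea, a deeper GPT-2-style model can be initialized from a shallower trained one by copying shared tensors and stacking the trained blocks. The architecture, the sizes, and the layer-duplication rule are assumptions for illustration, not the paper's procedure.

```python
# Hedged sketch: initialize a 12-layer GPT-2-style model from a trained 6-layer one of the
# same width by copying shared weights and reusing each trained block twice for the new depth.
import torch
from transformers import GPT2Config, GPT2LMHeadModel

small = GPT2LMHeadModel(GPT2Config(n_layer=6, n_embd=768, n_head=12))   # stands in for the trained source model
large = GPT2LMHeadModel(GPT2Config(n_layer=12, n_embd=768, n_head=12))  # target model to initialize

src, dst = small.state_dict(), large.state_dict()
with torch.no_grad():
    for name in dst:
        if name in src:                           # embeddings, final layer norm, layers 0-5
            dst[name].copy_(src[name])
        elif name.startswith("transformer.h."):   # new layers 6-11: reuse trained block i % 6
            layer = int(name.split(".")[2])
            dst[name].copy_(src[name.replace(f"transformer.h.{layer}.",
                                             f"transformer.h.{layer % 6}.")])
large.load_state_dict(dst)  # the large model now starts from the small model's weights
```

In practice the source would be a trained checkpoint rather than a freshly initialized model; the copy logic is unchanged.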
- Beyond English-Centric Multilingual Machine Translation [74.21727842163068]
We create a true Many-to-Many multilingual translation model that can translate directly between any pair of 100 languages.
We build and open source a training dataset that covers thousands of language directions with supervised data, created through large-scale mining.
Our focus on non-English-centric models brings gains of more than 10 BLEU when directly translating between non-English directions, while performing competitively with the best single systems of WMT (see the usage sketch after this entry).
arXiv Detail & Related papers (2020-10-21T17:01:23Z)
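The many-to-many model from this work was released publicly as M2M-100; a minimal usage sketch with its Hugging Face transformers port is below, translating directly from Chinese to French without pivoting through English. The 418M checkpoint id refers to the smallest public hub release; treat the exact model id as an assumption if you use the original fairseq distribution instead.

```python
# Sketch: direct Chinese -> French translation with the public M2M-100 checkpoint
# (facebook/m2m100_418M) via Hugging Face transformers -- no English pivot.
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

model_id = "facebook/m2m100_418M"
tokenizer = M2M100Tokenizer.from_pretrained(model_id)
model = M2M100ForConditionalGeneration.from_pretrained(model_id)

tokenizer.src_lang = "zh"                               # source language: Chinese
encoded = tokenizer("生活就像一盒巧克力。", return_tensors="pt")
generated = model.generate(
    **encoded,
    forced_bos_token_id=tokenizer.get_lang_id("fr"),    # force French as the target language
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```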
This list is automatically generated from the titles and abstracts of the papers on this site.