Related papers: InfiR : Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning

URL: http://arxiv.org/abs/2502.11573v1
Date: Mon, 17 Feb 2025 09:07:32 GMT
Title: InfiR : Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning
Authors: Congkai Xie, Shuo Cai, Wenjun Wang, Pengxiang Li, Zhijie Sang, Kejing Yang, Yiming Zhang, Zhen Li, Guanghao Zhu, Zeyu Liu, Yang Yu, Yuhang Liu, Su Lu, Baoyi He, Qi Zhou, Xiaotian Han, Jianbo Yuan, Shengyu Zhang, Fei Wu, Hongxia Yang
Abstract summary: This paper focuses on developing efficient Small Language Models (SLMs) and Multimodal Small Language Models (MSLMs). We introduce a novel training pipeline that enhances reasoning capabilities and facilitates deployment on edge devices. InfiR aims to advance AI systems by improving reasoning, reducing adoption barriers, and addressing privacy concerns through smaller model sizes.
Score: 46.64087822795915
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) have made significant advancements in reasoning capabilities. However, they still face challenges such as high computational demands and privacy concerns. This paper focuses on developing efficient Small Language Models (SLMs) and Multimodal Small Language Models (MSLMs) that retain competitive reasoning abilities. We introduce a novel training pipeline that enhances reasoning capabilities and facilitates deployment on edge devices, achieving state-of-the-art performance while minimizing development costs. InfiR aims to advance AI systems by improving reasoning, reducing adoption barriers, and addressing privacy concerns through smaller model sizes. Resources are available at https://github.com/Reallm-Labs/InfiR.

Related papers

MELABenchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource Maltese NLP [0.0]
We evaluate the performance of 55 publicly available Large Language Models (LLMs) on Maltese, a low-resource language. Our experiments highlight that many models perform poorly, particularly on generative tasks. We conclude that prior exposure to Maltese during pre-training and instruction-tuning emerges as the most important factor.
arXiv Detail & Related papers (2025-06-04T18:59:52Z)
Enhancing Code Generation for Low-Resource Languages: No Silver Bullet [55.39571645315926]
Large Language Models (LLMs) rely on large and diverse datasets to learn syntax, semantics, and usage patterns of programming languages. For low-resource languages, the limited availability of such data hampers the models' ability to generalize effectively. We present an empirical study investigating the effectiveness of several approaches for boosting LLMs' performance on low-resource languages.
arXiv Detail & Related papers (2025-01-31T12:23:28Z)
LLMic: Romanian Foundation Language Model [76.09455151754062]
We present LLMic, a foundation language model designed specifically for the Romanian Language. We show that fine-tuning LLMic for language translation after the initial pretraining phase outperforms existing solutions in English-to-Romanian translation tasks.
arXiv Detail & Related papers (2025-01-13T22:14:45Z)
Computational Bottlenecks of Training Small-scale Large Language Models [19.663560481459164]
Small-scale large Language Models (SLMs) are gaining attention due to cost and efficiency demands from consumers. In this study, we explore the computational bottlenecks of training SLMs. We assess these factors on popular cloud services using metrics such as loss per dollar and tokens per second.
arXiv Detail & Related papers (2024-10-25T10:30:21Z)
InkubaLM: A small language model for low-resource African languages [9.426968756845389]
InkubaLM is a small language model with 0.4 billion parameters. It achieves performance comparable to models with significantly larger parameter counts. It demonstrates remarkable consistency across multiple languages.
arXiv Detail & Related papers (2024-08-30T05:42:31Z)
Unlocking the Potential of Model Merging for Low-Resource Languages [66.7716891808697]
Adapting large language models to new languages typically involves continual pre-training (CT) followed by supervised fine-tuning (SFT). We propose model merging as an alternative for low-resource languages, combining models with distinct capabilities into a single model without additional training. Experiments based on Llama-2-7B demonstrate that model merging effectively endows LLMs for low-resource languages with task-solving abilities, outperforming CT-then-SFT in scenarios with extremely scarce data.
arXiv Detail & Related papers (2024-07-04T15:14:17Z)
Easy Problems That LLMs Get Wrong [0.0]
We introduce a comprehensive Linguistic Benchmark designed to evaluate the limitations of Large Language Models (LLMs). Through a series of straightforward questions, it uncovers the significant limitations of well-regarded models to perform tasks that humans manage with ease.
arXiv Detail & Related papers (2024-05-30T02:09:51Z)
LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models [50.259006481656094]
We present a novel interactive application aimed towards understanding the internal mechanisms of large vision-language models. Our interface is designed to enhance the interpretability of the image patches, which are instrumental in generating an answer. We present a case study of how our application can aid in understanding failure mechanisms in a popular large multi-modal model: LLaVA.
arXiv Detail & Related papers (2024-04-03T23:57:34Z)
A Survey of Large Language Models [81.06947636926638]
Language modeling has been widely studied for language understanding and generation in the past two decades. Recently, pre-trained language models (PLMs) have been proposed by pre-training Transformer models over large-scale corpora. To discriminate the difference in parameter scale, the research community has coined the term large language models (LLM) for the PLMs of significant size.
arXiv Detail & Related papers (2023-03-31T17:28:46Z)

This list is automatically generated from the titles and abstracts of the papers on this site.