Open-Source Multimodal Moxin Models with Moxin-VLM and Moxin-VLA
- URL: http://arxiv.org/abs/2512.22208v1
- Date: Mon, 22 Dec 2025 02:36:42 GMT
- Title: Open-Source Multimodal Moxin Models with Moxin-VLM and Moxin-VLA
- Authors: Pu Zhao, Xuan Shen, Zhenglun Kong, Yixin Shen, Sung-En Chang, Arash Akbari, Timothy Rupprecht, Lei Lu, Enfu Nan, Changdi Yang, Yumei He, Weiyan Shi, Xingchen Xu, Yu Huang, Wei Jiang, Wei Wang, Yue Chen, Yong He, Yanzhi Wang
- Abstract summary: Moxin 7B is introduced as a fully open-source Large Language Model (LLM). We develop three variants based on Moxin: Moxin-VLM, Moxin-VLA, and Moxin-Chinese. Experiments show that our models achieve superior performance in various evaluations.
- Score: 53.68989489261506
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Recently, Large Language Models (LLMs) have undergone a significant transformation, marked by a rapid rise in both their popularity and capabilities. Leading this evolution are proprietary LLMs like GPT-4 and GPT-o1, which have captured widespread attention in the AI community due to their remarkable performance and versatility. Simultaneously, open-source LLMs, such as LLaMA and Mistral, have contributed greatly to the ever-increasing popularity of LLMs thanks to the ease of customizing and deploying them across diverse applications. Moxin 7B is introduced as a fully open-source LLM developed in accordance with the Model Openness Framework, which moves beyond the simple sharing of model weights to embrace complete transparency in training, datasets, and implementation details, thus fostering a more inclusive and collaborative research environment that can sustain a healthy open-source ecosystem. To further equip Moxin with capabilities for different tasks, we develop three variants based on Moxin: Moxin-VLM, Moxin-VLA, and Moxin-Chinese, which target vision-language, vision-language-action, and Chinese capabilities, respectively. Experiments show that our models achieve superior performance in various evaluations. We adopt open-source frameworks and open data for training. We release our models, along with the data and code used to derive them.
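Since the checkpoints are released openly, they should be loadable with standard tooling. Below is a minimal sketch using Hugging Face transformers; the repository id is a hypothetical placeholder, not a confirmed identifier, and the exact model classes depend on how the release is packaged.

```python
# Minimal sketch of loading an openly released Moxin checkpoint with
# Hugging Face transformers. The repo id below is a HYPOTHETICAL placeholder;
# consult the authors' release page for the actual identifiers.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "moxin-org/moxin-llm-7b"  # assumption, not a confirmed repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")

prompt = "Explain the Model Openness Framework in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```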
Related papers
- 7B Fully Open Source Moxin-LLM/VLM -- From Pretraining to GRPO-based Reinforcement Learning Enhancement [41.463611054440435]
Moxin 7B is a fully open-source Large Language Model (LLM) developed in adherence to the principles of open science, open source, open data, and open access. We release the pre-training code and configurations, the training and fine-tuning datasets, and intermediate and final checkpoints. Experiments show that our models achieve superior performance in various evaluations, such as zero-shot, few-shot, and chain-of-thought (CoT) evaluation.
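For context on the GRPO-based enhancement named in the title: GRPO scores a group of sampled responses per prompt and standardizes each reward against the group, avoiding a learned value critic. The sketch below shows that group-relative advantage step as published for GRPO generally; it is an illustration, not the authors' code.

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """GRPO-style advantages for one prompt.

    rewards: shape (G,), one scalar reward per sampled response.
    Each advantage is the reward standardized against the group, so the
    policy gradient favors above-average samples within the group.
    """
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Example: 4 responses to the same prompt, scored by a reward model.
print(group_relative_advantages(torch.tensor([0.2, 0.9, 0.4, 0.5])))
```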
arXiv Detail & Related papers (2024-12-08T02:01:46Z)
- NVLM: Open Frontier-Class Multimodal LLMs [64.00053046838225]
We introduce NVLM 1.0, a family of frontier-class multimodal large language models (LLMs) that achieve state-of-the-art results on vision-language tasks.
We propose a novel architecture that enhances both training efficiency and multimodal reasoning capabilities.
We develop production-grade multimodality for the NVLM-1.0 models, enabling them to excel in vision-language tasks.
arXiv Detail & Related papers (2024-09-17T17:59:06Z)
- MLLM-LLaVA-FL: Multimodal Large Language Model Assisted Federated Learning [25.45278447786954]
We introduce a novel federated learning framework, named Multimodal Large Language Model Assisted Federated Learning (MLLM-LLaVA-FL). Our framework is adept at harnessing the extensive, yet previously underexploited, open-source data accessible from websites, along with powerful server-side computational resources.
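For background on the federated side, the canonical aggregation step (FedAvg) averages client model updates weighted by local data size. The sketch below shows that generic step in isolation; it is not the MLLM-LLaVA-FL implementation.

```python
import torch

def fedavg(client_states: list[dict], client_sizes: list[int]) -> dict:
    """Weighted average of client state_dicts (classic FedAvg step).

    client_states: one model state_dict per client.
    client_sizes:  number of local training examples per client,
                   used as the averaging weights.
    """
    total = sum(client_sizes)
    return {
        key: sum(
            state[key] * (n / total)
            for state, n in zip(client_states, client_sizes)
        )
        for key in client_states[0]
    }
```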
arXiv Detail & Related papers (2024-09-09T21:04:16Z)
- xGen-MM (BLIP-3): A Family of Open Large Multimodal Models [145.5926980973847]
BLIP-3 is an open framework for developing Large Multimodal Models. We release 4B and 14B models, including both the pre-trained base models and the instruction fine-tuned ones. Our models demonstrate competitive performance among open-source LMMs of similar size.
arXiv Detail & Related papers (2024-08-16T17:57:01Z)
- MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series [86.31735321970481]
We open-source MAP-Neo, a bilingual language model with 7B parameters trained from scratch on 4.5T high-quality tokens.
Our MAP-Neo is the first fully open-sourced bilingual LLM with performance comparable to existing state-of-the-art LLMs.
arXiv Detail & Related papers (2024-05-29T17:57:16Z)
- SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models [97.40590590880144]
We develop an extensive Multimodal Large Language Model (MLLM) series. We assemble a comprehensive dataset covering publicly available resources in language, vision, and vision-language tasks. We obtain a spectrum of MLLMs that vary in parameter size and multilingual capabilities.
arXiv Detail & Related papers (2024-02-08T18:59:48Z)
- Knowledge Fusion of Large Language Models [73.28202188100646]
This paper introduces the notion of knowledge fusion for large language models (LLMs).
We externalize their collective knowledge and unique strengths, thereby elevating the capabilities of the target model beyond those of any individual source LLM.
Our findings confirm that the fusion of LLMs can improve the performance of the target model across a range of capabilities such as reasoning, commonsense, and code generation.
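One simple way to picture fusing LLM knowledge is to distill the token-level predictive distributions of several source models into one target model. The sketch below combines source distributions into a teacher and minimizes KL divergence against it; this is an illustrative reading, not the paper's exact objective, which additionally handles vocabulary alignment across models.

```python
import torch
import torch.nn.functional as F

def fusion_loss(target_logits, source_logits_list, weights=None):
    """KL divergence from a fused teacher distribution to the target model.

    target_logits:      (batch, seq, vocab) from the model being trained.
    source_logits_list: list of (batch, seq, vocab) from source LLMs,
                        assumed already mapped onto a common vocabulary.
    """
    if weights is None:
        weights = [1.0 / len(source_logits_list)] * len(source_logits_list)
    # Fuse source predictions into a single teacher distribution.
    teacher_probs = sum(
        w * F.softmax(logits, dim=-1)
        for w, logits in zip(weights, source_logits_list)
    )
    log_student = F.log_softmax(target_logits, dim=-1)
    # kl_div(input=log-probs, target=probs) computes KL(teacher || student).
    return F.kl_div(log_student, teacher_probs, reduction="batchmean")
```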
arXiv Detail & Related papers (2024-01-19T05:02:46Z)