Fugu-MT Paper Translation (Summary): AMALIA-VL: A Native European Portuguese Open-Source Vision and Language Model

Paper summary: AMALIA-VL: A Native European Portuguese Open-Source Vision and Language Model

arxiv url: http://arxiv.org/abs/2606.19100v1
Date: Wed, 17 Jun 2026 14:11:41 GMT
Status: Translation complete
Last updated in system: 2026-06-18 17:16:51.194309
Title: AMALIA-VL: A Native European Portuguese Open-Source Vision and Language Model
Authors: Diogo Glória-Silva, João Cardeira, Manuel Letras da Luz, Afonso Simplício, Gonçalo Vinagre, Diogo Tavares, Rafael Ferreira, Inês Calvo, Inês Vieira, David Semedo, João Magalhães
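Abstract summary: AMALIA-VL is the first open-source instruction-tuned LVLM developed for pt-PT. The authors will release the model weights, training data, and construction pipelines, along with machine-translated pt-PT evaluation benchmarks, to democratize pt-PT LVLM development.
Reference score (Fugu-MT's own attention score): 6.462620395914082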
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Vision and Language Models (LVLMs) have advanced rapidly, yet European Portuguese (pt-PT) remains systematically underserved by existing open-source multimodal models, which either conflate it with Brazilian Portuguese or severely under-represent it in their training data mixes. We introduce AMALIA-VL, the first open-source instruction-tuned LVLM built natively for pt-PT, pairing a high-resolution vision encoder with dynamic image tiling and a fully open pt-PT-optimized language model via a learned connector. We contribute a purposefully designed three-stage training process (vision-language alignment, general visual instruction tuning, and preference optimization), together with a pt-PT-centric multimodal data mix combining curated and translated public datasets with novel datasets that address the near-total absence of European Portuguese multimodal resources. Our evaluation shows that AMALIA-VL establishes a strong baseline for open-source pt-PT LVLMs. We will release model weights, training data, and construction pipelines along with machine-translated pt-PT evaluation benchmarks to help democratize pt-PT LVLM development.
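The abstract names "dynamic image tiling" but does not specify the scheme. As a rough illustration only, here is a minimal sketch of one common approach, similar in spirit to the one used in InternVL 1.5: pick a grid of fixed-size tiles whose aspect ratio best matches the input image, resize the image to that grid, and crop it into tiles for the vision encoder. The tile size (448), the tile budget (12), the tie-breaking rule, and all function names are hypothetical assumptions, not details from the paper; many such schemes also append a downscaled global thumbnail, which is omitted here.

```python
from itertools import product


def candidate_grids(max_tiles: int) -> list[tuple[int, int]]:
    """All (cols, rows) grids with 1 <= cols * rows <= max_tiles."""
    return [
        (c, r)
        for c, r in product(range(1, max_tiles + 1), repeat=2)
        if c * r <= max_tiles
    ]


def choose_grid(width: int, height: int, max_tiles: int = 12) -> tuple[int, int]:
    """Pick the grid whose aspect ratio is closest to the image's.

    Assumed tie-break: prefer more tiles (higher effective resolution).
    """
    aspect = width / height
    return min(
        candidate_grids(max_tiles),
        key=lambda g: (abs(g[0] / g[1] - aspect), -(g[0] * g[1])),
    )


def tile_boxes(
    width: int, height: int, tile_size: int = 448, max_tiles: int = 12
) -> tuple[tuple[int, int], list[tuple[int, int, int, int]]]:
    """Return the chosen grid and (left, top, right, bottom) crop boxes.

    Boxes are in the coordinate frame of the image after it has been resized
    to (cols * tile_size) x (rows * tile_size); the resize itself is left to
    whatever image library is in use.
    """
    cols, rows = choose_grid(width, height, max_tiles)
    boxes = [
        (c * tile_size, r * tile_size, (c + 1) * tile_size, (r + 1) * tile_size)
        for r in range(rows)  # row-major order: left to right, top to bottom
        for c in range(cols)
    ]
    return (cols, rows), boxes


if __name__ == "__main__":
    grid, boxes = tile_boxes(1920, 1080)
    print(grid, len(boxes))
```

Under these assumptions a 1920x1080 frame maps to a 4x2 grid (eight 448x448 tiles) and a square image to 3x3. Matching the aspect ratio keeps distortion low when the image is resized, while the tile budget caps the number of visual tokens passed through the connector to the language model.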
