Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k
- URL: http://arxiv.org/abs/2503.09642v2
- Date: Sun, 23 Mar 2025 13:43:54 GMT
- Title: Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k
- Authors: Xiangyu Peng, Zangwei Zheng, Chenhui Shen, Tom Young, Xinying Guo, Binluo Wang, Hang Xu, Hongxin Liu, Mingyan Jiang, Wenjun Li, Yuhui Wang, Anbang Ye, Gang Ren, Qianran Ma, Wanying Liang, Xiang Lian, Xiwen Wu, Yuting Zhong, Zhuangyan Li, Chaoyu Gong, Guojun Lei, Leijun Cheng, Limin Zhang, Minghao Li, Ruijie Zhang, Silan Hu, Shijie Huang, Xiaokang Wang, Yuanheng Zhao, Yuqi Wang, Ziang Wei, Yang You
- Abstract summary: We present Open-Sora 2.0, a commercial-level video generation model trained for only $200k. We demonstrate that the cost of training a top-performing video generation model is highly controllable. By making Open-Sora 2.0 fully open-source, we aim to democratize access to advanced video generation technology.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video generation models have achieved remarkable progress in the past year. The quality of AI video continues to improve, but at the cost of larger model size, increased data quantity, and greater demand for training compute. In this report, we present Open-Sora 2.0, a commercial-level video generation model trained for only $200k. With this model, we demonstrate that the cost of training a top-performing video generation model is highly controllable. We detail all techniques that contribute to this efficiency breakthrough, including data curation, model architecture, training strategy, and system optimization. According to human evaluation results and VBench scores, Open-Sora 2.0 is comparable to global leading video generation models including the open-source HunyuanVideo and the closed-source Runway Gen-3 Alpha. By making Open-Sora 2.0 fully open-source, we aim to democratize access to advanced video generation technology, fostering broader innovation and creativity in content creation. All resources are publicly available at: https://github.com/hpcaitech/Open-Sora.
Related papers
- Wan: Open and Advanced Large-Scale Video Generative Models [83.03603932233275]
Wan is a suite of video foundation models designed to push the boundaries of video generation.
We open-source the entire series of Wan, including source code and all models, with the goal of fostering the growth of the video generation community.
arXiv Detail & Related papers (2025-03-26T08:25:43Z)
- VideoWorld: Exploring Knowledge Learning from Unlabeled Videos [119.35107657321902]
This work explores whether a deep generative model can learn complex knowledge solely from visual input.
We develop VideoWorld, an auto-regressive video generation model trained on unlabeled video data, and test its knowledge acquisition abilities in video-based Go and robotic control tasks.
arXiv Detail & Related papers (2025-01-16T18:59:10Z)
- Open-Sora: Democratizing Efficient Video Production for All [15.68402186082992]
We create Open-Sora, an open-source video generation model designed to produce high-fidelity video content.
Open-Sora supports a wide spectrum of visual generation tasks, including text-to-image generation, text-to-video generation, and image-to-video generation.
By embracing the open-source principle, Open-Sora democratizes full access to all the training/inference/data preparation codes as well as model weights.
arXiv Detail & Related papers (2024-12-29T08:52:49Z)
- HunyuanVideo: A Systematic Framework For Large Video Generative Models [82.4392082688739]
HunyuanVideo is an innovative open-source video foundation model. It incorporates data curation, advanced architectural design, progressive model scaling and training. As a result, we successfully trained a video generative model with over 13 billion parameters.
arXiv Detail & Related papers (2024-12-03T23:52:37Z)
- Open-Sora Plan: Open-Source Large Video Generation Model [48.475478021553755]
Open-Sora Plan is an open-source project that aims to contribute a large generation model for generating desired high-resolution videos with long durations based on various user inputs.
Our project comprises multiple components for the entire video generation process, including a Wavelet-Flow Variational Autoencoder, a Joint Image-Video Skiparse Denoiser, and various condition controllers.
Benefiting from these efficient design choices, our Open-Sora Plan achieves impressive video generation results in both qualitative and quantitative evaluations.
arXiv Detail & Related papers (2024-11-28T14:07:45Z)
- The Dawn of Video Generation: Preliminary Explorations with SORA-like Models [14.528428430884015]
High-quality video generation, encompassing text-to-video (T2V), image-to-video (I2V), and video-to-video (V2V) generation, holds considerable significance in content creation.
Models like SORA have advanced generating videos with higher resolution, more natural motion, better vision-language alignment, and increased controllability.
arXiv Detail & Related papers (2024-10-07T17:35:10Z)
- VideoCrafter1: Open Diffusion Models for High-Quality Video Generation [97.5767036934979]
We introduce two diffusion models for high-quality video generation, namely text-to-video (T2V) and image-to-video (I2V) models.
T2V models synthesize a video based on a given text input, while I2V models incorporate an additional image input.
Our proposed T2V model can generate realistic and cinematic-quality videos with a resolution of $1024 \times 576$, outperforming other open-source T2V models in terms of quality.
arXiv Detail & Related papers (2023-10-30T13:12:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.