Fugu-MT Paper Translation (Summary): Communicative Agents for Slideshow Storytelling Video Generation based on LLMs

Paper overview: Communicative Agents for Slideshow Storytelling Video Generation based on LLMs

arxiv url: http://arxiv.org/abs/2509.01277v1
Date: Mon, 01 Sep 2025 09:04:07 GMT
Status: Translation complete
Last updated (in system): 2025-09-04 15:17:03.614066
Title: Communicative Agents for Slideshow Storytelling Video Generation based on LLMs
Title (reference translation): Communicative agents for slideshow storytelling video generation using LLMs
Authors: Jingxing Fan, Jinrong Shen, Yusheng Yao, Shuangqing Wang, Qian Wang, Yuling Wang,
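Abstract summary: Video-Generation-Team (VGTeam) is a new slideshow video generation system that redefines the video generation pipeline. By emulating the sequential stages of conventional video production, VGTeam achieves notable improvements in both efficiency and scalability. On average, a video costs $0.103 to generate, with a successful generation rate of 98.4%.
Reference score (independently calculated attention score): 4.389263274945811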
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: With the rapid advancement of artificial intelligence (AI), the proliferation of AI-generated content (AIGC) tasks has significantly accelerated developments in text-to-video generation. As a result, the field of video production is undergoing a transformative shift. However, conventional text-to-video models are typically constrained by high computational costs. In this study, we propose Video-Generation-Team (VGTeam), a novel slide show video generation system designed to redefine the video creation pipeline through the integration of large language models (LLMs). VGTeam is composed of a suite of communicative agents, each responsible for a distinct aspect of video generation, such as scriptwriting, scene creation, and audio design. These agents operate collaboratively within a chat tower workflow, transforming user-provided textual prompts into coherent, slide-style narrative videos. By emulating the sequential stages of traditional video production, VGTeam achieves remarkable improvements in both efficiency and scalability, while substantially reducing computational overhead. On average, the system generates videos at a cost of only $0.103, with a successful generation rate of 98.4%. Importantly, this framework maintains a high degree of creative fidelity and customization. The implications of VGTeam are far-reaching. It democratizes video production by enabling broader access to high-quality content creation without the need for extensive resources. Furthermore, it highlights the transformative potential of language models in creative domains and positions VGTeam as a pioneering system for next-generation content creation.
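Abstract (reference translation): With the rapid progress of artificial intelligence (AI), the surge in AIGC tasks has markedly accelerated the development of text-to-video generation. As a result, the field of video production is being transformed. However, conventional text-to-video models are usually constrained by high computational costs. In this study, we propose VGTeam, a new slideshow video generation system that redefines the video generation pipeline by integrating large language models (LLMs). VGTeam consists of a set of communicative agents, each responsible for a different aspect of video generation, such as script writing, scene creation, and audio design. These agents work together within a chat tower workflow, converting user-provided text prompts into coherent, slide-style narrative videos. By emulating the sequential stages of conventional video production, VGTeam achieves notable improvements in both efficiency and scalability while substantially reducing computational overhead. On average, a video costs $0.103 to generate, with a successful generation rate of 98.4%. Importantly, the framework maintains a high degree of creativity and customization. The significance of VGTeam is considerable. It democratizes video production, giving broad access to high-quality content creation without requiring extensive resources. Furthermore, it highlights the transformative potential of language models in creative domains and positions VGTeam as a pioneering system for next-generation content creation.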
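The abstract describes VGTeam only at a high level: specialized agents (scriptwriting, scene creation, audio design) that hand work to one another in sequence, mirroring the stages of traditional video production. The following is a minimal Python sketch of that sequential hand-off, assuming each agent is a function that reads and updates a shared project state. The agent names (`script_writer`, `scene_designer`, `audio_designer`), the `Slide`/`Project` types, and `run_pipeline` are hypothetical stubs with no LLM calls; this is not the authors' implementation, and the paper's "chat tower" workflow and inter-agent dialogue are not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Slide:
    narration: str
    scene: str = ""   # description of the slide's visual (hypothetical)
    audio: str = ""   # placeholder for a text-to-speech job (hypothetical)


@dataclass
class Project:
    prompt: str
    slides: List[Slide] = field(default_factory=list)
    log: List[str] = field(default_factory=list)


# An "agent" here is just a function that takes the shared state and returns it.
Agent = Callable[[Project], Project]


def script_writer(p: Project) -> Project:
    # Stub standing in for an LLM call: one narration line per sentence.
    sentences = [s.strip() for s in p.prompt.split(".") if s.strip()]
    p.slides = [Slide(narration=s + ".") for s in sentences]
    p.log.append(f"script_writer: {len(p.slides)} slides")
    return p


def scene_designer(p: Project) -> Project:
    # Stub: derive an image prompt for each slide from its narration.
    for i, slide in enumerate(p.slides):
        slide.scene = f"image prompt #{i + 1}: {slide.narration}"
    p.log.append("scene_designer: scenes described")
    return p


def audio_designer(p: Project) -> Project:
    # Stub: record which text would be sent to a speech synthesizer.
    for slide in p.slides:
        slide.audio = f"tts({slide.narration!r})"
    p.log.append("audio_designer: narration queued")
    return p


def run_pipeline(prompt: str, agents: List[Agent]) -> Project:
    # Run the agents strictly in order; each sees everything earlier ones wrote.
    project = Project(prompt=prompt)
    for agent in agents:
        project = agent(project)
    return project


if __name__ == "__main__":
    result = run_pipeline(
        "A fox finds a lantern. It lights the forest path.",
        [script_writer, scene_designer, audio_designer],
    )
    for slide in result.slides:
        print(slide)
    print(result.log)
```

Because each stage only reads what earlier stages wrote, a stub can be swapped for a real LLM, image, or TTS call without touching the others; the order of the agent list still matters, since scene and audio design depend on the script.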


Related papers