Movie2Story: A framework for understanding videos and telling stories in the form of novel text
- URL: http://arxiv.org/abs/2412.14965v2
- Date: Sat, 11 Jan 2025 14:08:22 GMT
- Title: Movie2Story: A framework for understanding videos and telling stories in the form of novel text
- Authors: Kangning Li, Zheyang Jia, Anyu Ying,
- Abstract summary: We propose a novel benchmark to evaluate text generation capabilities in scenarios enriched with auxiliary information.
Our work introduces an innovative automatic dataset generation method to ensure the availability of accurate auxiliary information.
Our experiments reveal that current Multi-modal Large Language Models (MLLMs) perform suboptimally under the proposed evaluation metrics.
- Score: 0.0
- License:
- Abstract: In recent years, large-scale models have achieved significant advancements, accompanied by the emergence of numerous high-quality benchmarks for evaluating various aspects of their comprehension abilities. However, most existing benchmarks primarily focus on spatial understanding in static image tasks. While some benchmarks extend evaluations to temporal tasks, they fall short in assessing text generation under complex contexts involving long videos and rich auxiliary information. To address this limitation, we propose a novel benchmark: the Multi-modal Story Generation Benchmark (MSBench), designed to evaluate text generation capabilities in scenarios enriched with auxiliary information. Our work introduces an innovative automatic dataset generation method to ensure the availability of accurate auxiliary information. On one hand, we leverage existing datasets and apply automated processes to generate new evaluation datasets, significantly reducing manual efforts. On the other hand, we refine auxiliary data through systematic filtering and utilize state-of-the-art models to ensure the fairness and accuracy of the ground-truth datasets. Our experiments reveal that current Multi-modal Large Language Models (MLLMs) perform suboptimally under the proposed evaluation metrics, highlighting significant gaps in their capabilities. To address these challenges, we propose a novel model architecture and methodology to better handle the overall process, demonstrating improvements on our benchmark.
Related papers
- BENCHAGENTS: Automated Benchmark Creation with Agent Interaction [16.4783894348333]
We introduce BENCHAGENTS, a framework that methodically leverages large language models (LLMs) to automate benchmark creation for complex capabilities.
We use BENCHAGENTS to create benchmarks to evaluate capabilities related to planning and constraint satisfaction during text generation.
We then use these benchmarks to study seven state-of-the-art models and extract new insights on common failure modes and model differences.
arXiv Detail & Related papers (2024-10-29T22:56:18Z) - A Survey of Small Language Models [104.80308007044634]
Small Language Models (SLMs) have become increasingly important due to their efficiency and performance to perform various language tasks with minimal computational resources.
We present a comprehensive survey on SLMs, focusing on their architectures, training techniques, and model compression techniques.
arXiv Detail & Related papers (2024-10-25T23:52:28Z) - Improving embedding with contrastive fine-tuning on small datasets with expert-augmented scores [12.86467344792873]
The proposed method uses soft labels derived from expert-augmented scores to fine-tune embedding models.
The paper evaluates the method using a Q&A dataset from an online shopping website and eight expert models.
arXiv Detail & Related papers (2024-08-19T01:59:25Z) - Scaling Up Video Summarization Pretraining with Large Language Models [73.74662411006426]
We introduce an automated and scalable pipeline for generating a large-scale video summarization dataset.
We analyze the limitations of existing approaches and propose a new video summarization model that effectively addresses them.
Our work also presents a new benchmark dataset that contains 1200 long videos each with high-quality summaries annotated by professionals.
arXiv Detail & Related papers (2024-04-04T11:59:06Z) - Exploring Precision and Recall to assess the quality and diversity of LLMs [82.21278402856079]
We introduce a novel evaluation framework for Large Language Models (LLMs) such as textscLlama-2 and textscMistral.
This approach allows for a nuanced assessment of the quality and diversity of generated text without the need for aligned corpora.
arXiv Detail & Related papers (2024-02-16T13:53:26Z) - EvalCrafter: Benchmarking and Evaluating Large Video Generation Models [70.19437817951673]
We argue that it is hard to judge the large conditional generative models from the simple metrics since these models are often trained on very large datasets with multi-aspect abilities.
Our approach involves generating a diverse and comprehensive list of 700 prompts for text-to-video generation.
Then, we evaluate the state-of-the-art video generative models on our carefully designed benchmark, in terms of visual qualities, content qualities, motion qualities, and text-video alignment with 17 well-selected objective metrics.
arXiv Detail & Related papers (2023-10-17T17:50:46Z) - EditEval: An Instruction-Based Benchmark for Text Improvements [73.5918084416016]
This work presents EditEval: An instruction-based, benchmark and evaluation suite for automatic evaluation of editing capabilities.
We evaluate several pre-trained models, which shows that InstructGPT and PEER perform the best, but that most baselines fall below the supervised SOTA.
Our analysis shows that commonly used metrics for editing tasks do not always correlate well, and that optimization for prompts with the highest performance does not necessarily entail the strongest robustness to different models.
arXiv Detail & Related papers (2022-09-27T12:26:05Z) - MuLD: The Multitask Long Document Benchmark [4.835289158553091]
We present a new long document benchmark consisting of only documents over 10,000 tokens.
We show that models with increased context length are better able to solve the tasks presented.
arXiv Detail & Related papers (2022-02-15T12:42:55Z) - Data-to-text Generation with Macro Planning [61.265321323312286]
We propose a neural model with a macro planning stage followed by a generation stage reminiscent of traditional methods.
Our approach outperforms competitive baselines in terms of automatic and human evaluation.
arXiv Detail & Related papers (2021-02-04T16:32:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.