MindBench: A Comprehensive Benchmark for Mind Map Structure Recognition and Analysis
- URL: http://arxiv.org/abs/2407.02842v1
- Date: Wed, 3 Jul 2024 06:39:18 GMT
- Title: MindBench: A Comprehensive Benchmark for Mind Map Structure Recognition and Analysis
- Authors: Lei Chen, Feng Yan, Yujie Zhong, Shaoxiang Chen, Zequn Jie, Lin Ma
- Abstract summary: We introduce MindBench, a new benchmark for structured document analysis.
It includes meticulously constructed bilingual authentic and synthetic images, detailed annotations, evaluation metrics, and baseline models.
It also defines five types of structured understanding and parsing tasks: full parsing, partial parsing, position-related parsing, structured Visual Question Answering (VQA), and position-related VQA.
- Score: 35.31073435549237
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multimodal Large Language Models (MLLMs) have made significant progress in document analysis. Despite this, existing benchmarks typically focus only on extracting text and simple layout information, neglecting the complex interactions between elements in structured documents such as mind maps and flowcharts. To address this gap, we introduce MindBench, a new benchmark that not only includes meticulously constructed bilingual authentic and synthetic images, detailed annotations, evaluation metrics, and baseline models, but also defines five types of structured understanding and parsing tasks. These tasks include full parsing, partial parsing, position-related parsing, structured Visual Question Answering (VQA), and position-related VQA, covering key areas such as text recognition, spatial awareness, relationship discernment, and structured parsing. Extensive experimental results demonstrate both the substantial potential of current models on structured document information and the significant room for improvement that remains. We anticipate that the launch of MindBench will significantly advance research and application development in structured document analysis technology. MindBench is available at: https://miasanlei.github.io/MindBench.github.io/.
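The abstract lists evaluation metrics among MindBench's components without detailing them here. As a minimal sketch, assuming a parse is serialized as a tree of labeled nodes, structured parsing can be scored by comparing parent-child relations between prediction and ground truth. The `Node` class and the edge-level F1 below are illustrative assumptions, not MindBench's official schema or metric:

```python
# Hedged illustration only: the Node schema and edge-level F1 are
# assumptions for demonstration; MindBench defines its own annotation
# format and official metrics.
from dataclasses import dataclass, field
from typing import List, Set, Tuple

@dataclass
class Node:
    text: str
    children: List["Node"] = field(default_factory=list)

def edges(root: Node) -> Set[Tuple[str, str]]:
    """Collect (parent_text, child_text) pairs from a mind-map tree."""
    pairs, stack = set(), [root]
    while stack:
        node = stack.pop()
        for child in node.children:
            pairs.add((node.text, child.text))
            stack.append(child)
    return pairs

def edge_f1(pred: Node, gold: Node) -> float:
    """Score structure, not just text: F1 over parent-child relations."""
    p, g = edges(pred), edges(gold)
    tp = len(p & g)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(p), tp / len(g)
    return 2 * precision * recall / (precision + recall)

# Toy example: the prediction recognizes every node's text but attaches
# "Q1" to the wrong parent, so the structural score drops below 1.0.
gold = Node("Plan", [Node("Goals", [Node("Q1")]), Node("Risks")])
pred = Node("Plan", [Node("Goals"), Node("Risks", [Node("Q1")])])
print(f"edge-level F1 = {edge_f1(pred, gold):.2f}")  # 0.67
```

Scoring relations rather than raw text is what separates structure recognition from plain OCR: a model can transcribe every node correctly yet still lose credit for attaching nodes to the wrong parents.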
Related papers
- BabelBench: An Omni Benchmark for Code-Driven Analysis of Multimodal and Multistructured Data [61.936320820180875]
Large language models (LLMs) have become increasingly pivotal across various domains.
BabelBench is an innovative benchmark framework that evaluates the proficiency of LLMs in managing multimodal multistructured data with code execution.
Our experimental findings on BabelBench indicate that even cutting-edge models like ChatGPT 4 exhibit substantial room for improvement.
arXiv Detail & Related papers (2024-10-01T15:11:24Z)
- Hypergraph based Understanding for Document Semantic Entity Recognition [65.84258776834524]
We build HGA, a novel document semantic entity recognition framework that uses hypergraph attention to focus on entity boundaries and entity categories at the same time.
Our results on FUNSD, CORD, and XFUNDIE show that our method effectively improves the performance of semantic entity recognition tasks.
arXiv Detail & Related papers (2024-07-09T14:35:49Z)
- SRFUND: A Multi-Granularity Hierarchical Structure Reconstruction Benchmark in Form Understanding [55.48936731641802]
We present SRFUND, a hierarchically structured multi-task form understanding benchmark.
SRFUND provides refined annotations on top of the original FUNSD and XFUND datasets.
The dataset covers eight languages: English, Chinese, Japanese, German, French, Spanish, Italian, and Portuguese.
arXiv Detail & Related papers (2024-06-13T02:35:55Z)
- GeoContrastNet: Contrastive Key-Value Edge Learning for Language-Agnostic Document Understanding [4.258365032282028]
We present a language-agnostic framework for structured document understanding (DU) that integrates a contrastive learning objective with graph attention networks (GATs).
We propose a novel methodology that combines geometric edge features with visual features within an overall two-staged GAT-based framework.
Our results highlight the model's proficiency in identifying key-value relationships within the FUNSD dataset for forms and in discovering spatial relationships in table-structured layouts for RVL-CDIP business invoices.
arXiv Detail & Related papers (2024-05-06T01:40:20Z)
- mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding [100.17063271791528]
We propose Unified Structure Learning to boost the performance of MLLMs on OCR-free document understanding.
Our model DocOwl 1.5 achieves state-of-the-art performance on 10 visual document understanding benchmarks.
arXiv Detail & Related papers (2024-03-19T16:48:40Z)
- StrucTexT: Structured Text Understanding with Multi-Modal Transformers [29.540122964399046]
Structured text understanding on Visually Rich Documents (VRDs) is a crucial part of Document Intelligence.
This paper proposes a unified framework named StrucTexT, which is flexible and effective for handling both sub-tasks of structured text understanding.
We evaluate our method for structured text understanding at segment-level and token-level and show it outperforms the state-of-the-art counterparts.
arXiv Detail & Related papers (2021-08-06T02:57:07Z)
- An Extensible Dashboard Architecture For Visualizing Base And Analyzed Data [2.169919643934826]
This paper focuses on an architecture for visualizing base as well as analyzed data.
It proposes a modular dashboard architecture for user interaction, visualization management, and complex analysis of base data.
arXiv Detail & Related papers (2021-06-09T19:45:43Z)
- VSR: A Unified Framework for Document Layout Analysis combining Vision, Semantics and Relations [40.721146438291335]
We propose VSR, a unified framework for document layout analysis that combines vision, semantics, and relations.
On three popular benchmarks, VSR outperforms previous models by large margins.
arXiv Detail & Related papers (2021-05-13T12:20:30Z)
- Multi-Aspect Sentiment Analysis with Latent Sentiment-Aspect Attribution [7.289918297809611]
We introduce a new framework called the sentiment-aspect attribution module (SAAM).
The framework works by exploiting the correlations between sentence-level embedding features and variations of document-level aspect rating scores.
Experiments on a hotel review dataset and a beer review dataset show that SAAM improves sentiment analysis performance.
arXiv Detail & Related papers (2020-12-15T16:34:36Z)
- KILT: a Benchmark for Knowledge Intensive Language Tasks [102.33046195554886]
We present a benchmark for knowledge-intensive language tasks (KILT).
All tasks in KILT are grounded in the same snapshot of Wikipedia.
We find that a shared dense vector index coupled with a seq2seq model is a strong baseline; a sketch of this retrieve-then-generate pattern follows after this list.
arXiv Detail & Related papers (2020-09-04T15:32:19Z)
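As noted in the KILT entry above, a shared dense index plus a seq2seq reader is a strong baseline. Below is a minimal sketch of that retrieve-then-generate pattern; the stand-in models (all-MiniLM-L6-v2 embeddings, flan-t5-small reader) and the toy three-passage "snapshot" are assumptions for illustration, not the checkpoints used in the KILT paper:

```python
# Hedged sketch of a retrieve-then-generate baseline: a shared dense index
# over a knowledge source feeds retrieved passages to a seq2seq reader.
# Models and passages below are illustrative stand-ins, not KILT's setup.
import numpy as np
from sentence_transformers import SentenceTransformer
from transformers import pipeline

# Toy "knowledge snapshot"; KILT grounds all tasks in one Wikipedia snapshot.
passages = [
    "Marie Curie won Nobel Prizes in Physics (1903) and Chemistry (1911).",
    "The Eiffel Tower was completed in 1889 in Paris.",
    "Mount Everest is the highest mountain above sea level.",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")
index = encoder.encode(passages, normalize_embeddings=True)  # the shared dense index
reader = pipeline("text2text-generation", model="google/flan-t5-small")

def answer(question: str, k: int = 1) -> str:
    """Retrieve the top-k passages by cosine similarity, then generate."""
    q = encoder.encode([question], normalize_embeddings=True)
    scores = (index @ q.T)[:, 0]          # cosine similarity (unit vectors)
    top = np.argsort(-scores)[:k]
    context = " ".join(passages[i] for i in top)
    prompt = f"question: {question} context: {context}"
    return reader(prompt, max_new_tokens=32)[0]["generated_text"]

print(answer("When was the Eiffel Tower completed?"))  # expected: 1889
```

Because every KILT task is grounded in the same Wikipedia snapshot, a single dense index like the one above can serve retrieval for all of the tasks.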