OBI-Bench: Can LMMs Aid in Study of Ancient Script on Oracle Bones?
- URL: http://arxiv.org/abs/2412.01175v2
- Date: Tue, 11 Feb 2025 14:59:40 GMT
- Title: OBI-Bench: Can LMMs Aid in Study of Ancient Script on Oracle Bones?
- Authors: Zijian Chen, Tingzhu Chen, Wenjun Zhang, Guangtao Zhai
- Abstract summary: We introduce OBI-Bench, a holistic benchmark crafted to evaluate large multi-modal models (LMMs) on whole-process oracle bone inscriptions.
OBI-Bench includes 5,523 meticulously collected diverse-sourced images, covering five key domain problems: recognition, rejoining, classification, retrieval, and deciphering.
Unlike existing benchmarks, OBI-Bench focuses on advanced visual perception and reasoning with OBI-specific knowledge, challenging LMMs to perform tasks akin to those faced by experts.
- Score: 40.226986425846825
- Abstract: We introduce OBI-Bench, a holistic benchmark crafted to systematically evaluate large multi-modal models (LMMs) on whole-process oracle bone inscription (OBI) processing tasks demanding expert-level domain knowledge and deliberate cognition. OBI-Bench includes 5,523 meticulously collected, diverse-sourced images covering five key domain problems: recognition, rejoining, classification, retrieval, and deciphering. These images span centuries of archaeological findings and years of research by front-line scholars, comprising multi-stage font appearances from excavation to synthesis, such as original oracle bones, inked rubbings, oracle bone fragments, cropped single characters, and handprinted characters. Unlike existing benchmarks, OBI-Bench focuses on advanced visual perception and reasoning with OBI-specific knowledge, challenging LMMs to perform tasks akin to those faced by experts. The evaluation of 6 proprietary LMMs as well as 17 open-source LMMs highlights the substantial challenges posed by OBI-Bench. Even the latest versions of GPT-4o, Gemini 1.5 Pro, and Qwen-VL-Max still fall far short of untrained members of the public on some fine-grained perception tasks. However, they perform at a level comparable to untrained humans on deciphering tasks, indicating remarkable capabilities in offering new interpretative perspectives and generating creative guesses. We hope OBI-Bench can help the community develop domain-specific multi-modal foundation models for ancient language research and delve deeper to discover and enhance these untapped potentials of LMMs.
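The paper does not publish its evaluation harness here, but the abstract's setup (five task types, image-plus-prompt queries, per-task scoring) suggests a simple loop. Below is a minimal sketch of such a loop, assuming a JSON manifest schema (`image`/`task`/`prompt`/`answer` fields), a placeholder `query_lmm` function standing in for whatever model API is under test, and exact-match scoring; all of these are illustrative assumptions, not the paper's actual protocol.

```python
# Minimal sketch of an OBI-Bench-style per-task evaluation loop.
# Assumptions (not from the paper): the benchmark ships as a JSON manifest
# of samples, and query_lmm() wraps the multi-modal model being evaluated.
import json
from collections import defaultdict

TASKS = ["recognition", "rejoining", "classification", "retrieval", "deciphering"]

def query_lmm(image_path: str, prompt: str) -> str:
    """Placeholder: replace with a real LMM call (API client or local model)."""
    return ""  # stub so the script runs end-to-end

def evaluate(manifest_path: str) -> dict:
    with open(manifest_path, encoding="utf-8") as f:
        # Assumed schema: [{"image": ..., "task": ..., "prompt": ..., "answer": ...}]
        samples = json.load(f)
    correct, total = defaultdict(int), defaultdict(int)
    for s in samples:
        pred = query_lmm(s["image"], s["prompt"])
        total[s["task"]] += 1
        # Exact-match scoring is a simplification; the paper's metrics may differ.
        if pred.strip().lower() == s["answer"].strip().lower():
            correct[s["task"]] += 1
    return {t: correct[t] / total[t] for t in TASKS if total[t]}

if __name__ == "__main__":
    print(evaluate("obi_bench_manifest.json"))  # hypothetical manifest filename
```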
Related papers
- EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents [63.43699771428243]
EmbodiedBench is an extensive benchmark designed to evaluate vision-driven embodied agents.
We evaluated 13 leading proprietary and open-source MLLMs within EmbodiedBench.
MLLMs excel at high-level tasks but struggle with low-level manipulation.
arXiv Detail & Related papers (2025-02-13T18:11:34Z)
- OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning [72.57452266982642]
We introduce OCRBench v2, a large-scale bilingual text-centric benchmark for text recognition.
We find that 20 out of 22 LMMs score below 50 (out of 100) and exhibit five types of limitations.
arXiv Detail & Related papers (2024-12-31T07:32:35Z)
- VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents [50.12414817737912]
Large Multimodal Models (LMMs) have ushered in a new era in artificial intelligence, merging capabilities in both language and vision to form highly capable Visual Foundation Agents.
Existing benchmarks fail to sufficiently challenge or showcase the full potential of LMMs in complex, real-world environments.
VisualAgentBench (VAB) is a pioneering benchmark specifically designed to train and evaluate LMMs as visual foundation agents.
arXiv Detail & Related papers (2024-08-12T17:44:17Z)
- Oracle Bone Inscriptions Multi-modal Dataset [58.20314888996118]
Oracle bone inscriptions (OBI) constitute the earliest developed writing system in China, bearing invaluable written records of early Shang history and paleography.
This paper proposes an Oracle Bone Inscriptions Multi-modal dataset, which includes annotation information for 10,077 pieces of oracle bones.
This dataset can be used for a variety of AI research tasks in the OBI field, such as OBI character detection and recognition, rubbing denoising, character matching, character generation, reading sequence prediction, and missing character completion.
arXiv Detail & Related papers (2024-07-04T12:47:32Z)
- A-Bench: Are LMMs Masters at Evaluating AI-generated Images? [78.3699767628502]
A-Bench is a benchmark designed to diagnose whether large multi-modal models (LMMs) are masters at evaluating AI-generated images (AIGIs).
Ultimately, 2,864 AIGIs from 16 text-to-image models are sampled, each paired with question-answers annotated by human experts, and tested across 18 leading LMMs.
arXiv Detail & Related papers (2024-06-05T08:55:02Z)
- MechGPT, a language-based strategy for mechanics and materials modeling that connects knowledge across scales, disciplines and modalities [0.0]
We use a Large Language Model (LLM) to distill question-answer pairs from raw sources, followed by fine-tuning.
The resulting MechGPT LLM foundation model is used in a series of computational experiments to explore its capacity for knowledge retrieval, various language tasks, hypothesis generation, and connecting knowledge across disparate areas.
arXiv Detail & Related papers (2023-10-16T14:29:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.