An archaeological Catalog Collection Method Based on Large Vision-Language Models
- URL: http://arxiv.org/abs/2412.20088v1
- Date: Sat, 28 Dec 2024 09:10:41 GMT
- Title: An archaeological Catalog Collection Method Based on Large Vision-Language Models
- Authors: Honglin Pang, Yi Chang, Tianjing Duan, Xi Yang
- Abstract summary: Archaeological catalogs, containing key elements such as artifact images, morphological descriptions, and excavation information, are essential for studying artifact evolution and cultural inheritance.
Existing Large Vision-Language Models and their derivative data collection methods face challenges in accurate image detection and modal matching.
We propose a novel archaeological catalog collection method based on Large Vision-Language Models that follows an approach comprising three modules: document localization, block comprehension and block matching.
- Score: 9.177297031425859
- Abstract: Archaeological catalogs, containing key elements such as artifact images, morphological descriptions, and excavation information, are essential for studying artifact evolution and cultural inheritance. These data are widely scattered across publications, requiring automated collection methods. However, existing Large Vision-Language Models (VLMs) and their derivative data collection methods face challenges in accurate image detection and modal matching when processing archaeological catalogs, making automated collection difficult. To address these issues, we propose a novel archaeological catalog collection method based on Large Vision-Language Models that follows an approach comprising three modules: document localization, block comprehension and block matching. Through practical data collection from the Dabagou and Miaozigou pottery catalogs and comparison experiments, we demonstrate the effectiveness of our approach, providing a reliable solution for automated collection of archaeological catalogs.
Related papers
- PyPotteryInk: One-Step Diffusion Model for Sketch to Publication-ready Archaeological Drawings [0.0]
PyPotteryInk is an automated pipeline that transforms archaeological pottery sketches into publication-ready inked drawings.
I demonstrate the effectiveness of the approach on a dataset of Italian protohistoric pottery drawings.
The model can be fine-tuned to adapt to different archaeological contexts with minimal training data.
arXiv Detail & Related papers (2025-02-09T14:03:37Z)
- A Comprehensive Survey of 3D Dense Captioning: Localizing and Describing Objects in 3D Scenes [80.20670062509723]
3D dense captioning is an emerging vision-language bridging task that aims to generate detailed descriptions for 3D scenes.
It presents significant potential and challenges due to its closer representation of the real world compared to 2D visual captioning.
Despite the popularity and success of existing methods, there is a lack of comprehensive surveys summarizing the advancements in this field.
arXiv Detail & Related papers (2024-03-12T10:04:08Z)
- Leveraging Collection-Wide Similarities for Unsupervised Document Structure Extraction [61.998789448260005]
We propose to identify the typical structure of documents within a collection.
We abstract over arbitrary header paraphrases, and ground each topic to respective document locations.
We develop an unsupervised graph-based method which leverages both inter- and intra-document similarities.
arXiv Detail & Related papers (2024-02-21T16:22:21Z)
- Contextualization Distillation from Large Language Model for Knowledge Graph Completion [51.126166442122546]
We introduce the Contextualization Distillation strategy, a plug-in-and-play approach compatible with both discriminative and generative KGC frameworks.
Our method begins by instructing large language models to transform compact, structural triplets into context-rich segments.
Comprehensive evaluations across diverse datasets and KGC techniques highlight the efficacy and adaptability of our approach.
arXiv Detail & Related papers (2024-01-28T08:56:49Z)
- Knowledge-Aware Artifact Image Synthesis with LLM-Enhanced Prompting and Multi-Source Supervision [5.517240672957627]
We propose a novel knowledge-aware artifact image synthesis approach that brings lost historical objects accurately into their visual forms.
Compared to existing approaches, our proposed model produces higher-quality artifact images that align better with the implicit details and historical knowledge contained within written documents.
arXiv Detail & Related papers (2023-12-13T11:03:07Z)
- AutArch: An AI-assisted workflow for object detection and automated recording in archaeological catalogues [37.69303106863453]
This paper introduces a new workflow for collecting data from archaeological find catalogues available as legacy resources.
The workflow relies on custom software (AutArch) supporting image processing, object detection, and interactive means of validating and adjusting automatically retrieved data.
We integrate artificial intelligence (AI) in terms of neural networks for object detection and classification into the workflow.
arXiv Detail & Related papers (2023-11-29T17:24:04Z)
- PHD: Pixel-Based Language Modeling of Historical Documents [55.75201940642297]
We propose a novel method for generating synthetic scans to resemble real historical documents.
We pre-train our model, PHD, on a combination of synthetic scans and real historical newspapers from the 1700-1900 period.
We successfully apply our model to a historical QA task, highlighting its usefulness in this domain.
arXiv Detail & Related papers (2023-10-22T08:45:48Z)
- Enhancing Object Detection in Ancient Documents with Synthetic Data Generation and Transformer-Based Models [0.4125187280299248]
This research aims to enhance object detection in ancient documents by reducing false positives and improving precision.
We propose a method that involves the creation of synthetic datasets through computational mediation.
Our approach includes associating objects with their component parts and introducing a visual feature map to enable the model to discern between different symbols and document elements.
arXiv Detail & Related papers (2023-07-29T15:29:25Z)
- Unsupervised Clustering of Roman Potsherds via Variational Autoencoders [63.8376359764052]
We propose an artificial intelligence solution to support archaeologists in the classification task of Roman commonware potsherds.
The partiality and handcrafted variance of the fragments make their matching a challenging problem.
We propose to pair similar profiles via the unsupervised hierarchical clustering of non-linear features learned in the latent space of a deep convolutional Variational Autoencoder (VAE) network.
arXiv Detail & Related papers (2022-03-14T18:56:13Z)
- Generating More Pertinent Captions by Leveraging Semantics and Style on Multi-Source Datasets [56.018551958004814]
This paper addresses the task of generating fluent descriptions by training on a non-uniform combination of data sources.
Large-scale datasets with noisy image-text pairs provide a sub-optimal source of supervision.
We propose to leverage and separate semantics and descriptive style through the incorporation of a style token and keywords extracted through a retrieval component.
arXiv Detail & Related papers (2021-11-24T19:00:05Z)
- Object Retrieval and Localization in Large Art Collections using Deep Multi-Style Feature Fusion and Iterative Voting [10.807131260367298]
We introduce an algorithm that allows users to search for image regions containing specific motifs or objects.
Our region-based voting with GPU-accelerated approximate nearest-neighbour search allows us to find and localize even small motifs within an extensive dataset in a few seconds.
arXiv Detail & Related papers (2021-07-14T18:40:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.