Instruct-Tuning Pretrained Causal Language Models for Ancient Greek Papyrology and Epigraphy
- URL: http://arxiv.org/abs/2409.13870v3
- Date: Sun, 17 Nov 2024 21:28:01 GMT
- Title: Instruct-Tuning Pretrained Causal Language Models for Ancient Greek Papyrology and Epigraphy
- Authors: Eric Cullhed
- Abstract summary: This article presents an experiment in fine-tuning a pretrained causal language model to restore missing or illegible characters in ancient Greek inscriptions and documentary papyri.
Benchmarked against the state-of-the-art model (Ithaca), the instruction-tuned models excelled in text restoration.
Results suggest that fine-tuning larger pretrained causal language models using instruction templates for emendations and conjectures holds promise.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This article presents an experiment in fine-tuning a pretrained causal language model (Meta's Llama 3.1 8B Instruct) to assist with restoring missing or illegible characters in ancient Greek inscriptions and documentary papyri. Utilizing a straightforward instruction-based approach and a 95%/5% train/test split, the papyrus restoration model achieved a character error rate (CER) of 14.9%, a top-1 accuracy of 73.5%, and a top-20 accuracy of 86.0% for sequences up to 10 characters. A model was also fine-tuned for geographic attribution, reaching a top-1 accuracy of 66.4% and a top-3 accuracy of 79.9%. In chronological attribution, it demonstrated an average deviation of 21.7 years from the actual terminus post/ante quem, with a median deviation of 0 years. For inscriptions, the restoration model achieved a CER of 20.5%, a top-1 accuracy of 63.7%, and a top-20 accuracy of 83.0% for sequences up to 10 characters. In geographic attribution, it attained a top-1 accuracy of 75.0% and a top-3 accuracy of 83.7%, while in dating, it had an average deviation of 37.1 years and a median deviation of 3 years from the actual date range. Benchmarked against the state-of-the-art model (Ithaca) on a shared test set and on recently edited inscriptions, the instruction-tuned models excelled in text restoration, while also offering the practical advantage of ignoring spaces during reconstruction, which aligns with the scriptio continua of ancient textual artifacts. However, their performance in geographic and chronological attribution was lower than Ithaca's. To evaluate the approach in a more even setup, the instruction model was retrained with an 80%/10%/10% train-validation-test split, and still outperformed Ithaca in text restoration. The results suggest that fine-tuning larger pretrained causal language models using instruction templates for emendations and conjectures to ancient texts holds promise.
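The abstract's headline metric is the character error rate (CER) of restored sequences. As a minimal sketch of how such an evaluation works, the snippet below computes CER as edit distance normalized by reference length, alongside a hypothetical instruction-style prompt builder; the template wording and the use of `#` as a gap marker are illustrative assumptions, not the paper's actual prompts.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def cer(prediction: str, reference: str) -> float:
    """Character error rate: edit distance normalized by reference length."""
    return levenshtein(prediction, reference) / max(len(reference), 1)

def make_prompt(text_with_gaps: str) -> str:
    """Hypothetical instruction template; '#' marks illegible characters."""
    return ("Restore the missing characters, marked with '#', "
            f"in this ancient Greek text: {text_with_gaps}")
```

A model answering `make_prompt(...)` would then be scored by comparing its proposed characters against the editors' reading via `cer`; note that, as the abstract points out, spaces can simply be stripped from both strings before comparison to match scriptio continua.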
Related papers
- Explainable Admission-Level Predictive Modeling for Prolonged Hospital Stay in Elderly Populations: Challenges in Low- and Middle-Income Countries [65.4286079244589]
Prolonged length of stay (pLoS) is a significant factor associated with the risk of adverse in-hospital events.
We develop and explain a predictive model for pLoS using admission-level patient and hospital administrative data.
arXiv Detail & Related papers (2026-01-07T23:35:24Z) - Decoding the Past: Explainable Machine Learning Models for Dating Historical Texts [0.08749675983608168]
This article addresses temporal text classification using interpretable, feature-engineered tree-based machine learning models.
We integrate five feature categories - compression-based, lexical structure, readability, neologism detection, and distance features - to predict the temporal origin of English texts spanning five centuries.
On a large-scale corpus, we achieve 76.7% accuracy for century-scale prediction and 26.1% for decade-scale classification, substantially above random baselines.
arXiv Detail & Related papers (2025-11-28T10:27:48Z) - Beyond Correctness: Evaluating Subjective Writing Preferences Across Cultures [87.75098311090642]
Current preference learning methods achieve high accuracy on standard benchmarks but exhibit significant performance degradation when objective quality signals are removed.
We introduce WritingPreferenceBench, a dataset of 1,800 human-annotated preference pairs (1,200 English, 600 Chinese) across 8 creative writing genres.
arXiv Detail & Related papers (2025-10-16T12:23:13Z) - Optimized Text Embedding Models and Benchmarks for Amharic Passage Retrieval [49.1574468325115]
We introduce Amharic-specific dense retrieval models based on pre-trained Amharic BERT and RoBERTa backbones.
Our proposed RoBERTa-Base-Amharic-Embed model (110M parameters) achieves a 17.6% relative improvement in MRR@10.
More compact variants, such as RoBERTa-Medium-Amharic-Embed (42M), remain competitive while being over 13x smaller.
arXiv Detail & Related papers (2025-05-25T23:06:20Z) - A Comprehensive Study on Fine-Tuning Large Language Models for Medical Question Answering Using Classification Models and Comparative Analysis [0.0]
We aim to improve the accuracy and efficiency of providing reliable answers to medical questions.
Various models, such as RoBERTa and BERT, were examined and evaluated on their ability to answer medical questions accurately.
arXiv Detail & Related papers (2025-01-27T03:31:02Z) - Grammatical Error Correction for Low-Resource Languages: The Case of Zarma [8.057796934109938]
Grammatical error correction (GEC) is important for improving written materials for low-resource languages like Zarma.
This study compares rule-based methods, machine translation (MT) models, and large language models (LLMs) for GEC in Zarma.
arXiv Detail & Related papers (2024-10-20T23:51:36Z) - Text Sentiment Analysis and Classification Based on Bidirectional Gated Recurrent Units (GRUs) Model [6.096738978232722]
This paper explores the importance of text sentiment analysis and classification in the field of natural language processing.
It proposes a new approach to sentiment analysis and classification based on the bidirectional gated recurrent units (GRUs) model.
arXiv Detail & Related papers (2024-04-26T02:40:03Z) - Common 7B Language Models Already Possess Strong Math Capabilities [61.61442513067561]
This paper shows that the LLaMA-2 7B model with common pre-training already exhibits strong mathematical abilities.
The potential for extensive scaling is constrained by the scarcity of publicly available math questions.
arXiv Detail & Related papers (2024-03-07T18:00:40Z) - Assessing the Efficacy of Grammar Error Correction: A Human Evaluation
Approach in the Japanese Context [10.047123247001714]
We evaluate the performance of the state-of-the-art sequence tagging grammar error detection and correction model (SeqTagger).
With an automatic annotation toolkit, ERRANT, we first evaluated SeqTagger's performance on error correction with human expert correction as the benchmark.
Results indicated a precision of 63.66% and a recall of 20.19% for error correction in the full dataset.
arXiv Detail & Related papers (2024-02-28T06:43:43Z) - Pre-training and Diagnosing Knowledge Base Completion Models [58.07183284468881]
We introduce and analyze an approach to knowledge transfer from one collection of facts to another without the need for entity or relation matching.
The main contribution is a method that can make use of large-scale pre-training on facts, which were collected from unstructured text.
To understand the obtained pre-trained models better, we then introduce a novel dataset for the analysis of pre-trained models for Open Knowledge Base Completion.
arXiv Detail & Related papers (2024-01-27T15:20:43Z) - Regularization methods for the short-term forecasting of the Italian electric load [77.34726150561087]
The problem of forecasting the whole 24-hour profile of the Italian electric load is addressed as a multitask learning problem.
The weights form a 96x96 matrix that can be seen and displayed as a surface sampled on a square domain.
Different regularization and sparsity approaches to reduce the degrees of freedom of the surface were explored, comparing the obtained forecasts with those of the Italian Transmission System Operator Terna.
arXiv Detail & Related papers (2021-12-08T22:15:06Z) - FPM: A Collection of Large-scale Foundation Pre-trained Language Models [0.0]
We use currently effective model structures and the most mainstream techniques to release a set of models.
We believe these will serve as basic models in the future.
arXiv Detail & Related papers (2021-11-09T02:17:15Z) - VSEC: Transformer-based Model for Vietnamese Spelling Correction [0.19116784879310028]
We propose a novel method to correct Vietnamese spelling errors.
We tackle the problems of mistyped errors and misspelled errors by using a deep learning model.
The experimental results show that our method achieves encouraging performance with 86.8% errors detected and 81.5% errors corrected.
arXiv Detail & Related papers (2021-11-01T00:55:32Z) - Calibrate Before Use: Improving Few-Shot Performance of Language Models [68.17016463756474]
GPT-3 can perform numerous tasks when provided a natural language prompt that contains a few training examples.
We show that this type of few-shot learning can be unstable.
The choice of prompt format, training examples, and even the order of the training examples can cause accuracy to vary from near chance to near state-of-the-art.
arXiv Detail & Related papers (2021-02-19T00:23:59Z) - DeBERTa: Decoding-enhanced BERT with Disentangled Attention [119.77305080520718]
We propose a new model architecture DeBERTa that improves the BERT and RoBERTa models using two novel techniques.
We show that these techniques significantly improve the efficiency of model pre-training and the performance of both natural language understanding (NLU) and natural language generation (NLG) downstream tasks.
arXiv Detail & Related papers (2020-06-05T19:54:34Z) - Semi-Supervised Neural Architecture Search [185.0651567642238]
SemiNAS is a semi-supervised neural architecture search (NAS) approach that leverages numerous unlabeled architectures (without evaluation, and thus at nearly no cost).
It achieves 94.02% test accuracy on NASBench-101, outperforming all the baselines when using the same number of architectures.
It achieves a 97% intelligibility rate in the low-resource setting and a 15% test error rate in the robustness setting, with 9% and 7% improvements over the baseline, respectively.
arXiv Detail & Related papers (2020-02-24T17:23:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.