Related papers: Corpus and Models for Lemmatisation and POS-tagging of Classical French Theatre

URL: http://arxiv.org/abs/2005.07505v2
Date: Fri, 5 Feb 2021 15:32:28 GMT
Title: Corpus and Models for Lemmatisation and POS-tagging of Classical French Theatre
Authors: Jean-Baptiste Camps, Simon Gabay, Paul Fièvre, Thibault Clérice, Florian Cafiero
Abstract summary: This paper describes the process of building an annotated corpus and training models for classical French literature. It was originally developed as a preliminary step to the stylometric analyses presented in Cafiero and Camps. Using a recent neural-network-based lemmatiser and a CRF tagger makes it possible to achieve accuracies beyond the current state of the art on the in-domain test.
Score: 0.0
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: This paper describes the process of building an annotated corpus and training models for classical French literature, with a focus on theatre, and particularly comedies in verse. It was originally developed as a preliminary step to the stylometric analyses presented in Cafiero and Camps [2019]. Using a recent neural-network-based lemmatiser and a CRF tagger makes it possible to achieve accuracies beyond the current state of the art on the in-domain test, and the approach proves robust in out-of-domain tests, i.e. up to 20th-century novels.

Related papers

Token and Span Classification for Entity Recognition in French Historical Encyclopedias [0.0]
Named Entity Recognition (NER) in historical texts presents unique challenges due to non-standardized language, archaic orthography, and nested or overlapping entities. This study benchmarks a diverse set of NER approaches, ranging from classical Conditional Random Fields (CRFs) and spaCy-based models to transformer-based architectures. Experiments are conducted on the GeoEDdA dataset, a richly annotated corpus derived from 18th-century French encyclopedias.
arXiv Detail & Related papers (2025-06-03T13:37:44Z)
ProxyDet: Synthesizing Proxy Novel Classes via Classwise Mixup for Open-Vocabulary Object Detection [7.122652901894367]
Open-vocabulary object detection (OVOD) aims to recognize novel objects whose categories are not included in the training set. We present a novel, yet simple technique that helps generalization on the overall distribution of novel classes.
arXiv Detail & Related papers (2023-12-12T13:45:56Z)
PHD: Pixel-Based Language Modeling of Historical Documents [55.75201940642297]
We propose a novel method for generating synthetic scans to resemble real historical documents. We pre-train our model, PHD, on a combination of synthetic scans and real historical newspapers from the 1700-1900 period. We successfully apply our model to a historical QA task, highlighting its usefulness in this domain.
arXiv Detail & Related papers (2023-10-22T08:45:48Z)
CALaMo: a Constructionist Assessment of Language Models [0.30458514384586405]
This paper presents a novel framework for evaluating Neural Language Models' linguistic abilities using a constructionist approach. Not only is the usage-based model in line with the underlying philosophy of neural architectures, but it also allows the linguist to keep meaning as a determinant factor in the analysis.
arXiv Detail & Related papers (2023-02-07T16:56:48Z)
Activating the Discriminability of Novel Classes for Few-shot Segmentation [48.542627940781095]
We propose to activate the discriminability of novel classes explicitly in both the feature encoding stage and the prediction stage for segmentation. In the prediction stage for segmentation, we learn a Self-Refined Online Foreground-Background classifier (SROFB), which is able to refine itself using the high-confidence pixels of the query image.
arXiv Detail & Related papers (2022-12-02T12:22:36Z)
Automatically Discovering Novel Visual Categories with Self-supervised Prototype Learning [68.63910949916209]
This paper tackles the problem of novel category discovery (NCD), which aims to discriminate unknown categories in large-scale image collections. We propose a novel adaptive prototype learning method consisting of two main stages: prototypical representation learning and prototypical self-training. We conduct extensive experiments on four benchmark datasets and demonstrate the effectiveness and robustness of the proposed method with state-of-the-art performance.
arXiv Detail & Related papers (2022-08-01T16:34:33Z)
Class-incremental Novel Class Discovery [76.35226130521758]
We study the new task of class-incremental Novel Class Discovery (class-iNCD). We propose a novel approach for class-iNCD which prevents forgetting of past information about the base classes. Our experiments, conducted on three common benchmarks, demonstrate that our method significantly outperforms state-of-the-art approaches.
arXiv Detail & Related papers (2022-07-18T13:49:27Z)
Demystifying the Base and Novel Performances for Few-shot Class-incremental Learning [15.762281194023462]
Few-shot class-incremental learning (FSCIL) has addressed challenging real-world scenarios where unseen novel classes continually arrive with few samples. It is required to develop a model that recognizes the novel classes without forgetting prior knowledge. It is shown that our straightforward method has comparable performance with the sophisticated state-of-the-art algorithms.
arXiv Detail & Related papers (2022-06-18T00:39:47Z)
Corpus and Models for Lemmatisation and POS-tagging of Old French [0.0]
We present the current results of a long-running project providing lemmatisation and POS models for Old French. We describe how we broached the difficult question of providing lemmatisation and POS models for Old French with the help of neural taggers and the progressive constitution of dedicated corpora.
arXiv Detail & Related papers (2021-09-23T15:32:41Z)
Learning to Select: A Fully Attentive Approach for Novel Object Captioning [48.497478154384105]
Novel object captioning (NOC) has recently emerged as a paradigm to test captioning models on objects which are unseen during the training phase. We present a novel approach for NOC that learns to select the most relevant objects of an image, regardless of their adherence to the training set. Our architecture is fully-attentive and end-to-end trainable, also when incorporating constraints.
arXiv Detail & Related papers (2021-06-02T19:11:21Z)
A frame semantics based approach to comparative study of digitized corpus [0.0]
The paper focuses on the morphological, syntactic, and semantic annotation process of an English-Arabic aligned corpus created from digitized novels. The present study argues that differences in the conceptualization of motion events across languages can be described with frame structure and frame-to-frame relations.
arXiv Detail & Related papers (2020-05-29T22:56:25Z)
The Frankfurt Latin Lexicon: From Morphological Expansion and Word Embeddings to SemioGraphs [97.8648124629697]
The article argues for a more comprehensive understanding of lemmatization, encompassing classical machine learning as well as intellectual post-corrections and, in particular, human interpretation processes based on graph representations of the underlying lexical resources.
arXiv Detail & Related papers (2020-05-21T17:16:53Z)

This list is automatically generated from the titles and abstracts of the papers on this site.