Corpus and Models for Lemmatisation and POS-tagging of Classical French
Theatre
- URL: http://arxiv.org/abs/2005.07505v2
- Date: Fri, 5 Feb 2021 15:32:28 GMT
- Title: Corpus and Models for Lemmatisation and POS-tagging of Classical French
Theatre
- Authors: Jean-Baptiste Camps, Simon Gabay, Paul Fièvre, Thibault Clérice,
Florian Cafiero
- Abstract summary: This paper describes the process of building an annotated corpus and training models for classical French literature.
It was originally developed as a preliminary step to the stylometric analyses presented in Cafiero and Camps.
The use of a recent lemmatiser based on neural networks and a CRF tagger makes it possible to achieve accuracies beyond the current state of the art on the in-domain test.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: This paper describes the process of building an annotated corpus and training
models for classical French literature, with a focus on theatre, and
particularly comedies in verse. It was originally developed as a preliminary
step to the stylometric analyses presented in Cafiero and Camps [2019]. The use
of a recent lemmatiser based on neural networks and a CRF tagger makes it
possible to achieve accuracies beyond the current state of the art on the
in-domain test, and proves robust in out-of-domain tests, i.e. up to
20th-century novels.
Related papers
- ProxyDet: Synthesizing Proxy Novel Classes via Classwise Mixup for
Open-Vocabulary Object Detection [7.122652901894367]
Open-vocabulary object detection (OVOD) aims to recognize novel objects whose categories are not included in the training set.
We present a novel, yet simple technique that helps generalization on the overall distribution of novel classes.
arXiv Detail & Related papers (2023-12-12T13:45:56Z)
- PHD: Pixel-Based Language Modeling of Historical Documents [55.75201940642297]
We propose a novel method for generating synthetic scans to resemble real historical documents.
We pre-train our model, PHD, on a combination of synthetic scans and real historical newspapers from the 1700-1900 period.
We successfully apply our model to a historical QA task, highlighting its usefulness in this domain.
arXiv Detail & Related papers (2023-10-22T08:45:48Z)
- CALaMo: a Constructionist Assessment of Language Models [0.30458514384586405]
This paper presents a novel framework for evaluating Neural Language Models' linguistic abilities using a constructionist approach.
Not only is the usage-based model in line with the underlying philosophy of neural architectures, but it also allows the linguist to keep meaning as a determinant factor in the analysis.
arXiv Detail & Related papers (2023-02-07T16:56:48Z)
- Activating the Discriminability of Novel Classes for Few-shot
Segmentation [48.542627940781095]
We propose to activate the discriminability of novel classes explicitly in both the feature encoding stage and the prediction stage for segmentation.
In the prediction stage for segmentation, we learn a Self-Refined Online Foreground-Background classifier (SROFB), which is able to refine itself using the high-confidence pixels of the query image.
arXiv Detail & Related papers (2022-12-02T12:22:36Z)
- Automatically Discovering Novel Visual Categories with Self-supervised
Prototype Learning [68.63910949916209]
This paper tackles the problem of novel category discovery (NCD), which aims to discriminate unknown categories in large-scale image collections.
We propose a novel adaptive prototype learning method consisting of two main stages: prototypical representation learning and prototypical self-training.
We conduct extensive experiments on four benchmark datasets and demonstrate the effectiveness and robustness of the proposed method with state-of-the-art performance.
arXiv Detail & Related papers (2022-08-01T16:34:33Z)
- Class-incremental Novel Class Discovery [76.35226130521758]
We study the new task of class-incremental Novel Class Discovery (class-iNCD).
We propose a novel approach for class-iNCD which prevents forgetting of past information about the base classes.
Our experiments, conducted on three common benchmarks, demonstrate that our method significantly outperforms state-of-the-art approaches.
arXiv Detail & Related papers (2022-07-18T13:49:27Z)
- Demystifying the Base and Novel Performances for Few-shot
Class-incremental Learning [15.762281194023462]
Few-shot class-incremental learning (FSCIL) has addressed challenging real-world scenarios where unseen novel classes continually arrive with few samples.
It is required to develop a model that recognizes the novel classes without forgetting prior knowledge.
It is shown that our straightforward method has comparable performance with the sophisticated state-of-the-art algorithms.
arXiv Detail & Related papers (2022-06-18T00:39:47Z)
- Corpus and Models for Lemmatisation and POS-tagging of Old French [0.0]
We present the current results of a long-running project providing lemmatisation and POS models for Old French.
We describe how we broached the difficult question of providing lemmatisation and POS models for Old French with the help of neural taggers and the progressive constitution of dedicated corpora.
arXiv Detail & Related papers (2021-09-23T15:32:41Z)
- Learning to Select: A Fully Attentive Approach for Novel Object
Captioning [48.497478154384105]
Novel object captioning (NOC) has recently emerged as a paradigm to test captioning models on objects which are unseen during the training phase.
We present a novel approach for NOC that learns to select the most relevant objects of an image, regardless of their adherence to the training set.
Our architecture is fully-attentive and end-to-end trainable, also when incorporating constraints.
arXiv Detail & Related papers (2021-06-02T19:11:21Z)
- A frame semantics based approach to comparative study of digitized
corpus [0.0]
The paper focuses on the morphologic, syntactic, and semantic annotation process of an English-Arabic aligned corpus created from digitized novels.
The present study argues that differences in motion events conceptualization across languages can be described with frame structure and frame-to-frame relations.
arXiv Detail & Related papers (2020-05-29T22:56:25Z)
- The Frankfurt Latin Lexicon: From Morphological Expansion and Word
Embeddings to SemioGraphs [97.8648124629697]
The article argues for a more comprehensive understanding of lemmatization, encompassing classical machine learning as well as intellectual post-corrections and, in particular, human interpretation processes based on graph representations of the underlying lexical resources.
arXiv Detail & Related papers (2020-05-21T17:16:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.