Quotations, Coreference Resolution, and Sentiment Annotations in
Croatian News Articles: An Exploratory Study
- URL: http://arxiv.org/abs/2212.07172v1
- Date: Wed, 14 Dec 2022 11:54:12 GMT
- Title: Quotations, Coreference Resolution, and Sentiment Annotations in
Croatian News Articles: An Exploratory Study
- Authors: Jelena Sarajlić, Gaurish Thakkar, Diego Alves, Nives Mikelic Preradović
- Abstract summary: The paper focuses on annotating quotations, co-reference resolution, and sentiment in the SETimes news corpus in Croatian.
The resulting corpus, annotated with quotation features, can be used for multiple tasks in the field of Natural Language Processing.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper presents a corpus annotated for the task of direct-speech
extraction in Croatian. The paper focuses on the annotation of quotations,
co-reference resolution, and sentiment in the SETimes news corpus in
Croatian and on the analysis of its language-specific differences compared to
English. From this, a list of the phenomena that require special attention when
performing these annotations is derived. The generated corpus with quotation
feature annotations can be used for multiple tasks in the field of Natural
Language Processing.
Related papers
- Understanding Cross-Lingual Alignment -- A Survey [52.572071017877704]
Cross-lingual alignment is the meaningful similarity of representations across languages in multilingual language models.
We survey the literature of techniques to improve cross-lingual alignment, providing a taxonomy of methods and summarising insights from throughout the field.
arXiv Detail & Related papers (2024-04-09T11:39:53Z)
- FRACAS: A FRench Annotated Corpus of Attribution relations in newS [0.0]
We present a manually annotated corpus of 1676 newswire texts in French for quotation extraction and source attribution.
We first describe the composition of our corpus and the choices that were made in selecting the data.
We then detail our inter-annotator agreement between the 8 annotators who worked on manual labelling.
arXiv Detail & Related papers (2023-09-19T13:19:54Z)
- A Corpus for Sentence-level Subjectivity Detection on English News Articles [49.49218203204942]
We use our guidelines to collect NewsSD-ENG, a corpus of 638 objective and 411 subjective sentences extracted from English news articles on controversial topics.
Our corpus paves the way for subjectivity detection in English without relying on language-specific tools, such as lexicons or machine translation.
arXiv Detail & Related papers (2023-05-29T11:54:50Z)
- Design Choices for Crowdsourcing Implicit Discourse Relations: Revealing the Biases Introduced by Task Design [23.632204469647526]
We show that the task design can push annotators towards certain relations.
We conclude that this type of bias should be taken into account when training and testing models.
arXiv Detail & Related papers (2023-04-03T09:04:18Z)
- Zero-shot Cross-Linguistic Learning of Event Semantics [27.997873309702225]
We look at captions of images across Arabic, Chinese, Farsi, German, Russian, and Turkish.
We show that lexical aspects can be predicted for a given language despite not having observed any annotated data for this language at all.
arXiv Detail & Related papers (2022-07-05T23:18:36Z)
- Models and Datasets for Cross-Lingual Summarisation [78.56238251185214]
We present a cross-lingual summarisation corpus with long documents in a source language associated with multi-sentence summaries in a target language.
The corpus covers twelve language pairs and directions for four European languages, namely Czech, English, French and German.
We derive cross-lingual document-summary instances from Wikipedia by combining lead paragraphs and articles' bodies from language aligned Wikipedia titles.
arXiv Detail & Related papers (2022-02-19T11:55:40Z)
- Annotation Curricula to Implicitly Train Non-Expert Annotators [56.67768938052715]
Voluntary studies often require annotators to familiarize themselves with the task, its annotation scheme, and the data domain.
This can be overwhelming in the beginning, mentally taxing, and introduce errors into the resulting annotations.
We propose annotation curricula, a novel approach to implicitly train annotators.
arXiv Detail & Related papers (2021-06-04T09:48:28Z)
- The Discussion Tracker Corpus of Collaborative Argumentation [2.800857580710507]
The Discussion Tracker corpus was collected in American high school English classes.
The corpus consists of 29 multi-party discussions of English literature transcribed from 985 minutes of audio.
arXiv Detail & Related papers (2020-05-22T18:27:28Z)
- A Corpus Study and Annotation Schema for Named Entity Recognition and Relation Extraction of Business Products [68.26059718611914]
We present a corpus study, an annotation schema and associated guidelines, for the annotation of product entity and company-product relation mentions.
We find that although product mentions are often realized as noun phrases, defining their exact extent is difficult due to high boundary ambiguity.
We present a preliminary corpus of English web and social media documents annotated according to the proposed guidelines.
arXiv Detail & Related papers (2020-04-07T11:45:22Z)
- On the Importance of Word Order Information in Cross-lingual Sequence Labeling [80.65425412067464]
Cross-lingual models that fit into the word order of the source language might fail to handle target languages.
We investigate whether making models insensitive to the word order of the source language can improve the adaptation performance in target languages.
arXiv Detail & Related papers (2020-01-30T03:35:44Z)