Treebanking User-Generated Content: a UD Based Overview of Guidelines, Corpora and Unified Recommendations
- URL: http://arxiv.org/abs/2011.02063v1
- Date: Tue, 3 Nov 2020 23:34:42 GMT
- Title: Treebanking User-Generated Content: a UD Based Overview of Guidelines, Corpora and Unified Recommendations
- Authors: Manuela Sanguinetti, Lauren Cassidy, Cristina Bosco, Özlem Çetinoğlu, Alessandra Teresa Cignarella, Teresa Lynn, Ines Rehbein, Josef Ruppenhofer, Djamé Seddah, Amir Zeldes
- Abstract summary: This article presents a discussion on the main linguistic phenomena which cause difficulties in the analysis of user-generated texts found on the web and in social media.
It proposes a set of tentative UD-based annotation guidelines to promote consistent treatment of the particular phenomena found in these types of texts.
- Score: 58.50167394354305
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This article presents a discussion on the main linguistic phenomena which
cause difficulties in the analysis of user-generated texts found on the web and
in social media, and proposes a set of annotation guidelines for their
treatment within the Universal Dependencies (UD) framework of syntactic
analysis. Given on the one hand the increasing number of treebanks featuring
user-generated content, and its somewhat inconsistent treatment in these
resources on the other, the aim of this article is twofold: (1) to provide a
condensed, though comprehensive, overview of such treebanks -- based on
available literature -- along with their main features and a comparative
analysis of their annotation criteria, and (2) to propose a set of tentative
UD-based annotation guidelines, to promote consistent treatment of the
particular phenomena found in these types of texts. The overarching goal of
this article is to provide a common framework for researchers interested in
developing similar resources in UD, thus promoting cross-linguistic
consistency, which is a principle that has always been central to the spirit of
UD.
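The abstract refers to treebanks built within the UD framework. UD treebanks are distributed in the CoNLL-U format: one token per line with ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), comment lines starting with #, and a blank line between sentences. The Python below is a minimal sketch, not the official UD validator and not tooling from the article, that reads one CoNLL-U sentence and checks basic tree well-formedness (a single root, heads that exist, no cycles). The tweet-like sentence and its analysis (lemmas "you"/"be" for "u"/"r", discourse for "lol", the adjective as root with a cop dependent) are hypothetical illustrations, not the article's recommendations.

```python
# Minimal CoNLL-U reader and tree check: an illustrative sketch, NOT the
# official UD validator and NOT the article's tooling.
from dataclasses import dataclass


@dataclass
class Token:
    id: int
    form: str
    lemma: str
    upos: str
    head: int
    deprel: str


def parse_conllu(block: str) -> list[Token]:
    """Parse one CoNLL-U sentence (10 tab-separated columns per token line)."""
    tokens = []
    for line in block.strip().splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments such as "# text = ..."
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        if "-" in cols[0] or "." in cols[0]:
            continue  # this sketch ignores multiword-token ranges and empty nodes
        tokens.append(Token(int(cols[0]), cols[1], cols[2], cols[3],
                            int(cols[6]), cols[7]))
    return tokens


def check_tree(tokens: list[Token]) -> None:
    """Raise ValueError unless the HEAD column forms a single rooted tree."""
    by_id = {t.id: t for t in tokens}
    roots = [t for t in tokens if t.head == 0]
    if len(roots) != 1 or roots[0].deprel != "root":
        raise ValueError("expected exactly one token with HEAD 0 and DEPREL root")
    for t in tokens:
        seen = set()
        cur = t
        while cur.head != 0:  # walk up the head chain until reaching the root
            if cur.head not in by_id:
                raise ValueError(f"token {cur.id} has missing head {cur.head}")
            if cur.id in seen:
                raise ValueError(f"cycle through token {cur.id}")
            seen.add(cur.id)
            cur = by_id[cur.head]


# Hypothetical tweet-like sentence; the analysis is illustrative only.
ROWS = [
    ("1", "lol", "lol", "INTJ", "_", "_", "4", "discourse", "_", "_"),
    ("2", "u", "you", "PRON", "_", "_", "4", "nsubj", "_", "_"),
    ("3", "r", "be", "AUX", "_", "_", "4", "cop", "_", "_"),
    ("4", "right", "right", "ADJ", "_", "_", "0", "root", "_", "_"),
]
SAMPLE = "# text = lol u r right\n" + "\n".join("\t".join(r) for r in ROWS)

if __name__ == "__main__":
    sent = parse_conllu(SAMPLE)
    check_tree(sent)
    for tok in sent:
        print(tok.id, tok.form, tok.lemma, tok.upos, tok.head, tok.deprel)
```

Running the script prints one line per token. The check is deliberately narrow: it only verifies the file format and the tree shape, and says nothing about whether an analysis is consistent with UD guidelines for phenomena typical of user-generated content, which is what the article's proposed guidelines address.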
Related papers
- Annotator in the Loop: A Case Study of In-Depth Rater Engagement to Create a Bridging Benchmark Dataset [1.825224193230824]
We describe a novel, collaborative, and iterative annotator-in-the-loop methodology for annotation.
Our findings indicate that collaborative engagement with annotators can enhance annotation methods.
arXiv (2024-08-01T19:11:08Z)
- Augmenting Textual Generation via Topology Aware Retrieval [30.933176170660683]
We develop a Topology-aware Retrieval-augmented Generation framework.
This framework includes a retrieval module that selects texts based on their topological relationships.
We have curated established text-attributed networks and conducted comprehensive experiments to validate the effectiveness of this framework.
arXiv (2024-05-27T19:02:18Z)
- A Note on an Inferentialist Approach to Resource Semantics [48.65926948745294]
'Inferentialism' is the view that meaning is given in terms of inferential behaviour.
This paper shows how 'inferentialism' enables a versatile and expressive framework for resource semantics.
arXiv (2024-05-10T14:13:21Z)
- A Literature Review of Literature Reviews in Pattern Analysis and Machine Intelligence [58.6354685593418]
This paper proposes several article-level, field-normalized, and large language model-empowered bibliometric indicators to evaluate reviews.
The newly emerging AI-generated literature reviews are also appraised.
This work offers insights into the current challenges of literature reviews and envisions future directions for their development.
arXiv (2024-02-20T11:28:50Z)
- A semantically enhanced dual encoder for aspect sentiment triplet extraction [0.7291396653006809]
Aspect sentiment triplet extraction (ASTE) is a crucial subtask of aspect-based sentiment analysis (ABSA).
Previous research has focused on enhancing ASTE through innovative table-filling strategies.
We propose a framework that leverages both a basic encoder, primarily based on BERT, and a particular encoder comprising a Bi-LSTM network and a graph convolutional network (GCN).
Experiments conducted on benchmark datasets demonstrate the state-of-the-art performance of our proposed framework.
arXiv (2023-06-14T09:04:14Z)
- Transition-based Abstract Meaning Representation Parsing with Contextual Embeddings [0.0]
We study a way of combining two of the most successful routes to the meaning of language, statistical language models and symbolic semantics formalisms, in the task of semantic parsing.
We explore the utility of incorporating pretrained context-aware word embeddings, such as BERT and RoBERTa, into the parsing task.
arXiv (2022-06-13T15:05:24Z)
- Revise and Resubmit: An Intertextual Model of Text-based Collaboration in Peer Review [52.359007622096684]
Peer review is a key component of the publishing process in most fields of science.
Existing NLP studies focus on the analysis of individual texts.
Editorial assistance, however, often requires modeling interactions between pairs of texts.
arXiv (2022-04-22T16:39:38Z)
- Cross-linguistically Consistent Semantic and Syntactic Annotation of Child-directed Speech [27.657676278734534]
This paper proposes a methodology for constructing corpora of child-directed speech paired with sentential logical forms.
The approach enforces a cross-linguistically consistent representation, building on recent advances in dependency representation and semantic parsing.
arXiv (2021-09-22T18:17:06Z)
- Author Clustering and Topic Estimation for Short Texts [69.54017251622211]
We propose a novel model that extends Latent Dirichlet Allocation by modeling strong dependence among the words in the same document.
We also simultaneously cluster users, removing the need for post-hoc cluster estimation.
Our method performs as well as, or better than, traditional approaches to problems arising in short text.
arXiv (2021-06-15T20:55:55Z)
- Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding [71.2260967797055]
We propose a weakly-supervised approach for aspect-based sentiment analysis.
We learn <sentiment, aspect> joint topic embeddings in the word embedding space.
We then use neural models to generalize the word-level discriminative information.
arXiv (2020-10-13T21:33:24Z)